Massachusetts Institute of Technology

Size: px
Start display at page:

Download "Massachusetts Institute of Technology"

Transcription

1 Massachusetts Istitute of Techology Machie Learig, Fall 6 Problem Set : Solutios. (a) (5 poits) From the lecture otes (Eq 4, Lecture 5), the optimal parameter values for liear regressio give the matrix of traiig examples X ad the correspodig respose variables y is: θ = (X T X) X T y The quatity (X T X) X T is also kow as the pseudo-iverse of X, ad ofte occurs whe dealig with liear systems of equatios. Whe X is a square matrix ad ivertible, it is exactly the same as the iverse of X. MATLAB provides may ways of achievig our desired goal. We ca directly write out the expressio above or use the fuctio piv. Here are the fuctios liear regress ad liear pred: fuctio theta = liear_regress(y,x) theta = piv(x)*y; fuctio y_hat = liear_pred(theta,x) y_hat = X*theta; I liear regress, ote that we are ot calculatig θ separately. This differs from the descriptio i the lecture otes where the traiig examples were explicitly padded with s, allowig us to itroduce a offset θ. Istead, we will use a feature mappig to achieve the same effect. (b) ( poits) Before we describe the solutio, we first describe how the dataset was created. This may help you appreciate why some feature mappigs may work better tha others. The x values were created by samplig each coordiate uiformly at radom from (-,): X i = rad(,3)* - Give a particular x, the correspodig y true ad y oisy values were created as follows: y true = log x 5 log x 7.5 log x + 3 y oisy = y true + ɛ ɛ N(, ) To evaluate the performace of liear regressio o give traiig ad test sets, we created the fuctio test liear reg which combies the regressio, predictio, ad evaluatio steps. You may, of course, do it i some other way: fuctio err = test_liear_reg(y_trai, X_trai, y_test, X_test) % trai liear regressio usig X_trai ad y_trai % evaluate the mea squared predictio error o X_test ad y_test theta = liear_regress(y_trai, X_trai); y_hat = liear_pred(theta,x_test); yd = y_hat - y_test; err = sum(yd.^)/legth(yd); %err = Mea Squared Predictio Error Cite as: Tommi Jaakkola, course materials for Machie Learig, Fall 6. MIT OpeCourseWare

2 Usig this, we ca ow calculate the mea squared predictio error (MSPE) for the two feature mappigs: >> X = feature_mappig(x_i,); >> test_liear_reg(y_oisy, X, y_true, X) as =.5736e+3 >> X = feature_mappig(x_i,); >> test_liear_reg(y_oisy, X, y_true, X) as =.56 The two sets of errors are (φ ) ad.56 (φ ), respectively. Usurprisigly, the mappig φ performs much better tha φ it is exactly the space i which the relatioship betwee X ad y is liear. (c) ( poits) Recall from the otes (Eq 8, lecture 6), that the desired quatity we eed to maximize is v T AAv + v T Av where A = (X T X). I the otes, a offset parameter is explicitly assumed so that v = [x T, ] T. However, i our case this is ot ecessary ad so v = x. Active learig may, i geeral, select the same poit x to be observed repeatedly. Each of these observatios correspods to a differet y oisy. However, due to practical limitatios, we had supplied you with oly oe set of y oisy values. Thus, if some x i occurs repeatedly i idx, you will eed to use the same y oisy, i for each occurrece of x i. Alteratively, you could chage your code so as to disallow reptitios i idx. This is the optio we have chose here. Give ay feature space, the criterio above will aim to fid poits far apart i that space. However, these poits may ot be far apart i the feature space where classificatio actually occurs. This is of particular cocer whe the latter feature space might ot be easily accessible (e.g., whe usig a kerel like the radial basis fuctio kerel). Here s the active learig code: fuctio idx = active_lear(x,k,k) idx = :k; N = size(x,); for i=:k var_reductio = zeros(n,); X = X(idx,:); A = iv(x *X); AA = A*A; for j=:n if ismember(j,idx) %this is the part where we disallow repititios cotiue; ed v = X(j,:); a = v*aa*v / ( + (v*a*v )); var_reductio(j) = a; ed [a, aidx] = max(var_reductio); idx(ed+) = aidx; ed Cite as: Tommi Jaakkola, course materials for Machie Learig, Fall 6. MIT OpeCourseWare

3 Usig it, we ca ow compute the desired quatities: >> idx = active_lear(x,5,) idx = >> test_liear_reg(y_oisy(idx), X(idx,:), y_true, X) as =.5734e+3 >> idx = active_lear(x,5,) idx = >> test_liear_reg(y_oisy(idx), X(idx,:), y_true, X) as = Thus, the MSPE whe usig φ for active learig chooses poits that are ot well-placed (for this particular dataset); φ performs much better. Note that the feature mappig used to perform the regressio (ad evaluatio) is the same for both (φ ) so the performace differece is due to the poits chose by active learig. The aswers chage whe repititios are allowed (MSPE for φ = 7.47 ; for φ = 5.5), but they still illustrate our cocept. For completeess sake, the aswers for the origial versio of the problem are: (φ : 68.9 (re peats), 68.6 (o repeats)) ad (φ : (o repeats) ad (repeats)). (d) (4 poits) This error will vary, depedig upo the umber of iteratios you perform ad the radom poits selected. I my simulatios, the value of error (usig mappig φ for regressio ad evaluatio) was Whe this simulatio was ru for a rus, the error was close to Clearly, it seems to be much better to perform radom samplig tha perform active learig i φ s space. This may seem surprisig at first, but is completely uderstadable: i the space where regressio is performed (φ ) the poits chose by performig active learig i the φ space are ot far apart at all, ad are thus particularly bad poits to be sampled. For completeess sake, the aswer for the origial versio of the problem is: (φ : 8.9, φ : ). (e) (4 poits) The figures are show i Fig. For clarity s sake, we have oly plotted the last poits of idx (sice the first 5 are the same across all cases). I the origial feature space (fig a), the poits selected by active learig i the φ space are spread far apart, as expected. However, as part (b) showed, a better fit to data is obtaied by usig φ. I this space (fig b), the poits selected by active learig i the φ space are very closely buched together. The poits selected by active learig i the φ space are, i cotrast, spread far apart. This helps explai why the poits leared usig active learig o the φ space did ot lead to good performace i the regressio step.. (a) 5 poits The fuctio f(t) = βt / mootoically decreases with t (for β > ). The fuctio g(t) = e t mootoically icreases with t. Thus, the fuctio g o f(t) = h(t) = e βt / mootoically decreases with t. As such, the RBF kerel K(x, x ) = e β x x defies a Gram matrix that satisfies the coditios of Michelli s theorem ad is hece ivertible. (b) 5 poits Let A = (λi + K) y. The, lim A = K y λ Sice K is always ivertible, this limit is always well-defied. Now, we have ˆα t = λa t where A t is the t-th elemet of A. We the have: Cite as: Tommi Jaakkola, course materials for Machie Learig, Fall 6. MIT OpeCourseWare

4 (a) I φ space (b) I φ space Figure : The red circles correspod to poits chose by performig active learig o the φ space ad the blue oes correspod to those chose by performig active learig o the φ space. Oly the last poits of idx are show for each; the first 5 poits are the same. We the have, y(x) = t= (ˆα t/λ)k(x t, x), or y(x) = t = (λa t/λ)k(x t, x), or y(x) = t= A tk(x t, x) lim λ y(x) = t=(lim λ A t )K(x t, x), or lim λ y(x) = t= B tk(x t, x), where B = K y Thus, i the required limit, the fuctio y(x) = i= B tk(x t, x) (c) poits To prove that the traiig error is zero, we eed to prove that y(x t ) = y t for t =,...,. From part (b), we have y(x t ) = i= B i K(x i, x t ), or y(x t ) = ( j= y j K (i, j))k(x i, x t ) or i= y(x t ) = j= y j i= K (i, j)k(x i, x t ), or y(x t ) = j= y j δ(j, t), or y(x t ) = y t where K (i, j) = (i, j)-th etry of K ad { for i = j δ(i, j) = for i = j Here, we made use of the fact that K(x i, x t ) = K(x t, x i ) ad that if A = B the i A(t, i)b(i, j) = δ(t, j). (d) 5 poits Sample code for this problem is show below: Ntrai = size(xtrai,); Ntest = size(xtest,); Cite as: Tommi Jaakkola, course materials for Machie Learig, Fall 6. MIT OpeCourseWare

5 for i=:legth(lambda), lmb = lambda(i); alpha = lmb * ((lmb*eye(ntrai) + K)^-) * Ytrai; Atrai = (/lmb) * repmat( alpha, Ntrai, ); yhat_trai = sum(atrai.*k,); Atest = (/lmb) * repmat( alpha, Ntest, ); yhat_test = sum(atest.*(ktrai_test ), ); E(i,:) = [mea((yhat_trai-ytrai).^),mea((yhat_test-ytest).^)]; ed; Trai Error Test Error β Figure : Traiig ad test error for Prob # (e) The resultig plot is show i Fig. As ca be see the traiig error is zero at λ = ad icreases as λ icreases. The test error iitially decreases, reaches a miimum aroud., ad the icreases agai. This is exactly as we would expect. λ results i over-fittig (the model is too powerful). Our regressio fuctio has a low bias but high variace. By icreasig λ we costrai the model, thus icreasig the traiig error. While the regularizatio icreases bias, the variace decreases faster, ad we geeralize better. High values of λ result i uder-fittig (high bias, low variace) ad both traiig error ad test errors are high. 3. (a) Observe that θ is oly a sum of y t φ(x t ) s, so we ca just store the coefficiets: θ = w t y t φ(x t ). t= We ca update w t s by icremetig w t by oe whe a mistake is made o example t. Classifyig ew examples meas evaluatig: which oly ivolves kerel operatios. θ T φ(x) = w t y t K(X t, X), t= Cite as: Tommi Jaakkola, course materials for Machie Learig, Fall 6. MIT OpeCourseWare

6 (b) The most straightforward proof: use the regressio argumet to show that you ca fit the poits exactly ad ot oly achieve the correct sig, but the value of the discrimiat fuctio ca be made ± for every traiig example. (c) Here is my solutio. % kerel= ( + traspose(x i )x j ) d ; d = 5; % polyomial kerel % kerel= exp( traspose(x i x j )(x i x j )/(s )) ; s = 3; % radial basis fuctio kerel fuctio α=trai kerel perceptro(x, y, kerel), d = size(x); K = []; for i = : x i = X(i, :) ; for j = : x j = X(j, :) ; K(i, j) = eval(kerel); ed ed α = zeros(, ); mistakes = ; while mistakes > mistakes = ; for i = : if α K(:, i)y(i) α(i) = α(i) + y(i); mistakes = mistakes + ; ed ed ed fuctio f =discrimiat fuctio(α, X, kerel, X test ), d = size(x); K = []; for i = : x i = X(i, :) ; x j = X test ; K(i) = eval(kerel); ed f = α K; (d) The origial dataset requires d = 4; the ew dataset requires d =. A RBF will easily separate either dataset. load p3 a X ad y loaded. kerel = ( + traspose(x i ) x j ) ; α = trai kerel perceptro(x, y, kerel); figure hold o plot(x( :, ), X( :, ), rs ) plot(x( :, ), X( :, ), bo ) plot dec boudary(α, X, kerel, [ 4, ], [.5,.5], [, 4]) Cite as: Tommi Jaakkola, course materials for Machie Learig, Fall 6. MIT OpeCourseWare

7 kerel = exp( traspose(x i x j ) (x i x j )/8) ; α = trai kerel perceptro(x, y, kerel); figure hold o plot(x( :, ), X( :, ), rs ) plot(x( :, ), X( :, ), bo ) plot dec boudary(α, X, kerel, [ 4, ], [.5,.5], [, 4]) 3 3 Cite as: Tommi Jaakkola, course materials for Machie Learig, Fall 6. MIT OpeCourseWare

6.867 Machine learning, lecture 7 (Jaakkola) 1

6.867 Machine learning, lecture 7 (Jaakkola) 1 6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit

More information

Support vector machine revisited

Support vector machine revisited 6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion .87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses

More information

Linear Classifiers III

Linear Classifiers III Uiversität Potsdam Istitut für Iformatik Lehrstuhl Maschielles Lere Liear Classifiers III Blaie Nelso, Tobias Scheffer Cotets Classificatio Problem Bayesia Classifier Decisio Liear Classifiers, MAP Models

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

Machine Learning Assignment-1

Machine Learning Assignment-1 Uiversity of Utah, School Of Computig Machie Learig Assigmet-1 Chadramouli, Shridhara sdhara@cs.utah.edu 00873255) Sigla, Sumedha sumedha.sigla@utah.edu 00877456) September 10, 2013 1 Liear Regressio a)

More information

Machine Learning Regression I Hamid R. Rabiee [Slides are based on Bishop Book] Spring

Machine Learning Regression I Hamid R. Rabiee [Slides are based on Bishop Book] Spring Machie Learig Regressio I Hamid R. Rabiee [Slides are based o Bishop Book] Sprig 015 http://ce.sharif.edu/courses/93-94//ce717-1 Liear Regressio Liear regressio: ivolves a respose variable ad a sigle predictor

More information

Topics Machine learning: lecture 3. Linear regression. Linear regression. Linear regression. Linear regression

Topics Machine learning: lecture 3. Linear regression. Linear regression. Linear regression. Linear regression 6.867 Machie learig: lecture 3 Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics Beod liear regressio models additive regressio models, eamples geeralizatio ad cross-validatio populatio miimizer Statistical

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Chapter 4. Fourier Series

Chapter 4. Fourier Series Chapter 4. Fourier Series At this poit we are ready to ow cosider the caoical equatios. Cosider, for eample the heat equatio u t = u, < (4.) subject to u(, ) = si, u(, t) = u(, t) =. (4.) Here,

More information

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca

More information

Lesson 10: Limits and Continuity

Lesson 10: Limits and Continuity www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals

More information

Mixtures of Gaussians and the EM Algorithm

Mixtures of Gaussians and the EM Algorithm Mixtures of Gaussias ad the EM Algorithm CSE 6363 Machie Learig Vassilis Athitsos Computer Sciece ad Egieerig Departmet Uiversity of Texas at Arligto 1 Gaussias A popular way to estimate probability desity

More information

Lecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead)

Lecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead) Lecture 4 Homework Hw 1 ad 2 will be reoped after class for every body. New deadlie 4/20 Hw 3 ad 4 olie (Nima is lead) Pod-cast lecture o-lie Fial projects Nima will register groups ext week. Email/tell

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm

More information

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32 Boostig Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machie Learig Algorithms March 1, 2017 1 / 32 Outlie 1 Admiistratio 2 Review of last lecture 3 Boostig Professor Ameet Talwalkar CS260

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

Signals & Systems Chapter3

Signals & Systems Chapter3 Sigals & Systems Chapter3 1.2 Discrete-Time (D-T) Sigals Electroic systems do most of the processig of a sigal usig a computer. A computer ca t directly process a C-T sigal but istead eeds a stream of

More information

Linear Programming and the Simplex Method

Linear Programming and the Simplex Method Liear Programmig ad the Simplex ethod Abstract This article is a itroductio to Liear Programmig ad usig Simplex method for solvig LP problems i primal form. What is Liear Programmig? Liear Programmig is

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense, 3. Z Trasform Referece: Etire Chapter 3 of text. Recall that the Fourier trasform (FT) of a DT sigal x [ ] is ω ( ) [ ] X e = j jω k = xe I order for the FT to exist i the fiite magitude sese, S = x [

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

ARIMA Models. Dan Saunders. y t = φy t 1 + ɛ t

ARIMA Models. Dan Saunders. y t = φy t 1 + ɛ t ARIMA Models Da Sauders I will discuss models with a depedet variable y t, a potetially edogeous error term ɛ t, ad a exogeous error term η t, each with a subscript t deotig time. With just these three

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple

More information

Correlation Regression

Correlation Regression Correlatio Regressio While correlatio methods measure the stregth of a liear relatioship betwee two variables, we might wish to go a little further: How much does oe variable chage for a give chage i aother

More information

Linear Regression Models

Linear Regression Models Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect

More information

(all terms are scalars).the minimization is clearer in sum notation:

(all terms are scalars).the minimization is clearer in sum notation: 7 Multiple liear regressio: with predictors) Depedet data set: y i i = 1, oe predictad, predictors x i,k i = 1,, k = 1, ' The forecast equatio is ŷ i = b + Use matrix otatio: k =1 b k x ik Y = y 1 y 1

More information

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram.

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram. Key Cocepts: 1) Sketchig of scatter diagram The scatter diagram of bivariate (i.e. cotaiig two variables) data ca be easily obtaied usig GC. Studets are advised to refer to lecture otes for the GC operatios

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges

More information

Binary classification, Part 1

Binary classification, Part 1 Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

MCT242: Electronic Instrumentation Lecture 2: Instrumentation Definitions

MCT242: Electronic Instrumentation Lecture 2: Instrumentation Definitions Faculty of Egieerig MCT242: Electroic Istrumetatio Lecture 2: Istrumetatio Defiitios Overview Measuremet Error Accuracy Precisio ad Mea Resolutio Mea Variace ad Stadard deviatio Fiesse Sesitivity Rage

More information

Quadratic Functions. Before we start looking at polynomials, we should know some common terminology.

Quadratic Functions. Before we start looking at polynomials, we should know some common terminology. Quadratic Fuctios I this sectio we begi the study of fuctios defied by polyomial expressios. Polyomial ad ratioal fuctios are the most commo fuctios used to model data, ad are used extesively i mathematical

More information

Regression and generalization

Regression and generalization Regressio ad geeralizatio CE-717: Machie Learig Sharif Uiversity of Techology M. Soleymai Fall 2016 Curve fittig: probabilistic perspective Describig ucertaity over value of target variable as a probability

More information

The Growth of Functions. Theoretical Supplement

The Growth of Functions. Theoretical Supplement The Growth of Fuctios Theoretical Supplemet The Triagle Iequality The triagle iequality is a algebraic tool that is ofte useful i maipulatig absolute values of fuctios. The triagle iequality says that

More information

1 Review and Overview

1 Review and Overview DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors 5 Eigevalues ad Eigevectors 5.3 DIAGONALIZATION DIAGONALIZATION Example 1: Let. Fid a formula for A k, give that P 1 1 = 1 2 ad, where Solutio: The stadard formula for the iverse of a 2 2 matrix yields

More information

Lecture 33: Bootstrap

Lecture 33: Bootstrap Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece

More information

Nonlinear regression

Nonlinear regression oliear regressio How to aalyse data? How to aalyse data? Plot! How to aalyse data? Plot! Huma brai is oe the most powerfull computatioall tools Works differetly tha a computer What if data have o liear

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

Zeros of Polynomials

Zeros of Polynomials Math 160 www.timetodare.com 4.5 4.6 Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered with fidig the solutios of polyomial equatios of ay degree

More information

Math Solutions to homework 6

Math Solutions to homework 6 Math 175 - Solutios to homework 6 Cédric De Groote November 16, 2017 Problem 1 (8.11 i the book): Let K be a compact Hermitia operator o a Hilbert space H ad let the kerel of K be {0}. Show that there

More information

2 Geometric interpretation of complex numbers

2 Geometric interpretation of complex numbers 2 Geometric iterpretatio of complex umbers 2.1 Defiitio I will start fially with a precise defiitio, assumig that such mathematical object as vector space R 2 is well familiar to the studets. Recall that

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + 62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of

More information

Frequency Domain Filtering

Frequency Domain Filtering Frequecy Domai Filterig Raga Rodrigo October 19, 2010 Outlie Cotets 1 Itroductio 1 2 Fourier Represetatio of Fiite-Duratio Sequeces: The Discrete Fourier Trasform 1 3 The 2-D Discrete Fourier Trasform

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

NUMERICAL METHODS FOR SOLVING EQUATIONS

NUMERICAL METHODS FOR SOLVING EQUATIONS Mathematics Revisio Guides Numerical Methods for Solvig Equatios Page 1 of 11 M.K. HOME TUITION Mathematics Revisio Guides Level: GCSE Higher Tier NUMERICAL METHODS FOR SOLVING EQUATIONS Versio:. Date:

More information

1 Duality revisited. AM 221: Advanced Optimization Spring 2016

1 Duality revisited. AM 221: Advanced Optimization Spring 2016 AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis Recursive Algorithms Recurreces Computer Sciece & Egieerig 35: Discrete Mathematics Christopher M Bourke cbourke@cseuledu A recursive algorithm is oe i which objects are defied i terms of other objects

More information

Chapter 7. Support Vector Machine

Chapter 7. Support Vector Machine Chapter 7 Support Vector Machie able of Cotet Margi ad support vectors SVM formulatio Slack variables ad hige loss SVM for multiple class SVM ith Kerels Relevace Vector Machie Support Vector Machie (SVM)

More information

CHAPTER I: Vector Spaces

CHAPTER I: Vector Spaces CHAPTER I: Vector Spaces Sectio 1: Itroductio ad Examples This first chapter is largely a review of topics you probably saw i your liear algebra course. So why cover it? (1) Not everyoe remembers everythig

More information

CS276A Practice Problem Set 1 Solutions

CS276A Practice Problem Set 1 Solutions CS76A Practice Problem Set Solutios Problem. (i) (ii) 8 (iii) 6 Compute the gamma-codes for the followig itegers: (i) (ii) 8 (iii) 6 Problem. For this problem, we will be dealig with a collectio of millio

More information

Section 11.8: Power Series

Section 11.8: Power Series Sectio 11.8: Power Series 1. Power Series I this sectio, we cosider geeralizig the cocept of a series. Recall that a series is a ifiite sum of umbers a. We ca talk about whether or ot it coverges ad i

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

ECON 3150/4150, Spring term Lecture 3

ECON 3150/4150, Spring term Lecture 3 Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio

More information

Axis Aligned Ellipsoid

Axis Aligned Ellipsoid Machie Learig for Data Sciece CS 4786) Lecture 6,7 & 8: Ellipsoidal Clusterig, Gaussia Mixture Models ad Geeral Mixture Models The text i black outlies high level ideas. The text i blue provides simple

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

Introduction to Machine Learning DIS10

Introduction to Machine Learning DIS10 CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig

More information

Time-Domain Representations of LTI Systems

Time-Domain Representations of LTI Systems 2.1 Itroductio Objectives: 1. Impulse resposes of LTI systems 2. Liear costat-coefficiets differetial or differece equatios of LTI systems 3. Bloc diagram represetatios of LTI systems 4. State-variable

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Polynomial Functions and Their Graphs

Polynomial Functions and Their Graphs Polyomial Fuctios ad Their Graphs I this sectio we begi the study of fuctios defied by polyomial expressios. Polyomial ad ratioal fuctios are the most commo fuctios used to model data, ad are used extesively

More information

Chapter 9: Numerical Differentiation

Chapter 9: Numerical Differentiation 178 Chapter 9: Numerical Differetiatio Numerical Differetiatio Formulatio of equatios for physical problems ofte ivolve derivatives (rate-of-chage quatities, such as velocity ad acceleratio). Numerical

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 9 Multicolliearity Dr Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Multicolliearity diagostics A importat questio that

More information

Since X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain

Since X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain Assigmet 9 Exercise 5.5 Let X biomial, p, where p 0, 1 is ukow. Obtai cofidece itervals for p i two differet ways: a Sice X / p d N0, p1 p], the variace of the limitig distributio depeds oly o p. Use the

More information

NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)

NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018) NYU Ceter for Data Sciece: DS-GA 003 Machie Learig ad Computatioal Statistics (Sprig 208) Brett Berstei, David Roseberg, Be Jakubowski Jauary 20, 208 Istructios: Followig most lab ad lecture sectios, we

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture 9: Pricipal Compoet Aalysis The text i black outlies mai ideas to retai from the lecture. The text i blue give a deeper uderstadig of how we derive or get

More information

15.083J/6.859J Integer Optimization. Lecture 3: Methods to enhance formulations

15.083J/6.859J Integer Optimization. Lecture 3: Methods to enhance formulations 15.083J/6.859J Iteger Optimizatio Lecture 3: Methods to ehace formulatios 1 Outlie Polyhedral review Slide 1 Methods to geerate valid iequalities Methods to geerate facet defiig iequalities Polyhedral

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 0 Scribe: Ade Forrow Oct. 3, 05 Recall the followig defiitios from last time: Defiitio: A fuctio K : X X R is called a positive symmetric

More information

Department of Civil Engineering-I.I.T. Delhi CEL 899: Environmental Risk Assessment HW5 Solution

Department of Civil Engineering-I.I.T. Delhi CEL 899: Environmental Risk Assessment HW5 Solution Departmet of Civil Egieerig-I.I.T. Delhi CEL 899: Evirometal Risk Assessmet HW5 Solutio Note: Assume missig data (if ay) ad metio the same. Q. Suppose X has a ormal distributio defied as N (mea=5, variace=

More information

Lecture 11: Pseudorandom functions

Lecture 11: Pseudorandom functions COM S 6830 Cryptography Oct 1, 2009 Istructor: Rafael Pass 1 Recap Lecture 11: Pseudoradom fuctios Scribe: Stefao Ermo Defiitio 1 (Ge, Ec, Dec) is a sigle message secure ecryptio scheme if for all uppt

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 9 Maximum Likelihood Estimatio 9.1 The Likelihood Fuctio The maximum likelihood estimator is the most widely used estimatio method. This chapter discusses the most importat cocepts behid maximum

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE Geeral e Image Coder Structure Motio Video (s 1,s 2,t) or (s 1,s 2 ) Natural Image Samplig A form of data compressio; usually lossless, but ca be lossy Redudacy Removal Lossless compressio: predictive

More information

Complex Analysis Spring 2001 Homework I Solution

Complex Analysis Spring 2001 Homework I Solution Complex Aalysis Sprig 2001 Homework I Solutio 1. Coway, Chapter 1, sectio 3, problem 3. Describe the set of poits satisfyig the equatio z a z + a = 2c, where c > 0 ad a R. To begi, we see from the triagle

More information

Machine Learning Theory (CS 6783)

Machine Learning Theory (CS 6783) Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT

More information

Inverse Matrix. A meaning that matrix B is an inverse of matrix A.

Inverse Matrix. A meaning that matrix B is an inverse of matrix A. Iverse Matrix Two square matrices A ad B of dimesios are called iverses to oe aother if the followig holds, AB BA I (11) The otio is dual but we ofte write 1 B A meaig that matrix B is a iverse of matrix

More information

( ) (( ) ) ANSWERS TO EXERCISES IN APPENDIX B. Section B.1 VECTORS AND SETS. Exercise B.1-1: Convex sets. are convex, , hence. and. (a) Let.

( ) (( ) ) ANSWERS TO EXERCISES IN APPENDIX B. Section B.1 VECTORS AND SETS. Exercise B.1-1: Convex sets. are convex, , hence. and. (a) Let. Joh Riley 8 Jue 03 ANSWERS TO EXERCISES IN APPENDIX B Sectio B VECTORS AND SETS Exercise B-: Covex sets (a) Let 0 x, x X, X, hece 0 x, x X ad 0 x, x X Sice X ad X are covex, x X ad x X The x X X, which

More information

Week 1, Lecture 2. Neural Network Basics. Announcements: HW 1 Due on 10/8 Data sets for HW 1 are online Project selection 10/11. Suggested reading :

Week 1, Lecture 2. Neural Network Basics. Announcements: HW 1 Due on 10/8 Data sets for HW 1 are online Project selection 10/11. Suggested reading : ME 537: Learig-Based Cotrol Week 1, Lecture 2 Neural Network Basics Aoucemets: HW 1 Due o 10/8 Data sets for HW 1 are olie Proect selectio 10/11 Suggested readig : NN survey paper (Zhag Chap 1, 2 ad Sectios

More information

Number of fatalities X Sunday 4 Monday 6 Tuesday 2 Wednesday 0 Thursday 3 Friday 5 Saturday 8 Total 28. Day

Number of fatalities X Sunday 4 Monday 6 Tuesday 2 Wednesday 0 Thursday 3 Friday 5 Saturday 8 Total 28. Day LECTURE # 8 Mea Deviatio, Stadard Deviatio ad Variace & Coefficiet of variatio Mea Deviatio Stadard Deviatio ad Variace Coefficiet of variatio First, we will discuss it for the case of raw data, ad the

More information

x a x a Lecture 2 Series (See Chapter 1 in Boas)

x a x a Lecture 2 Series (See Chapter 1 in Boas) Lecture Series (See Chapter i Boas) A basic ad very powerful (if pedestria, recall we are lazy AD smart) way to solve ay differetial (or itegral) equatio is via a series expasio of the correspodig solutio

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information