10-701/15-781 Machine Learning Mid-term Exam Solution

Your Name:

Your Andrew ID:

1 True or False (Give one sentence explanation) (20%)

1. (F) For a continuous random variable x and its probability distribution function p(x), it holds that 0 ≤ p(x) ≤ 1 for all x.
2. (F) Decision tree is learned by minimizing information gain.
3. (F) Linear regression estimator has the smallest variance among all unbiased estimators.
4. (T) The coefficients α assigned to the classifiers assembled by AdaBoost are always non-negative.
5. (F) Maximizing the likelihood of logistic regression model yields multiple local optimums.
6. (F) No classifier can do better than a naive Bayes classifier if the distribution of the data is known.
7. (F) The back-propagation algorithm learns a globally optimal neural network with hidden layers.
8. (F) The VC dimension of a line should be at most 2, since I can find at least one case of 3 points that cannot be shattered by any line.
9. (F) Since the VC dimension for an SVM with a Radial Basis Kernel is infinite, such an SVM must be worse than an SVM with a polynomial kernel, which has a finite VC dimension.
10. (F) A two-layer neural network with linear activation functions is essentially a weighted combination of linear separators, trained on a given dataset; the boosting algorithm built on linear separators also finds a combination of linear separators, therefore these two algorithms will give the same result.
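To see why statement 1 is false, here is a minimal sketch in Python (not part of the exam): a valid continuous density may exceed 1 pointwise, as long as it integrates to 1.

    # The Uniform(0, 0.1) density is p(x) = 10 on its support, so a density can
    # exceed 1; what must hold instead is that it integrates to 1.
    width = 0.1
    density = 1.0 / width      # p(x) = 10 for every x in [0, width]
    mass = density * width     # total probability = 1.0
    print(density, mass)       # 10.0 1.0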

2 Linear Regression (10%)

We are interested here in a particular 1-dimensional linear regression problem. The dataset corresponding to this problem has n examples (x_1, y_1), ..., (x_n, y_n), where x_i and y_i are real numbers for all i. Let w = [w_0, w_1]^T be the least squares solution we are after. In other words, w minimizes

J(w) = Σ_{i=1}^n (y_i − w_0 − w_1 x_i)².

You can assume for our purposes here that the solution is unique.

1. (5%) Check each statement that must be true if w = [w_0, w_1]^T is indeed the least squares solution:

( ) Σ_{i=1}^n (y_i − w_0 − w_1 x_i) y_i = 0
( ) Σ_{i=1}^n (y_i − w_0 − w_1 x_i)(y_i − ȳ) = 0
(X) Σ_{i=1}^n (y_i − w_0 − w_1 x_i)(x_i − x̄) = 0
(X) Σ_{i=1}^n (y_i − w_0 − w_1 x_i)(w_0 + w_1 x_i) = 0

where x̄ and ȳ are the sample means based on the same dataset. (Hint: take the derivative of J(w) with respect to w_0 and w_1.)

(sol.) Taking the derivative with respect to w_1 and w_0 gives us the following conditions of optimality:

∂J/∂w_0 = −2 Σ_{i=1}^n (y_i − w_0 − w_1 x_i) = 0
∂J/∂w_1 = −2 Σ_{i=1}^n (y_i − w_0 − w_1 x_i) x_i = 0

This means that the prediction error (y_i − w_0 − w_1 x_i) does not co-vary with any linear function of the inputs (it has zero mean and does not co-vary with the inputs). (x_i − x̄) and (w_0 + w_1 x_i) are both linear functions of the inputs, so the third and fourth statements must hold.

2. (5%) There are several numbers (statistics) computed from the data that we can use to estimate w. These are

x̄ = (1/n) Σ_{i=1}^n x_i
ȳ = (1/n) Σ_{i=1}^n y_i
C_xx = (1/n) Σ_{i=1}^n (x_i − x̄)²
C_xy = (1/n) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ)
C_yy = (1/n) Σ_{i=1}^n (y_i − ȳ)²

Suppose we only care about the value of w_1. We'd like to determine w_1 on the basis of ONLY two numbers (statistics) listed above. Which two numbers do we need for this? (Hint: use the answers to the previous question.)

(sol.) We need C_xx (spread of x) and C_xy (linear dependence between x and y). No justification was necessary, as these basic points have appeared in the course. If we want to derive this more mathematically, we can, for example, look at one of the answers to the previous question:

Σ_{i=1}^n (y_i − w_0 − w_1 x_i)(x_i − x̄) = 0,

which we can rewrite as

[(1/n) Σ_{i=1}^n y_i (x_i − x̄)] − w_0 [(1/n) Σ_{i=1}^n (x_i − x̄)] − w_1 [(1/n) Σ_{i=1}^n x_i (x_i − x̄)] = 0.

By using the fact that (1/n) Σ_i (x_i − x̄) = 0, we see that

(1/n) Σ_{i=1}^n y_i (x_i − x̄) = (1/n) Σ_{i=1}^n (y_i − ȳ)(x_i − x̄) = C_xy
(1/n) Σ_{i=1}^n x_i (x_i − x̄) = (1/n) Σ_{i=1}^n (x_i − x̄)(x_i − x̄) = C_xx

Substituting these back into the equation above gives

C_xy − w_1 C_xx = 0, i.e., w_1 = C_xy / C_xx.
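A small numerical sketch (synthetic data, not part of the exam) confirming both parts: the residuals satisfy exactly the checked conditions, and the least squares slope equals C_xy / C_xx.

    # Verify the optimality conditions from part 1 and the identity w_1 = C_xy / C_xx.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(size=100)
    y = 2.0 * x - 1.0 + rng.normal(scale=0.2, size=100)

    w1, w0 = np.polyfit(x, y, deg=1)                   # least squares slope and intercept
    r = y - w0 - w1 * x                                # prediction errors (residuals)
    print(np.round((r * (x - x.mean())).sum(), 8))     # ~0: statement 3 holds
    print(np.round((r * (w0 + w1 * x)).sum(), 8))      # ~0: statement 4 holds
    print(np.round((r * y).sum(), 4))                  # = sum(r**2) > 0: statement 1 fails

    C_xx = np.mean((x - x.mean()) ** 2)
    C_xy = np.mean((x - x.mean()) * (y - y.mean()))
    print(C_xy / C_xx, w1)                             # the two slope estimates agree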

3 AdaBoost (15%)

Consider building an ensemble of decision stumps G_m with the AdaBoost algorithm,

f(x) = sign( Σ_{m=1}^M α_m G_m(x) ).

Figure 1 displays a few labeled points in two dimensions as well as the first stump we have chosen. A stump predicts binary ±1 values and depends only on one coordinate value (the split point). The little arrow in the figure is the normal to the stump decision boundary, indicating the positive side where the stump predicts +1. All the points start with uniform weights.

[Figure 1: Labeled points and the first decision stump. The arrow points in the positive direction from the stump decision boundary.]

1. (5%) Circle all the point(s) in Figure 1 whose weight will increase as a result of incorporating the first stump (the weight update due to the first stump).

(sol.) The only misclassified negative sample.

2. (5%) Draw in the same figure a possible stump that we could select at the next boosting iteration. You need to draw both the decision boundary and its positive orientation.

(sol.) The second stump will also be a vertical split, between the second positive sample (from left to right) and the misclassified negative sample, as drawn in the figure.

3. (5%) Will the second stump receive a higher coefficient in the ensemble than the first? In other words, will α_2 > α_1? Briefly explain your answer. (No calculation should be necessary.)

(sol.) α_2 > α_1, because the point that the second stump misclassifies will have a smaller relative weight, since it is classified correctly by the first stump.
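To make the weight update concrete, here is a schematic sketch of one AdaBoost round; the labels and the stump's single mistake are hypothetical stand-ins for the figure, which is not reproduced here.

    # One AdaBoost round: misclassified points grow by exp(alpha), correct
    # points shrink by exp(-alpha), then weights are renormalized.
    import numpy as np

    y    = np.array([+1, +1, +1, +1, -1, -1])   # hypothetical labels
    pred = np.array([+1, +1, +1, +1, -1, +1])   # the stump misclassifies one negative point
    w = np.full(len(y), 1.0 / len(y))           # uniform initial weights

    err = np.sum(w[pred != y])                  # weighted error of the stump (1/6 here)
    alpha = 0.5 * np.log((1 - err) / err)       # stump coefficient; non-negative iff err < 1/2
    w = w * np.exp(-alpha * y * pred)           # up-weight mistakes, down-weight correct points
    w /= w.sum()

    print(alpha)   # > 0, consistent with True/False statement 4
    print(w)       # the misclassified point now carries weight exactly 1/2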

4 Neural Nets (15%)

Consider a neural net for binary classification which has one hidden layer, as shown in the figure. We use a linear activation function h(z) = cz at the hidden units and a sigmoid activation function g(z) = 1/(1 + e^(−z)) at the output unit to learn the function for P(y = 1 | x, w), where x = (x_1, x_2) and w = (w_1, w_2, ..., w_9).

[Figure: inputs x_1, x_2 and bias units feed two hidden units through weights w_1, ..., w_6; the hidden units and a bias feed the output unit through weights w_7, w_8, w_9.]

1. (5%) What is the output P(y = 1 | x, w) from the above neural net? Express it in terms of x_i, c and the weights w_i. What is the final classification boundary?

(sol.)

g(w_7 + w_8 h(w_1 + w_3 x_1 + w_5 x_2) + w_9 h(w_2 + w_4 x_1 + w_6 x_2))
= 1 / (1 + exp(−(w_7 + c w_8 w_1 + c w_9 w_2 + (c w_8 w_3 + c w_9 w_4) x_1 + (c w_8 w_5 + c w_9 w_6) x_2)))

The classification boundary is:

w_7 + c w_8 w_1 + c w_9 w_2 + (c w_8 w_3 + c w_9 w_4) x_1 + (c w_8 w_5 + c w_9 w_6) x_2 = 0

2. (5%) Draw a neural net with no hidden layer which is equivalent to the given neural net, and write the weights w̃ of this new neural net in terms of c and w_i.

(sol.) [Figure: a single sigmoid output unit with bias weight w̃_0 = w_7 + c w_8 w_1 + c w_9 w_2 and input weights w̃_1 = c w_8 w_3 + c w_9 w_4 and w̃_2 = c w_8 w_5 + c w_9 w_6.]

3. (5%) Is it true that any multi-layered neural net with linear activation functions at hidden layers can be represented as a neural net without any hidden layer? Briefly explain your answer.

(sol.) Yes. If linear activation functions are used for all the hidden units, the output from the hidden units can be written as a linear combination of the input features. Since these intermediate outputs serve as input for the final output layer, we can always find an equivalent neural net which does not have any hidden layer, as seen in the example above.
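A small sketch verifying the collapse numerically; the values of c, the weights, and the input are placeholders, and the weight naming follows the figure.

    # Collapse a linear hidden layer into direct input->output weights, matching
    # the expression derived above.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    c = 2.0
    w = np.arange(1.0, 10.0)      # placeholder values: w[0] plays w_1, ..., w[8] plays w_9
    x = np.array([0.3, -0.7])     # placeholder input

    # two-layer net: linear hidden units h(z) = c*z, sigmoid output
    h1 = c * (w[0] + w[2] * x[0] + w[4] * x[1])
    h2 = c * (w[1] + w[3] * x[0] + w[5] * x[1])
    p_two_layer = sigmoid(w[6] + w[7] * h1 + w[8] * h2)

    # equivalent net with no hidden layer, weights as derived in part 2
    b  = w[6] + c * w[7] * w[0] + c * w[8] * w[1]
    v1 = c * w[7] * w[2] + c * w[8] * w[3]
    v2 = c * w[7] * w[4] + c * w[8] * w[5]
    p_one_layer = sigmoid(b + v1 * x[0] + v2 * x[1])

    print(np.isclose(p_two_layer, p_one_layer))   # True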


5 Kernel Method (20%)

Suppose we have six training points from two classes as in Figure (a). Note that we have four points from class 1: (0.2, 0.4), (0.4, 0.8), (0.4, 0.2), (0.8, 0.4), and two points from class 2: (0.4, 0.4), (0.8, 0.8). Unfortunately, the points in Figure (a) cannot be separated by a linear classifier. The kernel trick is to find a mapping of x to some feature vector φ(x) such that there is a function K, called a kernel, which satisfies K(x, x') = φ(x)^T φ(x'). And we expect the points φ(x) to be linearly separable in the feature space. Here, we consider the following normalized kernel:

K(x, x') = x^T x' / (√(x^T x) √(x'^T x'))

1. (5%) What is the feature vector φ(x) corresponding to this kernel? Draw φ(x) for each training point x in Figure (b), and specify from which point it is mapped.

(sol.) φ(x) = x / ||x||, i.e., each training point is projected onto the unit circle.

2. (5%) You now see that the feature vectors are linearly separable in the feature space. The maximum-margin decision boundary in the feature space will be a line in R², which can be written as w_1 x + w_2 y + c = 0. What are the values of the coefficients w_1 and w_2? (Hint: you don't need to compute them.)

(sol.) (w_1, w_2) = (1, 1). The two class-2 points map to the same feature vector on the 45° direction, which lies symmetrically between the two directions onto which the class-1 points map, so the maximum-margin boundary is perpendicular to the (1, 1) direction.

3. (3%) Circle the points corresponding to support vectors in Figure (b).

4. (7%) Draw the decision boundary in the original input space resulting from the normalized linear kernel in Figure (a). Briefly explain your answer.
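A minimal sketch (not part of the exam) of the feature map and why the mapped points become separable:

    # The normalized kernel's feature map projects each point onto the unit circle.
    import numpy as np

    class1 = np.array([[0.2, 0.4], [0.4, 0.8], [0.4, 0.2], [0.8, 0.4]])
    class2 = np.array([[0.4, 0.4], [0.8, 0.8]])

    def phi(X):
        return X / np.linalg.norm(X, axis=1, keepdims=True)

    print(phi(class1))   # only two distinct images: (1,2)/sqrt(5) and (2,1)/sqrt(5)
    print(phi(class2))   # both points map to (1,1)/sqrt(2), between the class-1 images
    # In feature space, x + y is ~1.342 for class 1 and ~1.414 for class 2, so a
    # boundary of the form x + y + c = 0 separates the two classes.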

6 VC Dimension and PAC Learning (10%)

The VC dimension, VC(H), of hypothesis space H defined over instance space X is the largest number of points (in some configuration) that can be shattered by H. Suppose with probability (1 − δ), a PAC learner outputs a hypothesis within error ε of the best possible hypothesis in H. It can be shown that the lower bound on the number of training examples m sufficient for successful learning, stated in terms of VC(H), is

m ≥ (1/ε) (4 log₂(2/δ) + 8 VC(H) log₂(13/ε)).

Consider a learning problem in which X = R is the set of real numbers, and the hypothesis space is the set of intervals H = {(a < x < b) | a, b ∈ R}. Note that the hypothesis labels points inside the interval as positive, and negative otherwise.

1. (5%) What is the VC dimension of H?

(sol.) VC(H) = 2. Suppose we have two points x_1 and x_2, with x_1 < x_2. They can always be shattered by H, no matter how they are labeled:
(a) if x_1 positive and x_2 negative, choose a < x_1 < b < x_2;
(b) if x_1 negative and x_2 positive, choose x_1 < a < x_2 < b;
(c) if both x_1 and x_2 positive, choose a < x_1 < x_2 < b;
(d) if both x_1 and x_2 negative, choose a < b < x_1 < x_2.
However, if we have three points x_1 < x_2 < x_3 and they are labeled x_1 (positive), x_2 (negative), x_3 (positive), then they cannot be shattered by H.

2. (5%) What is the probability that a hypothesis consistent with m examples will have error at least ε?

(sol.) Use the above result. Substituting VC(H) = 2 into the inequality m ≥ (1/ε)(4 log₂(2/δ) + 16 log₂(13/ε)), we have

εm ≥ 4 log₂(2/δ) + 16 log₂(13/ε)
εm − 16 log₂(13/ε) ≥ 4 log₂(2/δ)
2^(εm/4) / (13/ε)⁴ ≥ 2/δ
δ ≥ 2 (13/ε)⁴ 2^(−εm/4)
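A small sketch evaluating both bounds for the interval class; the particular values of ε, δ, and m are chosen only for illustration.

    # Evaluate the sample-size bound and the failure-probability bound for VC(H) = 2.
    import math

    def m_bound(eps, delta, vc=2):
        # number of training examples sufficient for PAC learning
        return (4 * math.log2(2 / delta) + 8 * vc * math.log2(13 / eps)) / eps

    def delta_bound(eps, m, vc=2):
        # probability that a consistent hypothesis has error at least eps (part 2)
        return 2 * (13 / eps) ** (2 * vc) * 2 ** (-eps * m / 4)

    print(math.ceil(m_bound(0.1, 0.05)))   # ~1337 examples suffice at eps=0.1, delta=0.05
    print(delta_bound(0.1, 2000))          # ~5e-7: tiny failure probability at m = 2000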

7 Logistic Regression (10%)

We consider the following models of logistic regression for binary classification with a sigmoid function g(z) = 1/(1 + e^(−z)):

Model 1: P(Y = 1 | X, w_1, w_2) = g(w_1 X_1 + w_2 X_2)
Model 2: P(Y = 1 | X, w_1, w_2) = g(w_0 + w_1 X_1 + w_2 X_2)

We have three training examples:

x^(1) = [1, 1]^T    x^(2) = [1, 0]^T    x^(3) = [0, 0]^T
y^(1) = 1           y^(2) = 1           y^(3) = 1

1. (5%) Does it matter how the third example is labeled in Model 1? I.e., would the learned value of w = (w_1, w_2) be different if we changed the label of the third example to −1? Does it matter in Model 2? Briefly explain your answer. (Hint: think of the decision boundary on the 2D plane.)

(sol.) It does not matter in Model 1, because x^(3) = (0, 0) makes w_1 x_1 + w_2 x_2 always zero, and hence the likelihood of the third example does not depend on the value of w or on its label. But it does matter in Model 2.

2. (5%) Now, suppose we train the logistic regression model (Model 2) based on the training examples x^(1), ..., x^(n) and labels y^(1), ..., y^(n) by maximizing the penalized log-likelihood of the labels:

Σ_i log P(y^(i) | x^(i), w) − (λ/2) ||w||² = Σ_i log g(y^(i) w^T x^(i)) − (λ/2) ||w||²

For large λ (strong regularization), the log-likelihood terms will behave as linear functions of w:

log g(y^(i) w^T x^(i)) ≈ (1/2) y^(i) w^T x^(i)

Express the penalized log-likelihood using this approximation (with Model 1), and derive the expression for the MLE ŵ in terms of λ and the training data {x^(i), y^(i)}. Based on this, explain how w behaves as λ increases. (We assume each x^(i) = (x_1^(i), x_2^(i))^T and y^(i) is either 1 or −1.)

(sol.)

log l(w) ≈ Σ_i (1/2) y^(i) w^T x^(i) − (λ/2) ||w||²
∂/∂w_1 log l(w) = Σ_i (1/2) y^(i) x_1^(i) − λ w_1 = 0
∂/∂w_2 log l(w) = Σ_i (1/2) y^(i) x_2^(i) − λ w_2 = 0
⇒ ŵ = (1/(2λ)) Σ_i y^(i) x^(i)

As λ increases, each component of ŵ shrinks toward zero in proportion to 1/λ.
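A minimal sketch of the resulting shrinkage, using the three training points above; the labels are assumed to be +1 as reconstructed in the problem statement.

    # Under the linear approximation, the penalized objective is maximized at
    # w = (1/(2*lam)) * sum_i y_i x_i, so w scales exactly like 1/lam.
    import numpy as np

    X = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 0.0]])   # x^(1), x^(2), x^(3)
    y = np.array([1.0, 1.0, 1.0])                        # labels as given above (assumed)
    s = (y[:, None] * X).sum(axis=0)                     # sum_i y_i x_i = [2, 1]
    for lam in (1.0, 10.0, 100.0):
        print(lam, s / (2 * lam))   # [1.0, 0.5] -> [0.1, 0.05] -> [0.01, 0.005]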
