10-701/15-781 Machine Learning Mid-term Exam Solution
10-701/15-781 Machine Learning Mid-term Exam Solution

Your Name:

Your Andrew ID:
1 True or False (Give one sentence explanation) (20%)

1. (F) For a continuous random variable x and its probability distribution function p(x), it holds that 0 <= p(x) <= 1 for all x.
2. (F) A decision tree is learned by minimizing information gain.
3. (F) The linear regression estimator has the smallest variance among all unbiased estimators.
4. (T) The coefficients \alpha assigned to the classifiers assembled by AdaBoost are always non-negative.
5. (F) Maximizing the likelihood of the logistic regression model yields multiple local optima.
6. (F) No classifier can do better than a naive Bayes classifier if the distribution of the data is known.
7. (F) The back-propagation algorithm learns a globally optimal neural network with hidden layers.
8. (F) The VC dimension of a line should be at most 2, since I can find at least one case of 3 points that cannot be shattered by any line.
9. (F) Since the VC dimension for an SVM with a Radial Basis Kernel is infinite, such an SVM must be worse than an SVM with a polynomial kernel, which has a finite VC dimension.
10. (F) A two-layer neural network with linear activation functions is essentially a weighted combination of linear separators, trained on a given dataset; the boosting algorithm built on linear separators also finds a combination of linear separators, therefore these two algorithms will give the same result.
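Statement 1 can be checked concretely, since a valid density may exceed 1 pointwise even though it integrates to 1. A minimal sketch in Python (toy numbers, not from the exam; p(x) read as a density):

```python
# Uniform density on [a, b] with b - a < 1: the pdf is constant and > 1,
# yet it still integrates to 1, so "0 <= p(x) <= 1" is false for densities.
a, b = 0.0, 0.1
pdf = 1.0 / (b - a)        # constant density value on [a, b]
integral = pdf * (b - a)   # total probability mass over the support
print(pdf, integral)       # pdf is 10.0 > 1, integral is 1.0
```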
2 Linear Regression (10%)

We are interested here in a particular 1-dimensional linear regression problem. The dataset corresponding to this problem has n examples (x_1, y_1), ..., (x_n, y_n), where x_i and y_i are real numbers for all i. Let w = [w_0, w_1]^T be the least squares solution we are after. In other words, w minimizes

    J(w) = \sum_{i=1}^n (y_i - w_0 - w_1 x_i)^2.

You can assume for our purposes here that the solution is unique.

1. (5%) Check each statement that must be true if w = [w_0, w_1]^T is indeed the least squares solution.

    ( ) \sum_{i=1}^n (y_i - w_0 - w_1 x_i) y_i = 0
    ( ) \sum_{i=1}^n (y_i - w_0 - w_1 x_i)(y_i - \bar{y}) = 0
    ( ) \sum_{i=1}^n (y_i - w_0 - w_1 x_i)(x_i - \bar{x}) = 0
    ( ) \sum_{i=1}^n (y_i - w_0 - w_1 x_i)(w_0 + w_1 x_i) = 0

where \bar{x} and \bar{y} are the sample means based on the same dataset. (hint: take the derivative of J(w) with respect to w_0 and w_1)

(sol.) Taking the derivative with respect to w_1 and w_0 gives us the following conditions of optimality:

    \partial J(w) / \partial w_0 = -2 \sum_{i=1}^n (y_i - w_0 - w_1 x_i) = 0
    \partial J(w) / \partial w_1 = -2 \sum_{i=1}^n (y_i - w_0 - w_1 x_i) x_i = 0

This means that the prediction error (y_i - w_0 - w_1 x_i) does not co-vary with any linear function of the inputs (it has zero mean and does not co-vary with the inputs). Since (x_i - \bar{x}) and (w_0 + w_1 x_i) are both linear functions of the inputs, the third and fourth statements must be true.

2. (5%) There are several numbers (statistics) computed from the data that we can use to estimate w. These are

    \bar{x} = (1/n) \sum_{i=1}^n x_i
    \bar{y} = (1/n) \sum_{i=1}^n y_i
    C_xx = \sum_{i=1}^n (x_i - \bar{x})^2
    C_xy = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})
    C_yy = \sum_{i=1}^n (y_i - \bar{y})^2

Suppose we only care about the value of w_1. We'd like to determine w_1 on the basis of ONLY two numbers (statistics) listed above. Which two numbers do we need for this? (hint: use the answers to the previous question)
(sol.) We need C_xx (spread of x) and C_xy (linear dependence between x and y). No justification was necessary, as these basic points have appeared in the course. If we want to derive this more mathematically, we can, for example, look at one of the answers to the previous question:

    \sum_{i=1}^n (y_i - w_0 - w_1 x_i)(x_i - \bar{x}) = 0,

which we can rewrite as

    [\sum_{i=1}^n y_i (x_i - \bar{x})] - w_0 [\sum_{i=1}^n (x_i - \bar{x})] - w_1 [\sum_{i=1}^n x_i (x_i - \bar{x})] = 0

By using the fact that \sum_i (x_i - \bar{x}) = 0, we see that

    \sum_{i=1}^n y_i (x_i - \bar{x}) = \sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x}) = C_xy
    \sum_{i=1}^n x_i (x_i - \bar{x}) = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x}) = C_xx

Substituting these back into our equation above gives

    C_xy - w_1 C_xx = 0, i.e., w_1 = C_xy / C_xx.
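The identity C_xy - w_1 C_xx = 0 derived above is easy to check numerically, together with both optimality conditions from part 1. A minimal sketch (the dataset is made up for illustration, not from the exam):

```python
# Small synthetic 1-D dataset (illustrative, not from the exam).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.2, 2.9, 5.1, 7.0, 8.8]
n = len(xs)

# Statistics from part 2 (a common 1/n scaling would cancel in the ratio).
x_bar = sum(xs) / n
y_bar = sum(ys) / n
C_xx = sum((x - x_bar) ** 2 for x in xs)
C_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

w1 = C_xy / C_xx              # slope from the two statistics
w0 = y_bar - w1 * x_bar       # intercept from the first optimality condition

# Check both optimality conditions from part 1 directly on the residuals.
r = [y - w0 - w1 * x for x, y in zip(xs, ys)]
assert abs(sum(r)) < 1e-9                                 # dJ/dw0 = 0
assert abs(sum(ri * x for ri, x in zip(r, xs))) < 1e-9    # dJ/dw1 = 0
print(w0, w1)
```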
3 AdaBoost (15%)

Consider building an ensemble of decision stumps G_m with the AdaBoost algorithm,

    f(x) = sign( \sum_{m=1}^M \alpha_m G_m(x) ).

Figure 1 displays a few labeled points in two dimensions as well as the first stump we have chosen. A stump predicts binary +/-1 values, and depends only on one coordinate value (the split point). The little arrow in the figure is the normal to the stump decision boundary, indicating the positive side where the stump predicts +1. All the points start with uniform weights.

[Figure 1: Labeled points and the first decision stump. The arrow points in the positive direction from the stump decision boundary.]

1. (5%) Circle all the point(s) in Figure 1 whose weight will increase as a result of incorporating the first stump (the weight update due to the first stump).

(sol.) The only misclassified negative sample.

2. (5%) Draw in the same figure a possible stump that we could select at the next boosting iteration. You need to draw both the decision boundary and its positive orientation.

(sol.) The second stump will also be a vertical split, between the second positive sample (from left to right) and the misclassified negative sample, as drawn in the figure.

3. (5%) Will the second stump receive a higher coefficient in the ensemble than the first? In other words, will \alpha_2 > \alpha_1? Briefly explain your answer. (No calculation should be necessary.)

(sol.) \alpha_2 > \alpha_1, because the point that the second stump misclassifies will have a smaller relative weight, since it is classified correctly by the first stump.
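The reasoning in parts 1 and 3 can be made concrete with the standard AdaBoost weight-update and coefficient formulas. A minimal sketch, assuming a hypothetical configuration of five uniformly weighted points of which the first stump misclassifies exactly one (not the exam's exact figure):

```python
import math

# Hypothetical setup: n points with uniform initial weights, where the
# first stump misclassifies exactly one point.
n = 5
w = [1.0 / n] * n
miss1 = [False, False, False, True, False]       # first stump's mistakes

eps1 = sum(wi for wi, m in zip(w, miss1) if m)   # weighted error of stump 1
alpha1 = 0.5 * math.log((1 - eps1) / eps1)       # its ensemble coefficient

# AdaBoost re-weighting: up-weight mistakes, down-weight correct points.
w = [wi * math.exp(alpha1 if m else -alpha1) for wi, m in zip(w, miss1)]
Z = sum(w)
w = [wi / Z for wi in w]                         # renormalize

# The previously misclassified point now carries weight 1/2; each other
# point carries 1/8. A second stump that errs only on a point the first
# stump got right therefore has smaller weighted error, so alpha2 > alpha1.
miss2 = [True, False, False, False, False]
eps2 = sum(wi for wi, m in zip(w, miss2) if m)
alpha2 = 0.5 * math.log((1 - eps2) / eps2)
print(eps1, alpha1, eps2, alpha2)
```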
4 Neural Nets (15%)

Consider a neural net for a binary classification which has one hidden layer as shown in the figure. We use a linear activation function h(z) = cz at the hidden units and a sigmoid activation function g(z) = 1/(1 + e^{-z}) at the output unit to learn the function for P(y = 1 | x, w), where x = (x_1, x_2) and w = (w_1, w_2, ..., w_9).

[Figure: a network with inputs x_1, x_2, two hidden units, and one output unit. w_1, w_2 are the hidden-unit bias weights; w_3, ..., w_6 are the input-to-hidden weights; w_7 is the output bias weight; w_8, w_9 are the hidden-to-output weights.]

1. (5%) What is the output P(y = 1 | x, w) from the above neural net? Express it in terms of x_i, c, and the weights w_i. What is the final classification boundary?

(sol.)

    g(w_7 + w_8 h(w_1 + w_3 x_1 + w_5 x_2) + w_9 h(w_2 + w_4 x_1 + w_6 x_2))
    = 1 / (1 + exp(-(w_7 + c w_8 w_1 + c w_9 w_2 + (c w_8 w_3 + c w_9 w_4) x_1 + (c w_8 w_5 + c w_9 w_6) x_2)))

The classification boundary is:

    w_7 + c w_8 w_1 + c w_9 w_2 + (c w_8 w_3 + c w_9 w_4) x_1 + (c w_8 w_5 + c w_9 w_6) x_2 = 0

2. (5%) Draw a neural net with no hidden layer which is equivalent to the given neural net, and write the weights w' of this new neural net in terms of c and w_i.

(sol.) [Figure omitted in this transcription: a single sigmoid output unit with bias weight w_7 + c w_8 w_1 + c w_9 w_2 and input weights c w_8 w_3 + c w_9 w_4 for x_1 and c w_8 w_5 + c w_9 w_6 for x_2, matching the boundary derived in part 1.]

3. (5%) Is it true that any multi-layered neural net with linear activation functions at the hidden layers can be represented as a neural net without any hidden layer? Briefly explain your answer.

(sol.) Yes. If linear activation functions are used for all the hidden units, the output from the hidden units can be written as a linear combination of the input features. Since these intermediate outputs serve as the input for the final output layer, we can always find an equivalent neural net which does not have any hidden layer, as seen in the example above.
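The collapse argued in parts 1-3 is easy to verify numerically: with linear hidden activations, the two-layer net and the derived single-layer net produce identical outputs. A minimal sketch (the weight values are arbitrary, chosen only for the check):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weights follow the exam's numbering: w1, w2 hidden biases; w3..w6
# input-to-hidden; w7 output bias; w8, w9 hidden-to-output.
c = 0.7
w = {i: 0.1 * i for i in range(1, 10)}   # arbitrary values w[1]=0.1 ... w[9]=0.9

def two_layer(x1, x2):
    h1 = c * (w[1] + w[3] * x1 + w[5] * x2)   # linear hidden unit 1
    h2 = c * (w[2] + w[4] * x1 + w[6] * x2)   # linear hidden unit 2
    return sigmoid(w[7] + w[8] * h1 + w[9] * h2)

# Equivalent net with no hidden layer, weights as derived in part 1.
b  = w[7] + c * w[8] * w[1] + c * w[9] * w[2]
u1 = c * w[8] * w[3] + c * w[9] * w[4]
u2 = c * w[8] * w[5] + c * w[9] * w[6]

def one_layer(x1, x2):
    return sigmoid(b + u1 * x1 + u2 * x2)

for x1, x2 in [(0.0, 0.0), (1.0, -2.0), (3.5, 0.25)]:
    assert abs(two_layer(x1, x2) - one_layer(x1, x2)) < 1e-9
print("outputs match")
```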
5 Kernel Method (20%)

Suppose we have six training points from two classes as in Figure 1(a). Note that we have four points from class 1: (0.2, 0.4), (0.4, 0.8), (0.4, 0.2), (0.8, 0.4), and two points from class 2: (0.4, 0.4), (0.8, 0.8). Unfortunately, the points in Figure 1(a) cannot be separated by a linear classifier. The kernel trick is to find a mapping of x to some feature vector \phi(x) such that there is a function K, called a kernel, which satisfies K(x, x') = \phi(x)^T \phi(x'). And we expect the points \phi(x) to be linearly separable in the feature space. Here, we consider the following normalized kernel:

    K(x, x') = x^T x' / (||x|| ||x'||)

1. (5%) What is the feature vector \phi(x) corresponding to this kernel? Draw \phi(x) for each training point x in Figure 1(b), and specify from which point it is mapped.

(sol.) \phi(x) = x / ||x||, i.e., each point is projected onto the unit circle.

2. (5%) You now see that the feature vectors are linearly separable in the feature space. The maximum-margin decision boundary in the feature space will be a line in R^2, which can be written as w_1 x + w_2 y + c = 0. What are the values of the coefficients w_1 and w_2? (Hint: you don't need to compute them.)

(sol.) (w_1, w_2) = (1, 1)

3. (3%) Circle the points corresponding to the support vectors in Figure 1(b).

(sol.) All three distinct feature points: collinear inputs map to the same point on the unit circle, leaving two class-1 feature points and one class-2 feature point, and all of them are support vectors.

4. (7%) Draw the decision boundary in the original input space resulting from the normalized linear kernel in Figure 1(a). Briefly explain your answer.

(sol.) Since \phi(x) depends only on the direction of x, the linear boundary in feature space maps back to two rays from the origin, one on each side of the 45-degree direction; inputs whose direction lies between the rays are classified as class 2, and the rest as class 1.
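Assuming the feature map \phi(x) = x/||x|| implied by the normalized kernel, the separability claimed in part 2 can be checked directly on the six training points. A minimal sketch:

```python
import math

# The six training points from the problem.
class1 = [(0.2, 0.4), (0.4, 0.8), (0.4, 0.2), (0.8, 0.4)]
class2 = [(0.4, 0.4), (0.8, 0.8)]

def phi(p):
    # Feature map for K(x, x') = x^T x' / (||x|| ||x'||):
    # project each input onto the unit circle.
    norm = math.hypot(p[0], p[1])
    return (p[0] / norm, p[1] / norm)

# Collinear inputs collapse to the same unit vector, and the value of
# phi_1 + phi_2 separates the two classes, as a line w = (1, 1) would.
s1 = [sum(phi(p)) for p in class1]
s2 = [sum(phi(p)) for p in class2]
print(sorted(set(round(s, 6) for s in s1)),
      sorted(set(round(s, 6) for s in s2)))
assert max(s1) < min(s2)   # the line phi_1 + phi_2 = const separates them
```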
6 VC Dimension and PAC Learning (10%)

The VC dimension, VC(H), of a hypothesis space H defined over an instance space X is the largest number of points (in some configuration) that can be shattered by H. Suppose that with probability (1 - \delta), a PAC learner outputs a hypothesis within error \epsilon of the best possible hypothesis in H. It can be shown that the lower bound on the number of training examples m sufficient for successful learning, stated in terms of VC(H), is

    m >= (1/\epsilon) (4 \log_2(2/\delta) + 8 VC(H) \log_2(13/\epsilon)).

Consider a learning problem in which X = R is the set of real numbers, and the hypothesis space is the set of intervals H = {(a < x < b) | a, b \in R}. Note that a hypothesis labels points inside the interval as positive, and negative otherwise.

1. (5%) What is the VC dimension of H?

(sol.) VC(H) = 2. Suppose we have two points x_1 and x_2, with x_1 < x_2. They can always be shattered by H, no matter how they are labeled:

(a) if x_1 is positive and x_2 negative, choose a < x_1 < b < x_2;
(b) if x_1 is negative and x_2 positive, choose x_1 < a < x_2 < b;
(c) if both x_1 and x_2 are positive, choose a < x_1 < x_2 < b;
(d) if both x_1 and x_2 are negative, choose a < b < x_1 < x_2.

However, if we have three points x_1 < x_2 < x_3 and they are labeled x_1 (positive), x_2 (negative), and x_3 (positive), then they cannot be shattered by H.

2. (5%) What is the probability that a hypothesis consistent with m examples will have error at least \epsilon?

(sol.) Use the above result. Substituting VC(H) = 2 into the inequality m >= (1/\epsilon)(4 \log_2(2/\delta) + 16 \log_2(13/\epsilon)), we have

    \epsilon m >= 4 \log_2(2/\delta) + 16 \log_2(13/\epsilon)
    \epsilon m - 16 \log_2(13/\epsilon) >= 4 \log_2(2/\delta)
    2^{\epsilon m / 4} (13/\epsilon)^{-4} >= 2/\delta
    \delta >= 2 (13/\epsilon)^4 2^{-\epsilon m / 4}

so the probability that a consistent hypothesis has error at least \epsilon is at most 2 (13/\epsilon)^4 2^{-\epsilon m / 4}.
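The bound and its inversion in part 2 can be evaluated numerically. A minimal sketch (the eps and delta values are arbitrary illustrations, not from the exam):

```python
import math

def sample_bound(eps, delta, vc):
    # m >= (1/eps) * (4*log2(2/delta) + 8*vc*log2(13/eps))
    return (1.0 / eps) * (4 * math.log2(2 / delta)
                          + 8 * vc * math.log2(13 / eps))

def failure_prob(eps, m):
    # Inverting the bound with VC(H) = 2, as in part 2:
    # delta >= 2 * (13/eps)^4 * 2^(-eps*m/4)
    return 2 * (13 / eps) ** 4 * 2 ** (-eps * m / 4)

eps, delta = 0.1, 0.05
m = sample_bound(eps, delta, vc=2)
print(math.ceil(m))                      # examples sufficient for (eps, delta)
print(failure_prob(eps, math.ceil(m)))   # plugging m back in gives <= delta
```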
7 Logistic Regression (10%)

We consider the following models of logistic regression for a binary classification with a sigmoid function g(z) = 1/(1 + e^{-z}):

    Model 1: P(Y = 1 | X, w_1, w_2) = g(w_1 X_1 + w_2 X_2)
    Model 2: P(Y = 1 | X, w_1, w_2) = g(w_0 + w_1 X_1 + w_2 X_2)

We have three training examples:

    x^(1) = [1, 1]^T    x^(2) = [1, 0]^T    x^(3) = [0, 0]^T
    y^(1) = y^(2) = y^(3) = 1

1. (5%) Does it matter how the third example is labeled in Model 1? I.e., would the learned value of w = (w_1, w_2) be different if we changed the label of the third example to -1? Does it matter in Model 2? Briefly explain your answer. (Hint: think of the decision boundary on the 2D plane.)

(sol.) It does not matter in Model 1, because x^(3) = (0, 0) makes w_1 x_1 + w_2 x_2 always zero, and hence the likelihood of this example does not depend on the value of w. But it does matter in Model 2.

2. (5%) Now, suppose we train the logistic regression model (Model 2) based on the n training examples x^(1), ..., x^(n) and labels y^(1), ..., y^(n) by maximizing the penalized log-likelihood of the labels:

    \sum_i log P(y^(i) | x^(i), w) - (\lambda/2) ||w||^2 = \sum_i log g(y^(i) w^T x^(i)) - (\lambda/2) ||w||^2

For large \lambda (strong regularization), the log-likelihood terms will behave as linear functions of w:

    log g(y^(i) w^T x^(i)) \approx (1/2) y^(i) w^T x^(i)   (up to an additive constant)

Express the penalized log-likelihood using this approximation (with Model 1), and derive the expression for the MLE \hat{w} in terms of \lambda and the training data {x^(i), y^(i)}. Based on this, explain how w behaves as \lambda increases. (We assume each x^(i) = (x_1^(i), x_2^(i))^T and y^(i) is either 1 or -1.)

(sol.)

    log l(w) \approx \sum_i (1/2) y^(i) w^T x^(i) - (\lambda/2) ||w||^2
    \partial log l(w) / \partial w_1 = \sum_i (1/2) y^(i) x_1^(i) - \lambda w_1 = 0
    \partial log l(w) / \partial w_2 = \sum_i (1/2) y^(i) x_2^(i) - \lambda w_2 = 0
    \hat{w} = (1/(2\lambda)) \sum_i y^(i) x^(i)

Thus as \lambda increases, \hat{w} shrinks toward zero at rate 1/(2\lambda), while its direction (along \sum_i y^(i) x^(i)) stays fixed.
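The closed form \hat{w} = (1/(2\lambda)) \sum_i y^(i) x^(i) can be checked on the example's inputs. A minimal sketch (labels as transcribed; any +/-1 labels show the same 1/\lambda shrinkage):

```python
# The three training inputs from the problem, with labels as transcribed.
X = [(1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
y = [1.0, 1.0, 1.0]

def w_strong_reg(lam):
    # Closed form from the linear approximation:
    # w = (1/(2*lam)) * sum_i y_i * x_i
    s1 = sum(yi * x1 for yi, (x1, x2) in zip(y, X))
    s2 = sum(yi * x2 for yi, (x1, x2) in zip(y, X))
    return (s1 / (2.0 * lam), s2 / (2.0 * lam))

w_small_lam = w_strong_reg(1.0)    # lambda = 1
w_big_lam = w_strong_reg(10.0)     # lambda = 10: same direction, 10x smaller
print(w_small_lam, w_big_lam)
```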
More informationCarleton College, Winter 2017 Math 121, Practice Final Prof. Jones. Note: the exam will have a section of true-false questions, like the one below.
Carleto College, Witer 207 Math 2, Practice Fial Prof. Joes Note: the exam will have a sectio of true-false questios, like the oe below.. True or False. Briefly explai your aswer. A icorrectly justified
More informationPattern recognition systems Laboratory 10 Linear Classifiers and the Perceptron Algorithm
Patter recogitio systems Laboratory 10 Liear Classifiers ad the Perceptro Algorithm 1. Objectives his laboratory sessio presets the perceptro learig algorithm for the liear classifier. We will apply gradiet
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationProblem Cosider the curve give parametrically as x = si t ad y = + cos t for» t» ß: (a) Describe the path this traverses: Where does it start (whe t =
Mathematics Summer Wilso Fial Exam August 8, ANSWERS Problem 1 (a) Fid the solutio to y +x y = e x x that satisfies y() = 5 : This is already i the form we used for a first order liear differetial equatio,
More information( ) (( ) ) ANSWERS TO EXERCISES IN APPENDIX B. Section B.1 VECTORS AND SETS. Exercise B.1-1: Convex sets. are convex, , hence. and. (a) Let.
Joh Riley 8 Jue 03 ANSWERS TO EXERCISES IN APPENDIX B Sectio B VECTORS AND SETS Exercise B-: Covex sets (a) Let 0 x, x X, X, hece 0 x, x X ad 0 x, x X Sice X ad X are covex, x X ad x X The x X X, which
More informationSequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet
More informationComplex Numbers Solutions
Complex Numbers Solutios Joseph Zoller February 7, 06 Solutios. (009 AIME I Problem ) There is a complex umber with imagiary part 64 ad a positive iteger such that Fid. [Solutio: 697] 4i + + 4i. 4i 4i
More informationCS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5
CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio
More informationChapter 9: Numerical Differentiation
178 Chapter 9: Numerical Differetiatio Numerical Differetiatio Formulatio of equatios for physical problems ofte ivolve derivatives (rate-of-chage quatities, such as velocity ad acceleratio). Numerical
More informationAlgebra of Least Squares
October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal
More informationChapter 10: Power Series
Chapter : Power Series 57 Chapter Overview: Power Series The reaso series are part of a Calculus course is that there are fuctios which caot be itegrated. All power series, though, ca be itegrated because
More informationCS 2750 Machine Learning. Lecture 22. Concept learning. CS 2750 Machine Learning. Concept Learning
Lecture 22 Cocept learig Milos Hauskrecht milos@cs.pitt.edu 5329 Seott Square Cocept Learig Outlie: Learig boolea fuctios Most geeral ad most specific cosistet hypothesis. Mitchell s versio space algorithm
More informationNUMERICAL METHODS FOR SOLVING EQUATIONS
Mathematics Revisio Guides Numerical Methods for Solvig Equatios Page 1 of 11 M.K. HOME TUITION Mathematics Revisio Guides Level: GCSE Higher Tier NUMERICAL METHODS FOR SOLVING EQUATIONS Versio:. Date:
More informationCS 2750 Machine Learning. Lecture 23. Concept learning. CS 2750 Machine Learning. Concept Learning
Lecture 3 Cocept learig Milos Hauskrecht milos@cs.pitt.edu Cocept Learig Outlie: Learig boolea fuctios Most geeral ad most specific cosistet hypothesis. Mitchell s versio space algorithm Probably approximately
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete
More informationA sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as
More information