Machine Learning Brett Bernstein
|
|
- Jasmine Knight
- 5 years ago
- Views:
Transcription
1 Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio has margial distributio X Uif{1,..., 10}. Furthermore, assume Y = X (i.e., Y always has the exact same value as X). I the questios below we use square loss fuctio l(a, x) = (a x) 2. (a) What is the Bayes risk? (b) What is the approximatio error whe usig the hypothesis space of costat fuctios? (c) Suppose we use the hypothesis space F of affie fuctios. i. What is the approximatio error? ii. Cosider the fuctio ˆf(x) = x + 1. Compute R( ˆf) R(f F ). (a) The best decisio fuctio is f (x) = x. The associated risk is 0. (b) The best costat fuctio is f(x) = E[Y ] = E[X] = 5.5. This has risk E[(Y 5.5) 2 ] = Var(Y ) = 33 4, by usig (or derivig) the formula for the variace of a discrete uiform distributio. Thus the approximatio error is 33/4. (c) i. The Bayes decisio fuctio is affie, so the approximatio error is 0. ii. The risk is Thus the aswer is 1. R( ˆf) = E[(Y ˆf(X)) 2 ] = E[(X (X + 1)) 2 ] = ( ) Let X = [ 10, 10], Y = A = R ad suppose the data distributio has margial distributio X Uif( 10, 10) ad Y X = x N (a + bx, 1). Throughout we assume the square loss fuctio l(a, x) = (a x) 2. (a) What is the Bayes risk? 1
2 (b) What is the approximatio error whe usig the hypothesis space of costat fuctios (i terms of a ad b)? (c) Suppose we use the hypothesis space of affie fuctios. i. What is the approximatio error? ii. Suppose you have a fixed data set ad compute the empirical risk miimizer ˆf (x) = c + dx. What is the estimatio error (it terms of a, b, c, d)? Throughout we use the fact that Var(X) = E[X 2 ] E[X] 2. (a) The best decisio fuctio is f(x) = E[Y X = x] = a + bx. This has risk E[(Y a bx) 2 ] = E[E[(Y a bx) 2 X]] = E[1] = 1. (b) The best costat fuctio is give by E[Y ] = E[E[Y X]] = a + be[x] = a. This has risk where E[(Y a) 2 ] = E[E[(Y a) 2 X]] = E[1 + b 2 X 2 ] = 1 + b 2 E[X 2 ], E[X 2 ] = Thus the approximatio error is 100b 2 /3. x dx = = (c) i. There is a affie Bayes decisio fuctio, so the approximatio error is 0. ii. Note that R( ˆf ) = E[(Y c dx) 2 ] = E[E[(Y c dx) 2 X]] = E[1 + ((a c) + (b d)x) 2 ] = 1 + (a c) (b d) 2 /3. Thus the estimatio error is (a c) (b d) 2 /3. 3. Try to best characterize each of the followig i terms of oe or more of optimizatio error, approximatio error, ad estimatio error. (a) Overfittig. (b) Uderfittig. (c) Precise empirical risk miimizatio for your hypothesis space is computatioally itractable. (d) Not eough data. (a) High estimatio error due to isufficiet data relative to the complexity of your hypothesis space. Ca be accompaied by low approximatio error idicatig a complex hypothesis space. 2
3 (b) High approximatio error due to a overly simplistic hypothesis space. Ca be accompaied by low estimatio estimatio error due to the large amout of data relative to the (low) complexity of the hypothesis space. (c) Icreased optimizatio error. (d) High estimatio error. 4. (a) We sometimes look at R( ˆf ) as radom, ad other times as determiistic. What causes this differece? (b) True or False: Icreasig the size of our hypothesis space ca shift risk from approximatio error to estimatio error but always leaves the quatity R( ˆf ) R(f ) costat. (c) True or False: Assume we treat our data set as a radom sample ad ot a fixed quatity. The the estimatio error ad the approximatio error are radom ad ot determiistic. (d) True or False: The empirical risk of the ERM, ˆR( ˆf ), is a ubiased estimator of the risk of the ERM R( ˆf ). (e) I each of the followig situatios, there is a implicit sample space i which the give expectatio is computed. Give that space. i. Whe we say the empirical risk ˆR(f) is a ubiased estimator of the risk R(f) (where f is idepedet of the traiig data used to compute the empirical risk). ii. Whe we compute the expected empirical risk E[R( ˆf )] (i.e., the outer expectatio). iii. Whe we say the miibatch gradiet is a ubiased estimator of the full traiig set gradiet. (a) The quatity is radom whe we cosider the traiig data as a radom sample of size. If we focus o a fixed set of traiig data the the quatity is determiistic. (b) False. Note that ˆf depeds o which hypothesis space you have chose. As a example, imagie havig a affie Bayes decisio fuctio, ad chagig the hypothesis space from the set of affie fuctios to the set of all decisio fuctios. This ca cause empirical risk miimizatio to overfit the traiig data thus creatig a sharp rise i R( ˆf ) R(f ). (c) False, approximatio error is a determiistic quatity. (d) False. The empirical risk of the ERM will ofte be biased low. This is why we use a test set to approximate its true risk. The issue is that ˆf depeds o the traiig data so El( ˆf (x i ), y i ) El( ˆf (x), y) 3
4 (e) where x, y is a ew radom draw from the data distributio that is t i the traiig data. i. The space of traiig sets (i.e., samples of size from the data geeratig distributio). ii. The space of traiig sets (i.e., samples of size from the data geeratig distributio). iii. The space of all miibatches chose from the full traiig set (i.e., samples of of the batch size from the empirical distributio o the full traiig set). 5. For each, use,, or = to determie the relatioship betwee the two quatities, or if the relatioship caot be determied. Throughout assume F 1, F 2 are hypothesis spaces with F 1 F 2, ad assume we are workig with a fixed loss fuctio l. (a) The estimatio errors of two decisio fuctios f 1, f 2 that miimize the empirical risk over the same hypothesis space, where f 2 uses 5 extra data poits. (b) The approximatio errors of the two decisio fuctios f 1, f 2 that miimize risk with respect to F 1, F 2, respectively (i.e., f 1 = f F1 ad f 2 = f F2 ). (c) The empirical risks of two decisio fuctios f 1, f 2 that miimize the empirical risk over F 1, F 2, respectively. Both use the same fixed traiig data. (d) The estimatio errors (for F 1, F 2, respectively) of two decisio fuctios f 1, f 2 that miimize the empirical risk over F 1, F 2, respectively. (e) The risk of two decisio fuctios f 1, f 2 that miimize the empirical risk over F 1, F 2, respectively. (a) Roughly speakig, more data is better, so we would ted to expect that f 2 will have lower estimatio error. That said, this is ot always the case, so the relatioship caot be determied. (b) The approximatio error of f 1 will be larger. (c) The empirical risk of f 1 will be larger. (d) Roughly speakig, icreasig the hypothesis space should icrease the estimatio error sice the approximatio error will decrease, ad we expect to eed more data. That said, this is ot always the case, so the aswer is the relatioship caot be determied. (e) Caot be determied. 6. I the excess risk decompositio lecture, we itroduced the decisio tree classifier spaces F (space of all decisio trees) ad F d (the space of decisio trees of depth d) ad wet through some examples. The followig questios are based o those slides. Recall that P X = Uif([0, 1] 2 ), Y = {blue, orage}, orage occurs with.9 probability below the lie y = x ad blue occurs with.9 probability above the lie y = x. 4
5 (a) Prove that the Bayes error rate is 0.1. (b) Is the Bayes decisio fuctio i F? (c) For the hypothesis space F 3 the slide states that R( f) = 0.176±.004 for = Assumig you had access to the traiig code that produces f from a set of data poits, ad radom draws from the data geeratig distributio, give a algorithm (pseudocode) to compute (or estimate) the values ad.004. (a) Sice the output space is discrete ad we are usig the 0 1 loss, our best predictio is the highest probability output coditioal o the iput. By choosig orage below the lie y = x ad blue above, we obtai a.1 probability of error. For the 0 1 loss, probability of error gives the risk. (b) No. Ay decisio tree i F has fiite depth, ad thus will divide [0, 1] 2 ito a fiite umber of rectagles. Thus we caot produce the decisio boudary y = x used by the Bayes decisio fuctio. (c) Pseudocode follows: i. Iitialize L to be a empty list of risks. ii. Repeat the followig M times for some sufficietly large M: A. Draw a radom sample (x 1, y 1 ),..., (x, y ) from the data geeratig distributio. B. Obtai a decisio fuctio f by ruig our traiig algorithm o the geerated sample. C. Draw a ew radom sample (x 1, y 1),..., (x S, y S ) of size S where S is sufficietly large. D. Compute e = {i f(x i ) y i}. That is, the umber of times f is icorrect o our ew sample. E. Add e/s to the list L. iii. Compute the sample average ad stadard deviatio of the values i L. Above.176 would be the average ad.004 would be the stadard deviatio. Istead of drawig the sample of size S we could have computed the risk aalytically. L 1 ad L 2 Regularizatio 1. Cosider the followig two miimizatio problems: arg mi Ω(w) + λ w L(f w (x i ), y i ) 5
6 ad arg mi CΩ(w) + 1 w L(f w (x i ), y i ), where Ω(w) is the pealty fuctio (for regularizatio) ad L is the loss fuctio. Give sufficiet coditios uder which these two give the same miimizer. Let C = 1/λ. The the two objectives differ by a costat factor. 2. ( ) Let f : R R be a differetiable fuctio. Prove that f(x) 2 L if ad oly if f is Lipschitz with costat L. First suppose f(x) 2 L for some L 0 ad all x R. By the mea value theorem we have, for ay x, y R, f(y) f(x) = f(x + ξ(y x)) T (y x), where ξ is some value betwee 0 ad 1. Takig absolute values o each side we have f(y) f(x) = f(x + ξ(y x)) T (y x) f(x + ξ(y x)) 2 y x 2 by Cauchy-Schwarz. Applyig our boud o the gradiet orm proves f is Lipschitz with costat L. Coversely, suppose f is Lipschitz with costat L. Note that f(x) T v = f (x; v) = lim f(x + tv) f(x) t 0 t lim t L v t 0 t Lettig v = f(x) we obtai f(x) 2 2 L f(x) 2 givig the result. 3. ( ) Let ŵ deote the miimizer for = L v. miimize w Xw y 2 2 subject to w 1 r. Prove that f(x) = ŵ T x is Lipschitz with costat r. Note that w 2 w 1 r, so the argumet from class gives the result. To see the iequality, ote that w 2 1 = ( w w ) 2 w w 2 = w Two of the plots i the lecture slides use the fact that ˆβ / β is always betwee 0 ad 1. Here ˆβ is the parameter vector of the liear model resultig from the regularized least squares problem. Aalgously, β is the parameter vector from the uregularized problem. Why is this true that the quotiet lies i [0, 1]? 6
7 We assume Ivaov regularizatio (sice Tikhoov is equivalet). We kow that 1 ( β T x i y i ) 2 1 ( ˆβ T x i y i ) 2 sice β is the solutio to the ucostraied miimizatio. But if β ˆβ the β is feasible for the regularized problem, so ˆβ = β. Thus β ˆβ. 5. Explai why feature ormalizatio is importat if you are usig L 1 or L 2 regularizatio. Suppose you have a model y = w T x where x 1 is a very correlated with y, but the feature is measured i meters. Thus w 1 = 4 would mea each icrease i x 1 by 1 meter yields a icrease i y by 4. Now suppose we chage the uits of w 1 to kilometers by scalig it. This would require us to chage w 1 to 4000 to achieve the same decisio fuctio. While this has o effect o the loss (y w T x) 2 it has a sigificat effect o λ w 2 2 or λ w 1. For example, eve if x 2,..., x had very little relatioship with y, we would still udervalue w 1 due to the regularizatio. 7
Machine Learning Brett Bernstein
Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete
More informationNYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)
NYU Ceter for Data Sciece: DS-GA 003 Machie Learig ad Computatioal Statistics (Sprig 208) Brett Berstei, David Roseberg, Be Jakubowski Jauary 20, 208 Istructios: Followig most lab ad lecture sectios, we
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationIntro to Learning Theory
Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More informationAda Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities
CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We
More informationRegression with quadratic loss
Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationProblem Set 4 Due Oct, 12
EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios
More informationLinear Support Vector Machines
Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate
More informationChapter 2 The Monte Carlo Method
Chapter 2 The Mote Carlo Method The Mote Carlo Method stads for a broad class of computatioal algorithms that rely o radom sampligs. It is ofte used i physical ad mathematical problems ad is most useful
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationStatistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions
Statistical ad Mathematical Methods DS-GA 00 December 8, 05. Short questios Sample Fial Problems Solutios a. Ax b has a solutio if b is i the rage of A. The dimesio of the rage of A is because A has liearly-idepedet
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More informationSimulation. Two Rule For Inverting A Distribution Function
Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump
More informationECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015
ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],
More informationThis section is optional.
4 Momet Geeratig Fuctios* This sectio is optioal. The momet geeratig fuctio g : R R of a radom variable X is defied as g(t) = E[e tx ]. Propositio 1. We have g () (0) = E[X ] for = 1, 2,... Proof. Therefore
More informationA survey on penalized empirical risk minimization Sara A. van de Geer
A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the
More informationConvergence of random variables. (telegram style notes) P.J.C. Spreij
Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space
More information1 Review of Probability & Statistics
1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationCS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5
CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationChapter 9: Numerical Differentiation
178 Chapter 9: Numerical Differetiatio Numerical Differetiatio Formulatio of equatios for physical problems ofte ivolve derivatives (rate-of-chage quatities, such as velocity ad acceleratio). Numerical
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More informationLecture 7: October 18, 2017
Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem
More informationMaximum Likelihood Estimation and Complexity Regularization
ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio
More informationTopics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion
.87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses
More informationThe variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.
SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample
More informationUnbiased Estimation. February 7-12, 2008
Ubiased Estimatio February 7-2, 2008 We begi with a sample X = (X,..., X ) of radom variables chose accordig to oe of a family of probabilities P θ where θ is elemet from the parameter space Θ. For radom
More informationSTAT Homework 2 - Solutions
STAT-36700 Homework - Solutios Fall 08 September 4, 08 This cotais solutios for Homework. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better isight.
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationLecture 10 October Minimaxity and least favorable prior sequences
STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More information6.867 Machine learning, lecture 7 (Jaakkola) 1
6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit
More informationLecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead)
Lecture 4 Homework Hw 1 ad 2 will be reoped after class for every body. New deadlie 4/20 Hw 3 ad 4 olie (Nima is lead) Pod-cast lecture o-lie Fial projects Nima will register groups ext week. Email/tell
More informationLearning Theory: Lecture Notes
Learig Theory: Lecture Notes Kamalika Chaudhuri October 4, 0 Cocetratio of Averages Cocetratio of measure is very useful i showig bouds o the errors of machie-learig algorithms. We will begi with a basic
More informationOutput Analysis and Run-Length Control
IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationRandom Variables, Sampling and Estimation
Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig
More information1 Duality revisited. AM 221: Advanced Optimization Spring 2016
AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R
More information1 Inferential Methods for Correlation and Regression Analysis
1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet
More informationThis exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.
Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the
More informationMarkov Decision Processes
Markov Decisio Processes Defiitios; Statioary policies; Value improvemet algorithm, Policy improvemet algorithm, ad liear programmig for discouted cost ad average cost criteria. Markov Decisio Processes
More informationIntroduction to Machine Learning DIS10
CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More informationLecture 6: Coupon Collector s problem
Radomized Algorithms Lecture 6: Coupo Collector s problem Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor Radomized Algorithms - Lecture 6 1 / 16 Variace: key features
More informationEECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1
EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum
More informationJanuary 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS
Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationMixtures of Gaussians and the EM Algorithm
Mixtures of Gaussias ad the EM Algorithm CSE 6363 Machie Learig Vassilis Athitsos Computer Sciece ad Egieerig Departmet Uiversity of Texas at Arligto 1 Gaussias A popular way to estimate probability desity
More informationChapter 6 Infinite Series
Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationEcon 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.
Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
More informationMonte Carlo Integration
Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce
More information1 Review and Overview
CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we
More informationAlternating Series. 1 n 0 2 n n THEOREM 9.14 Alternating Series Test Let a n > 0. The alternating series. 1 n a n.
0_0905.qxd //0 :7 PM Page SECTION 9.5 Alteratig Series Sectio 9.5 Alteratig Series Use the Alteratig Series Test to determie whether a ifiite series coverges. Use the Alteratig Series Remaider to approximate
More informationLecture 13: Maximum Likelihood Estimation
ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select
More informationChapter 6 Sampling Distributions
Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More informationSequences. A Sequence is a list of numbers written in order.
Sequeces A Sequece is a list of umbers writte i order. {a, a 2, a 3,... } The sequece may be ifiite. The th term of the sequece is the th umber o the list. O the list above a = st term, a 2 = 2 d term,
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More informationResponse Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable
Statistics Chapter 4 Correlatio ad Regressio If we have two (or more) variables we are usually iterested i the relatioship betwee the variables. Associatio betwee Variables Two variables are associated
More informationExpectation and Variance of a random variable
Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio
More informationLecture 11 and 12: Basic estimation theory
Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis
More informationEconomics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator
Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters
More informationTime-Domain Representations of LTI Systems
2.1 Itroductio Objectives: 1. Impulse resposes of LTI systems 2. Liear costat-coefficiets differetial or differece equatios of LTI systems 3. Bloc diagram represetatios of LTI systems 4. State-variable
More informationNaïve Bayes. Naïve Bayes
Statistical Data Miig ad Machie Learig Hilary Term 206 Dio Sejdiovic Departmet of Statistics Oxford Slides ad other materials available at: http://www.stats.ox.ac.uk/~sejdiov/sdmml : aother plug-i classifier
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS
MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak
More information(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3
MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special
More informationENGI 4421 Confidence Intervals (Two Samples) Page 12-01
ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationSection 13.3 Area and the Definite Integral
Sectio 3.3 Area ad the Defiite Itegral We ca easily fid areas of certai geometric figures usig well-kow formulas: However, it is t easy to fid the area of a regio with curved sides: METHOD: To evaluate
More information2.2. Central limit theorem.
36.. Cetral limit theorem. The most ideal case of the CLT is that the radom variables are iid with fiite variace. Although it is a special case of the more geeral Lideberg-Feller CLT, it is most stadard
More informationLecture 15: Learning Theory: Concentration Inequalities
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that
More informationTMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.
Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx
More informationChapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.
Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more
More informationLecture Notes 15 Hypothesis Testing (Chapter 10)
1 Itroductio Lecture Notes 15 Hypothesis Testig Chapter 10) Let X 1,..., X p θ x). Suppose we we wat to kow if θ = θ 0 or ot, where θ 0 is a specific value of θ. For example, if we are flippig a coi, we
More informationLecture 6: Integration and the Mean Value Theorem. slope =
Math 8 Istructor: Padraic Bartlett Lecture 6: Itegratio ad the Mea Value Theorem Week 6 Caltech 202 The Mea Value Theorem The Mea Value Theorem abbreviated MVT is the followig result: Theorem. Suppose
More informationSupport vector machine revisited
6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector
More informationSince X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain
Assigmet 9 Exercise 5.5 Let X biomial, p, where p 0, 1 is ukow. Obtai cofidece itervals for p i two differet ways: a Sice X / p d N0, p1 p], the variace of the limitig distributio depeds oly o p. Use the
More informationClases 7-8: Métodos de reducción de varianza en Monte Carlo *
Clases 7-8: Métodos de reducció de variaza e Mote Carlo * 9 de septiembre de 27 Ídice. Variace reductio 2. Atithetic variates 2 2.. Example: Uiform radom variables................ 3 2.2. Example: Tail
More informationOptimization Methods MIT 2.098/6.255/ Final exam
Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short
More information1 Introduction to reducing variance in Monte Carlo simulations
Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by
More information4.3 Growth Rates of Solutions to Recurrences
4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.
More informationMath 312 Lecture Notes One Dimensional Maps
Math 312 Lecture Notes Oe Dimesioal Maps Warre Weckesser Departmet of Mathematics Colgate Uiversity 21-23 February 25 A Example We begi with the simplest model of populatio growth. Suppose, for example,
More informationDISTRIBUTION LAW Okunev I.V.
1 DISTRIBUTION LAW Okuev I.V. Distributio law belogs to a umber of the most complicated theoretical laws of mathematics. But it is also a very importat practical law. Nothig ca help uderstad complicated
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet
More information1 Approximating Integrals using Taylor Polynomials
Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................
More information5. INEQUALITIES, LIMIT THEOREMS AND GEOMETRIC PROBABILITY
IA Probability Let Term 5 INEQUALITIES, LIMIT THEOREMS AND GEOMETRIC PROBABILITY 51 Iequalities Suppose that X 0 is a radom variable takig o-egative values ad that c > 0 is a costat The P X c E X, c is
More informationParameter, Statistic and Random Samples
Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,
More information