Maximum Likelihood Estimation and Complexity Regularization
ECE901 Spring 2004: Statistical Regularization and Learning Theory

Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

Lecturer: Rob Nowak. Scribe: Pam Limpiti.

1 Review: Maximum Likelihood Estimation

In the last lecture, we had $n$ iid observations drawn from an unknown distribution,

$$Y_i \overset{iid}{\sim} p_{\theta^*}, \quad i = 1, \ldots, n,$$

where $\theta^* \in \Theta$. With the loss function defined as $\ell(\theta, Y_i) = -\log p_\theta(Y_i)$, the empirical risk is

$$\hat{R}_n(\theta) = -\frac{1}{n} \sum_{i=1}^n \log p_\theta(Y_i).$$

Essentially, we want to choose the distribution, from the collection of distributions indexed by the parameter space, that minimizes the empirical risk; i.e., we would like to select $p_{\hat{\theta}_n}$, where

$$\hat{\theta}_n = \arg\min_{\theta \in \Theta} \; -\frac{1}{n} \sum_{i=1}^n \log p_\theta(Y_i).$$

The risk is defined as

$$R(\theta) = E[\ell(\theta, Y)] = E[-\log p_\theta(Y)].$$

Note that $\theta^*$ minimizes $R(\theta)$ over $\Theta$:

$$\theta^* = \arg\min_{\theta \in \Theta} E[-\log p_\theta(Y)] = \arg\min_{\theta \in \Theta} \int -\log p_\theta(y) \, p_{\theta^*}(y) \, dy.$$

Finally, the excess risk of $\theta$ is defined as

$$R(\theta) - R(\theta^*) = \int \log \frac{p_{\theta^*}(y)}{p_\theta(y)} \, p_{\theta^*}(y) \, dy \equiv K(p_{\theta^*}, p_\theta).$$

We recognized that the excess risk corresponding to this loss function is simply the Kullback-Leibler (KL) divergence, or relative entropy, denoted by $K(p_{\theta^*}, p_\theta)$. It is easy to see that $K(p_{\theta^*}, p_\theta)$ is always non-negative and is zero if and only if $p_\theta = p_{\theta^*}$. The KL divergence measures how different two probability distributions are, and it is therefore natural for quantifying the convergence of maximum likelihood procedures. However, $K(p_{\theta^*}, p_\theta)$ is not a distance metric, because it is not symmetric and does not satisfy the triangle inequality. For this reason, two other quantities play a key role in maximum likelihood estimation: the Hellinger distance and the affinity.
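Before turning to those quantities, here is a small numerical sketch (not part of the original notes) of maximum likelihood estimation as empirical risk minimization with the negative log-likelihood loss. The Gaussian family, the grid search, and all names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown true parameter theta* and an iid sample Y_1, ..., Y_n ~ N(theta*, 1).
theta_star, n = 1.5, 200
Y = rng.normal(theta_star, 1.0, size=n)

def empirical_risk(theta, Y):
    """R_hat_n(theta) = -(1/n) sum_i log p_theta(Y_i) for the N(theta, 1) family."""
    return 0.5 * np.mean((Y - theta) ** 2) + 0.5 * np.log(2 * np.pi)

# Minimize the empirical risk over a fine grid (a stand-in for arg min over Theta).
Theta = np.linspace(-5, 5, 2001)
theta_hat = Theta[np.argmin([empirical_risk(t, Y) for t in Theta])]

# For this family the exact MLE is the sample mean; both approximate theta*.
print(theta_hat, Y.mean())
```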
The Hellinger distance is defined as

$$H(p_{\theta_1}, p_{\theta_2}) = \left( \int \left( \sqrt{p_{\theta_1}(y)} - \sqrt{p_{\theta_2}(y)} \right)^2 dy \right)^{1/2}.$$

We proved that the squared Hellinger distance lower-bounds the KL divergence:

$$H^2(p_{\theta_1}, p_{\theta_2}) \le K(p_{\theta_1}, p_{\theta_2}).$$

The affinity is defined as

$$A(p_{\theta_1}, p_{\theta_2}) = \int \sqrt{p_{\theta_1}(y) \, p_{\theta_2}(y)} \, dy.$$

We also proved that

$$H^2(p_{\theta_1}, p_{\theta_2}) \le -2 \log A(p_{\theta_1}, p_{\theta_2}) \le K(p_{\theta_1}, p_{\theta_2}).$$

Example 1 (Gaussian distribution). Let $Y$ be Gaussian with mean $\theta$ and variance $\sigma^2$:

$$p_\theta(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(y-\theta)^2}{2\sigma^2}}.$$

First, look at

$$K(p_{\theta_1}, p_{\theta_2}) = E_{\theta_1}\!\left[ \log \frac{p_{\theta_1}}{p_{\theta_2}} \right].$$

Then

$$\log \frac{p_{\theta_1}}{p_{\theta_2}} = \frac{1}{2\sigma^2}\left[ (\theta_2^2 - \theta_1^2) - 2(\theta_2 - \theta_1)\, y \right] = \frac{\theta_2^2 - \theta_1^2}{2\sigma^2} + \frac{\theta_1 - \theta_2}{\sigma^2} \, y,$$

and, using $\int y \, p_{\theta_1}(y) \, dy = E[Y] = \theta_1$,

$$K(p_{\theta_1}, p_{\theta_2}) = \frac{\theta_2^2 - \theta_1^2}{2\sigma^2} + \frac{(\theta_1 - \theta_2)\,\theta_1}{\sigma^2} = \frac{\theta_1^2 + \theta_2^2 - 2\theta_1\theta_2}{2\sigma^2} = \frac{(\theta_1 - \theta_2)^2}{2\sigma^2}.$$

For the affinity,

$$-2 \log A(p_{\theta_1}, p_{\theta_2}) = -2 \log \int \left( \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y-\theta_1)^2}{2\sigma^2}} \right)^{1/2} \left( \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y-\theta_2)^2}{2\sigma^2}} \right)^{1/2} dy$$

$$= -2 \log \int \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(y-\theta_1)^2 + (y-\theta_2)^2}{4\sigma^2}} \, dy = -2 \log \left( e^{-\frac{(\theta_1-\theta_2)^2}{8\sigma^2}} \int \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}\left(y - \frac{\theta_1+\theta_2}{2}\right)^2} dy \right)$$

$$= \frac{(\theta_1 - \theta_2)^2}{4\sigma^2} = \frac{1}{2} K(p_{\theta_1}, p_{\theta_2}) \ge \frac{1}{2} H^2(p_{\theta_1}, p_{\theta_2}),$$

where completing the square in the exponent shows that the remaining integral is exactly one.
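As a sanity check (not part of the original notes), the closed forms above can be verified by numerical integration; the parameter values, grid, and tolerances are arbitrary choices.

```python
import numpy as np

theta1, theta2, sigma = 0.0, 1.0, 0.7
y = np.linspace(-10.0, 11.0, 21001)  # wide uniform grid covering both densities
dy = y[1] - y[0]

def gauss(y, theta):
    return np.exp(-(y - theta) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def integrate(g):
    """Trapezoidal rule on the uniform grid y."""
    return float(np.sum((g[:-1] + g[1:]) * 0.5) * dy)

p1, p2 = gauss(y, theta1), gauss(y, theta2)
K = integrate(np.log(p1 / p2) * p1)               # KL divergence K(p1, p2)
A = integrate(np.sqrt(p1 * p2))                   # affinity A(p1, p2)
H2 = integrate((np.sqrt(p1) - np.sqrt(p2)) ** 2)  # squared Hellinger distance

d2 = (theta1 - theta2) ** 2
print(np.isclose(K, d2 / (2 * sigma**2), rtol=1e-4))               # K = d^2/(2 sigma^2)
print(np.isclose(-2 * np.log(A), d2 / (4 * sigma**2), rtol=1e-4))  # -2 log A = K/2
print(H2 <= -2 * np.log(A) <= K)                  # the chain H^2 <= -2 log A <= K
```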
2 Maximum Likelihood Estimation and Complexity Regularization

Suppose that we have $n$ iid training samples $\{X_i, Y_i\}_{i=1}^n$. Using conditional probability, $p_{XY}$ can be written as

$$p_{XY}(x, y) = p_X(x) \, p_{Y|X=x}(y).$$

Let's assume for the moment that $p_X$ is completely unknown, but that $p_{Y|X=x}(y)$ has a special form:

$$p_{Y|X=x}(y) = p_{f^*(x)}(y),$$

where $p_{f^*(x)}(y)$ is a known parametric density function with parameter $f^*(x)$.

Example 2 (Signal-plus-noise observation model). Let

$$Y_i = f^*(X_i) + W_i, \quad i = 1, \ldots, n,$$

where $W_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$ and $X_i \overset{iid}{\sim} p_X$. Then

$$p_{f^*(x)}(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(y - f^*(x))^2}{2\sigma^2}}.$$

Another example is the Poisson model, $Y \mid X = x \sim \text{Poisson}(f^*(x))$, for which

$$p_{f^*(x)}(y) = \frac{e^{-f^*(x)} \, [f^*(x)]^y}{y!}.$$

The likelihood loss function is

$$\ell(f(x), y) = -\log p_{XY}(x, y) = -\log p_X(x) - \log p_{f(x)}(y).$$

The expected loss is

$$E[\ell(f(X), Y)] = E_X\!\left[ E_{Y|X}\!\left[ \ell(f(X), Y) \mid X = x \right] \right] = E_X\!\left[ -\log p_X(X) \right] + E\!\left[ -\log p_{f(X)}(Y) \right].$$

Notice that the first term is a constant with respect to $f$. Hence, we define our risk to be

$$R(f) = E\!\left[ -\log p_{f(X)}(Y) \right] = \int \left( \int -\log p_{f(x)}(y) \, p_{f^*(x)}(y) \, dy \right) p_X(x) \, dx.$$

The function $f^*$ minimizes this risk, since $f(x) = f^*(x)$ minimizes the inner integrand for each $x$. Our empirical risk is the negative log-likelihood of the training samples:

$$\hat{R}_n(f) = -\frac{1}{n} \sum_{i=1}^n \log p_{f(X_i)}(Y_i).$$
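Here is a small simulation sketch (not from the notes) of the signal-plus-noise model and the empirical risk $\hat{R}_n(f)$; the true function, the noise level, and the candidate functions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 500, 0.3

f_star = lambda x: np.sin(2 * np.pi * x)        # unknown true function (illustrative)
X = rng.uniform(0.0, 1.0, size=n)               # X_i ~ p_X (here uniform on [0, 1])
Y = f_star(X) + rng.normal(0.0, sigma, size=n)  # Y_i = f*(X_i) + W_i

def empirical_risk(f, X, Y):
    """R_hat_n(f) = -(1/n) sum_i log p_{f(X_i)}(Y_i) under Gaussian noise."""
    return np.mean((Y - f(X)) ** 2 / (2 * sigma**2)) + 0.5 * np.log(2 * np.pi * sigma**2)

print(empirical_risk(f_star, X, Y))             # near the minimum achievable value
print(empirical_risk(lambda x: 0.0 * x, X, Y))  # larger for a poor candidate
```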
The factor $\frac{1}{n}$ in the empirical risk is the empirical probability of observing $X = X_i$. Often in function estimation we have control over where we sample $X$. Let's assume that $\mathcal{X} = [0,1]^d$ and $\mathcal{Y} \subseteq \mathbb{R}$. Suppose we sample $X$ uniformly, with $n = m^d$ samples for some positive integer $m$ (i.e., take $m$ evenly spaced samples in each coordinate). Let $x_i$, $i = 1, \ldots, n$, denote these sample points, and assume that $Y_i \sim p_{f^*(x_i)}(y)$. Then our empirical risk is

$$\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^n \ell(f(x_i), Y_i) = -\frac{1}{n} \sum_{i=1}^n \log p_{f(x_i)}(Y_i).$$

Note that $x_i$ is now a deterministic quantity. Our risk is

$$R(f) = E\!\left[ -\frac{1}{n} \sum_{i=1}^n \log p_{f(x_i)}(Y_i) \right] = \frac{1}{n} \sum_{i=1}^n \int \left[ -\log p_{f(x_i)}(y_i) \right] p_{f^*(x_i)}(y_i) \, dy_i.$$

The risk is minimized by $f^*$. However, $f^*$ is not a unique minimizer: any $f$ that agrees with $f^*$ at the points $x_i$, $i = 1, \ldots, n$, also minimizes this risk.

Now we will make use of the following vector and shorthand notation, where the uppercase $Y$ denotes a random variable and the lowercase $y$ and $x$ denote deterministic quantities:

$$Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$

Then

$$p_f(Y) = \prod_{i=1}^n p_{f(x_i)}(Y_i) \ \ \text{(random)}, \qquad p_f(y) = \prod_{i=1}^n p_{f(x_i)}(y_i) \ \ \text{(deterministic)}.$$

With this notation, the empirical risk and the true risk can be written as

$$\hat{R}_n(f) = -\frac{1}{n} \log p_f(Y), \qquad R(f) = -\frac{1}{n} E[\log p_f(Y)] = -\frac{1}{n} \int \log p_f(y) \, p_{f^*}(y) \, dy.$$

3 Error Bound

Suppose that we have a pool of candidate functions $\mathcal{F}$, and we want to select a function $f$ from $\mathcal{F}$ using the training data. Our usual approach is to show that the distribution of $\hat{R}_n(f)$ concentrates about its mean as $n$ grows. First, we assign a complexity $c(f) > 0$ to each $f \in \mathcal{F}$ so that $\sum_{f \in \mathcal{F}} 2^{-c(f)} \le 1$. Then, we apply the union bound to get a uniform concentration inequality holding for all models in $\mathcal{F}$. Finally, we use this concentration inequality to bound the expected risk of our selected model.
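To illustrate the summability condition on the complexities (a hypothetical assignment, not from the notes): the codelengths of any binary prefix code satisfy it, by the Kraft inequality.

```python
import numpy as np

# For a finite class F, the uniform assignment c(f) = log2 |F| gives
# sum_{f in F} 2^{-c(f)} = |F| * 2^{-log2 |F|} = 1 exactly.
F_size = 1024
print(F_size * 2.0 ** (-np.log2(F_size)))  # 1.0

# A non-uniform, prefix-code-style assignment also works, e.g. c(f_j) = 2j
# for models f_1, f_2, ... of increasing complexity:
j = np.arange(1, 50)
print(np.sum(2.0 ** (-2.0 * j)) <= 1.0)    # True: the sum is 1/3
```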
We will essentially accomplish the same result here, but avoid the need for explicit concentration inequalities and instead make use of information-theoretic bounds.

We would like to select an $f_n \in \mathcal{F}$ so that the excess risk

$$0 \le R(f_n) - R(f^*) = \frac{1}{n} E\!\left[ \log \frac{p_{f^*}(Y)}{p_{f_n}(Y)} \right] = \frac{1}{n} K(p_{f^*}, p_{f_n})$$

is small, where

$$K(p_{f^*}, p_f) = \sum_{i=1}^n \int \log \frac{p_{f^*(x_i)}(y_i)}{p_{f(x_i)}(y_i)} \, p_{f^*(x_i)}(y_i) \, dy_i = \sum_{i=1}^n K\!\left( p_{f^*(x_i)}, p_{f(x_i)} \right)$$

is again the KL divergence. Unfortunately, as mentioned before, $K(p_{f^*}, p_f)$ is not a true distance. So instead we will focus on the expected squared Hellinger distance as our measure of performance. We will get a bound on

$$E\!\left[ \frac{1}{n} H^2(p_{f_n}, p_{f^*}) \right] = E\!\left[ \frac{1}{n} \sum_{i=1}^n \int \left( \sqrt{p_{f_n(x_i)}(y_i)} - \sqrt{p_{f^*(x_i)}(y_i)} \right)^2 dy_i \right].$$

4 Maximum Complexity-Regularized Likelihood Estimation

Theorem 1 (Li-Barron 2000, Kolaczyk-Nowak 2002). Let $\{x_i, Y_i\}_{i=1}^n$ be a sample of training data with the $Y_i$ independent, $Y_i \sim p_{f^*(x_i)}(y_i)$, $i = 1, \ldots, n$, for some unknown function $f^*$. Suppose we have a collection of candidate functions $\mathcal{F}$ and complexities $c(f) > 0$, $f \in \mathcal{F}$, satisfying

$$\sum_{f \in \mathcal{F}} 2^{-c(f)} \le 1.$$

Define the complexity-regularized estimator

$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \left\{ -\frac{1}{n} \sum_{i=1}^n \log p_{f(x_i)}(Y_i) + \frac{2\, c(f) \log 2}{n} \right\}.$$

Then

$$E\!\left[ \frac{1}{n} H^2(p_{\hat{f}_n}, p_{f^*}) \right] \le -\frac{2}{n} E\!\left[ \log A(p_{\hat{f}_n}, p_{f^*}) \right] \le \min_{f \in \mathcal{F}} \left\{ \frac{1}{n} K(p_{f^*}, p_f) + \frac{2\, c(f) \log 2}{n} \right\}.$$

Before proving the theorem, let's look at a special case.
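First, though, here is a minimal sketch of how the estimator in Theorem 1 can be computed over a small finite class, using the Poisson model from Example 2; the candidate class, the complexities, and the true intensity are all hypothetical choices.

```python
import numpy as np
from scipy.special import gammaln  # log(y!) = gammaln(y + 1)

rng = np.random.default_rng(2)
n = 256
x = (np.arange(n) + 0.5) / n                # deterministic design points x_i
f_star = lambda x: 5.0 + 10.0 * (x > 0.5)   # unknown Poisson intensity (illustrative)
Y = rng.poisson(f_star(x))                  # Y_i ~ Poisson(f*(x_i))

# Hypothetical candidate class: constant intensities 1, ..., 20, with the
# uniform complexity assignment c(f) = log2 |F|, so sum_f 2^{-c(f)} = 1.
intensities = np.arange(1, 21, dtype=float)
c = np.log2(len(intensities))

def neg_loglik(lam):
    """-(1/n) sum_i log p_{f(x_i)}(Y_i) for the constant intensity lam."""
    return -np.mean(Y * np.log(lam) - lam - gammaln(Y + 1))

crit = [neg_loglik(lam) + 2 * c * np.log(2) / n for lam in intensities]
lam_hat = intensities[int(np.argmin(crit))]
print(lam_hat)  # close to the mean intensity of f*, here 10
```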
Example 3 (Gaussian noise). Suppose

$$Y_i = f^*(x_i) + W_i, \qquad W_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2).$$

Using the results from Example 1, we have

$$p_{f(x_i)}(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(y_i - f(x_i))^2}{2\sigma^2}}, \qquad -2 \log A\!\left( p_{\hat{f}_n(x_i)}, p_{f^*(x_i)} \right) = \frac{\left( \hat{f}_n(x_i) - f^*(x_i) \right)^2}{4\sigma^2},$$

and, since the affinity of a product density is the product of the individual affinities,

$$-2 \log A(p_{\hat{f}_n}, p_{f^*}) = \frac{1}{4\sigma^2} \sum_{i=1}^n \left( \hat{f}_n(x_i) - f^*(x_i) \right)^2,$$

so that

$$-\frac{2}{n} E\!\left[ \log A(p_{\hat{f}_n}, p_{f^*}) \right] = \frac{1}{4\sigma^2} \, E\!\left[ \frac{1}{n} \sum_{i=1}^n \left( \hat{f}_n(x_i) - f^*(x_i) \right)^2 \right].$$

We also have

$$K(p_{f^*}, p_f) = \frac{1}{2\sigma^2} \sum_{i=1}^n \left( f(x_i) - f^*(x_i) \right)^2, \qquad -\log p_f(Y) = \sum_{i=1}^n \frac{(Y_i - f(x_i))^2}{2\sigma^2} + \text{constant}.$$

Combining everything (and multiplying the criterion by the constant $2\sigma^2$, which does not change the minimizer), the estimator becomes a penalized least squares problem:

$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \left\{ \frac{1}{n} \sum_{i=1}^n \left( Y_i - f(x_i) \right)^2 + \frac{4\sigma^2 c(f) \log 2}{n} \right\}.$$

The theorem tells us that

$$\frac{1}{4\sigma^2} \, E\!\left[ \frac{1}{n} \sum_{i=1}^n \left( \hat{f}_n(x_i) - f^*(x_i) \right)^2 \right] \le \min_{f \in \mathcal{F}} \left\{ \frac{1}{2\sigma^2 n} \sum_{i=1}^n \left( f(x_i) - f^*(x_i) \right)^2 + \frac{2\, c(f) \log 2}{n} \right\},$$

or equivalently,

$$E\!\left[ \frac{1}{n} \sum_{i=1}^n \left( \hat{f}_n(x_i) - f^*(x_i) \right)^2 \right] \le \min_{f \in \mathcal{F}} \left\{ \frac{2}{n} \sum_{i=1}^n \left( f(x_i) - f^*(x_i) \right)^2 + \frac{8\sigma^2 c(f) \log 2}{n} \right\}.$$
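Below is a sketch (with illustrative assumptions throughout) of this penalized least-squares form of the estimator, selecting among piecewise-constant fits at dyadic resolutions; the complexity assignment $c(f) = k$ is hypothetical but summable, since $\sum_k 2^{-k} \le 1$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 512, 0.25
x = (np.arange(n) + 0.5) / n
f_star = np.where(x < 0.3, 0.0, 1.0)        # unknown step function (illustrative)
Y = f_star + rng.normal(0.0, sigma, size=n)

def piecewise_constant_fit(Y, k):
    """Least-squares fit with k equal-width constant pieces (n divisible by k)."""
    return np.repeat(Y.reshape(k, -1).mean(axis=1), n // k)

# Candidate resolutions k = 1, 2, 4, ..., 128, with assumed complexity c(f) = k.
crit, fits = [], []
for k in [2 ** j for j in range(8)]:
    fhat = piecewise_constant_fit(Y, k)
    fits.append((k, fhat))
    crit.append(np.mean((Y - fhat) ** 2) + 4 * sigma**2 * np.log(2) * k / n)

k_best, f_best = fits[int(np.argmin(crit))]
print(k_best, np.mean((f_best - f_star) ** 2))  # selected resolution and its MSE
```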
Now let's come back to the proof.

Proof: We start by bounding the squared Hellinger distance by the log-affinity. Applying $1 - a \le -\log a$ to each per-sample affinity, and using the fact that the affinity of a product density is the product of the individual affinities,

$$H^2(p_{\hat{f}_n}, p_{f^*}) = \sum_{i=1}^n \int \left( \sqrt{p_{\hat{f}_n(x_i)}(y_i)} - \sqrt{p_{f^*(x_i)}(y_i)} \right)^2 dy_i \le -2 \log \int \sqrt{p_{\hat{f}_n}(y) \, p_{f^*}(y)} \, dy.$$

Taking expectations,

$$E\!\left[ H^2(p_{\hat{f}_n}, p_{f^*}) \right] \le -2\, E\!\left[ \log \int \sqrt{p_{\hat{f}_n}(y) \, p_{f^*}(y)} \, dy \right].$$

Now, define the theoretical analog of $\hat{f}_n$:

$$\tilde{f}_n = \arg\min_{f \in \mathcal{F}} \left\{ K(p_{f^*}, p_f) + 2\, c(f) \log 2 \right\}.$$

Since

$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \left\{ -\sum_{i=1}^n \log p_{f(x_i)}(Y_i) + 2\, c(f) \log 2 \right\} = \arg\max_{f \in \mathcal{F}} \left\{ \log p_f(Y) - 2\, c(f) \log 2 \right\} = \arg\max_{f \in \mathcal{F}} \log \left( p_f^{1/2}(Y) \, e^{-c(f) \log 2} \right) = \arg\max_{f \in \mathcal{F}} \; p_f^{1/2}(Y) \, e^{-c(f) \log 2},$$

we can see that

$$\frac{p_{\hat{f}_n}^{1/2}(Y) \, e^{-c(\hat{f}_n) \log 2}}{p_{\tilde{f}_n}^{1/2}(Y) \, e^{-c(\tilde{f}_n) \log 2}} \ge 1.$$

Then we can write

$$E\!\left[ H^2(p_{\hat{f}_n}, p_{f^*}) \right] \le -2\, E\!\left[ \log \int \sqrt{p_{\hat{f}_n}(y) \, p_{f^*}(y)} \, dy \right] \le -2\, E\!\left[ \log \left( \frac{p_{\tilde{f}_n}^{1/2}(Y) \, e^{-c(\tilde{f}_n) \log 2}}{p_{\hat{f}_n}^{1/2}(Y) \, e^{-c(\hat{f}_n) \log 2}} \int \sqrt{p_{\hat{f}_n}(y) \, p_{f^*}(y)} \, dy \right) \right],$$

because dividing the argument of the logarithm by the ratio above, which is at least one, makes the argument smaller and hence $-2\log(\cdot)$ larger. Now, simply multiply the argument inside the log by $\frac{p_{f^*}^{1/2}(Y)}{p_{f^*}^{1/2}(Y)}$ to get

$$E\!\left[ H^2(p_{\hat{f}_n}, p_{f^*}) \right] \le E\!\left[ \log \frac{p_{f^*}(Y)}{p_{\tilde{f}_n}(Y)} \right] + 2\, c(\tilde{f}_n) \log 2 + 2\, E\!\left[ \log \frac{p_{\hat{f}_n}^{1/2}(Y) \, e^{-c(\hat{f}_n) \log 2}}{p_{f^*}^{1/2}(Y) \int \sqrt{p_{\hat{f}_n}(y) \, p_{f^*}(y)} \, dy} \right] = K(p_{f^*}, p_{\tilde{f}_n}) + 2\, c(\tilde{f}_n) \log 2 + 2\, E\!\left[ \log \frac{p_{\hat{f}_n}^{1/2}(Y) \, e^{-c(\hat{f}_n) \log 2}}{p_{f^*}^{1/2}(Y) \int \sqrt{p_{\hat{f}_n}(y) \, p_{f^*}(y)} \, dy} \right].$$
The terms

$$K(p_{f^*}, p_{\tilde{f}_n}) + 2\, c(\tilde{f}_n) \log 2 = \min_{f \in \mathcal{F}} \left\{ K(p_{f^*}, p_f) + 2\, c(f) \log 2 \right\}$$

are precisely what we wanted for the upper bound of the theorem. So, to finish the proof, we only need to show that the last term is non-positive. Applying Jensen's inequality, we get

$$2\, E\!\left[ \log \frac{p_{\hat{f}_n}^{1/2}(Y) \, e^{-c(\hat{f}_n) \log 2}}{p_{f^*}^{1/2}(Y) \int \sqrt{p_{\hat{f}_n}(y) \, p_{f^*}(y)} \, dy} \right] \le 2 \log E\!\left[ \frac{p_{\hat{f}_n}^{1/2}(Y) \, e^{-c(\hat{f}_n) \log 2}}{p_{f^*}^{1/2}(Y) \int \sqrt{p_{\hat{f}_n}(y) \, p_{f^*}(y)} \, dy} \right].$$

Both $Y$ and $\hat{f}_n$ are random, which makes the expectation difficult to compute. However, we can simplify the problem using the union bound, which eliminates the dependence on $\hat{f}_n$:

$$2 \log E\!\left[ \frac{p_{\hat{f}_n}^{1/2}(Y) \, e^{-c(\hat{f}_n) \log 2}}{p_{f^*}^{1/2}(Y) \int \sqrt{p_{\hat{f}_n}(y) \, p_{f^*}(y)} \, dy} \right] \le 2 \log \sum_{f \in \mathcal{F}} \frac{e^{-c(f) \log 2}}{\int \sqrt{p_f(y) \, p_{f^*}(y)} \, dy} \, E\!\left[ \sqrt{\frac{p_f(Y)}{p_{f^*}(Y)}} \right] = 2 \log \sum_{f \in \mathcal{F}} 2^{-c(f)} \le 0,$$

where the last two steps come from

$$E\!\left[ \sqrt{\frac{p_f(Y)}{p_{f^*}(Y)}} \right] = \int \sqrt{\frac{p_f(y)}{p_{f^*}(y)}} \, p_{f^*}(y) \, dy = \int \sqrt{p_f(y) \, p_{f^*}(y)} \, dy$$

and

$$\sum_{f \in \mathcal{F}} 2^{-c(f)} \le 1.$$

This establishes $E[H^2(p_{\hat{f}_n}, p_{f^*})] \le \min_{f \in \mathcal{F}} \{ K(p_{f^*}, p_f) + 2\, c(f) \log 2 \}$; dividing both sides by $n$ gives the theorem. $\blacksquare$