Lecture 24: Variable selection in linear models
Consider the linear model $X = Z\beta + \varepsilon$, $\beta \in \mathbb{R}^p$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$.
Like the LSE, the ridge regression estimator does not give a 0 estimate to a component of $\beta$ even if that component is 0.
Variable or model selection refers to eliminating covariates (columns of $Z$) corresponding to zero components of $\beta$.

Example 1. Linear regression models
- $A$ = a subset of $\{1,\dots,p\}$, indices of nonzero components of $\beta$
- The dimension of $A$ is $\dim(A) = q \le p$
- $\beta_A$: sub-vector of $\beta$ with indices in $A$
- $Z_A$: the corresponding sub-matrix of $Z$
- The number of models could be as large as $2^p$

Approximation to a response surface
- The $i$th row of $Z_A$ is $(1, t_i, t_i^2, \dots, t_i^h)$, $t_i \in \mathbb{R}$, $A = \{1,\dots,h\}$: a polynomial of order $h$, $h = 0, 1, \dots, p$

UW-Madison Statistics, Stat 709, Lecture 24
Example 2. 1-mean vs p-mean
- $n = p_n r_n$, $p = p_n$, $r = r_n$
- There are $p$ groups, each has $r$ identically distributed observations
- Select one model from two models:
  - 1-mean model: all groups have the same mean $\mu_1$
  - p-mean model: the $p$ groups have different means $\mu_1, \dots, \mu_p$
- $A = A_1$ or $A_p$

Here $Z$ is the $n \times p$ matrix whose first column is $1_n$ and whose $j$th column ($j = 2, \dots, p$) is the indicator of the $j$th group of $r$ observations (a stack of $1_r$ and $0$ blocks), with
$$\beta = (\mu_1,\ \mu_2 - \mu_1,\ \dots,\ \mu_p - \mu_1)^\tau,$$
so that $Z_{A_p} = Z$, $\beta_{A_p} = \beta$, while $Z_{A_1} = 1_n$ (the first column) and $\beta_{A_1} = \mu_1$.

- In traditional studies, $p$ is fixed and $n$ is large, or $p/n$ is small
- In modern applications, both $p$ and $n$ are large, and in some cases $p > n$
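The block structure of $Z$ in this parametrization can be checked numerically. The following is a minimal sketch (not part of the lecture; the function name `one_way_design` and the specific numbers are illustrative):

```python
import numpy as np

def one_way_design(p, r):
    """n x p design for the balanced one-way layout, n = p*r:
    first column is 1_n; column j (j = 2..p) indicates group j."""
    n = p * r
    Z = np.zeros((n, p))
    Z[:, 0] = 1.0
    for j in range(1, p):
        Z[j * r:(j + 1) * r, j] = 1.0
    return Z

p, r = 3, 2
Z = one_way_design(p, r)
mus = np.array([1.0, 2.0, 5.0])
# beta = (mu_1, mu_2 - mu_1, ..., mu_p - mu_1)
beta = np.concatenate(([mus[0]], mus[1:] - mus[0]))
# Z beta reproduces each group mean, repeated r times
print(Z @ beta)  # [1. 1. 2. 2. 5. 5.]
```

Dropping columns $2,\dots,p$ of $Z$ (i.e., setting those components of $\beta$ to zero) recovers the 1-mean model, which is what makes this a variable-selection problem.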
Methods for variable selection

Generalized Information Criterion (GIC)
- Put a penalty on the dimension of the parameter: we minimize
$$\|X - Z_A \hat\beta_A\|^2 + \lambda_n \hat\sigma^2 \dim(\beta_A)$$
over $A$ to obtain a suitable $\hat A$, and then estimate $\beta_{\hat A}$.
- $\hat\sigma^2$ is a suitable estimator of the error variance $\sigma^2$
- The term $\|X - Z_A \hat\beta_A\|^2$ measures the goodness of fit of model $A$, whereas the term $\lambda_n \hat\sigma^2 \dim(\beta_A)$ controls the "size" of $A$
- If $\lambda_n = 2$, this is the $C_p$ method, and close to the AIC
- If $\lambda_n = \log n$, this is close to the BIC

Regularization (or penalized optimization)
- Simultaneously select variables and estimate $\beta$ by minimizing
$$\|X - Z\beta\|^2 + p_\lambda(\beta),$$
where $p_\lambda$ is a penalty function indexed by the penalty parameter $\lambda \ge 0$, which may depend on $n$ and the data.
- Zero components of $\beta$ are estimated as zeros and automatically eliminated.
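For small $p$, the GIC can be minimized by brute force over all nonempty subsets $A$. The sketch below is illustrative, not from the lecture (the name `gic_select` and the simulated data are assumptions); it implements $\|X - Z_A\hat\beta_A\|^2 + \lambda_n \hat\sigma^2 \dim(\beta_A)$ directly:

```python
import itertools
import numpy as np

def gic_select(X, Z, lam, sigma2_hat):
    """Minimize ||X - Z_A beta_hat_A||^2 + lam * sigma2_hat * dim(A)
    over all nonempty subsets A of the columns of Z (cost is O(2^p))."""
    n, p = Z.shape
    best_A, best_score = None, np.inf
    for q in range(1, p + 1):
        for A in itertools.combinations(range(p), q):
            ZA = Z[:, A]
            beta_A, *_ = np.linalg.lstsq(ZA, X, rcond=None)
            rss = float(np.sum((X - ZA @ beta_A) ** 2))
            score = rss + lam * sigma2_hat * q
            if score < best_score:
                best_A, best_score = A, score
    return best_A, best_score
```

The exhaustive search is only feasible for small $p$; for large $p$ one turns to the penalized-optimization methods listed on the next slide.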
Examples of penalty functions
- Ridge regression: $p_\lambda(\beta) = \lambda \|\beta\|^2$
- LASSO (least absolute shrinkage and selection operator): $p_\lambda(\beta) = \lambda \|\beta\|_1 = \lambda \sum_{j=1}^p |\beta_j|$, where $\beta_j$ is the $j$th component of $\beta$
- Adaptive LASSO: $p_\lambda(\beta) = \lambda \sum_{j=1}^p \tau_j |\beta_j|$, where the $\tau_j$'s are nonnegative leverage factors chosen adaptively such that large penalties are used for unimportant $\beta_j$'s and small penalties for important ones
- Elastic net: $p_\lambda(\beta) = \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|^2$
- Minimax concave penalty: $p_\lambda(\beta) = \sum_{j=1}^p \int_0^{|\beta_j|} (a\lambda - t)_+/a \, dt$ for some $a > 0$
- SCAD (smoothly clipped absolute deviation): $p_\lambda(\beta) = \sum_{j=1}^p p_\lambda(|\beta_j|)$ with
$$p_\lambda'(t) = \lambda\left\{ I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a-1)\lambda}\, I(t > \lambda) \right\}$$
for some $a > 2$
- There are also many modified versions of the previously listed methods

Resampling methods
- Cross-validation, bootstrap

Thresholding
- Compare $\hat\beta_j$ with a threshold (which may depend on $n$ and the data) and eliminate estimates that are smaller than the threshold
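As an illustration of how an $\ell_1$ penalty produces exact zeros, here is a minimal coordinate-descent solver for the LASSO objective $\tfrac12\|X - Z\beta\|^2 + \lambda\|\beta\|_1$. This is a standard algorithm, not one prescribed by the lecture, and the helper names are illustrative. Each coordinate update is a soft-thresholding step, which is what sets small coefficients exactly to zero (unlike the ridge penalty):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, Z, lam, n_iter=200):
    """Coordinate descent for (1/2)||X - Z beta||^2 + lam * ||beta||_1."""
    n, p = Z.shape
    beta = np.zeros(p)
    col_sq = np.sum(Z ** 2, axis=0)   # ||Z_j||^2 for each column
    r = X.copy()                      # residual X - Z beta (beta = 0 initially)
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * beta[j]    # remove column j's contribution
            zj = Z[:, j] @ r
            beta[j] = soft_threshold(zj, lam) / col_sq[j]
            r -= Z[:, j] * beta[j]    # restore residual with updated beta_j
    return beta
```

With $\lambda$ large enough, every coefficient is thresholded to zero; as $\lambda \downarrow 0$ the solution approaches the LSE.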
Assessment of variable/model selection procedures
- $A_*$ = the set containing exactly the indices of nonzero components of $\beta$
- $\hat A$: a set of variables (a model) selected based on a selection procedure
- The selection procedure is selection consistent if $\lim_n P(\hat A = A_*) = 1$
- Sometimes the following weaker version of consistency is desired.
- Under model $A$, $\mu = E(X|Z)$ is estimated by $\hat\mu_A = Z_A \hat\beta_A$
- We want to minimize the squared error loss
$$L_n(A) = \frac{1}{n}\|\mu - \hat\mu_A\|^2$$
over $A$, which is equivalent to minimizing the average prediction error
$$E\left[\frac{1}{n}\|X_* - \hat\mu_A\|^2 \,\Big|\, X, Z\right]$$
over $A$; $X_*$: a future independent copy of $X$
- The selection procedure is loss consistent if $L_n(\hat A)/\min_A L_n(A) \to_p 1$
Consistency of the GIC
Let $M$ denote a set of indices (a model). If $A_* \subset M$, then $M$ is a correct model; otherwise, $M$ is a wrong model.
The loss under model $M$ is equal to
$$L_n(M) = \Delta_n(M) + \varepsilon^\tau H_M \varepsilon / n,$$
$$H_M = Z_M (Z_M^\tau Z_M)^{-1} Z_M^\tau, \qquad \Delta_n(M) = \|\mu - H_M \mu\|^2 / n \quad (= 0 \text{ if } M \text{ is correct})$$
Let
$$\Gamma_{n,\lambda_n}(M) = \frac{1}{n}\left[ \|X - Z_M \hat\beta_M\|^2 + \lambda_n \hat\sigma^2 \dim(\beta_M) \right]$$
be the quantity to be minimized. Here
$$\|X - Z_M \hat\beta_M\|^2 = \|X - H_M X\|^2 = \|\mu - H_M \mu + \varepsilon - H_M \varepsilon\|^2 = n\Delta_n(M) + \|\varepsilon\|^2 - \varepsilon^\tau H_M \varepsilon + 2\varepsilon^\tau (I - H_M)\mu$$
When $M$ is a wrong model,
$$\Gamma_{n,\lambda_n}(M) = \frac{\|\varepsilon\|^2}{n} + \Delta_n(M) - \frac{\varepsilon^\tau H_M \varepsilon}{n} + \frac{2\varepsilon^\tau (I - H_M)\mu}{n} + \frac{\lambda_n \hat\sigma^2 \dim(M)}{n}$$
$$= \frac{\|\varepsilon\|^2}{n} + L_n(M) + O_P\!\left(\frac{\lambda_n \dim(M)}{n}\right) + O_P\!\left(\sqrt{\frac{\Delta_n(M)}{n}}\right) + O_P\!\left(\frac{\dim(M)}{n}\right) = \frac{\|\varepsilon\|^2}{n} + L_n(M) + o_P(L_n(M))$$
provided that
$$\liminf_n \min_{M \text{ is wrong}} \Delta_n(M) > 0 \quad \text{and} \quad \lambda_n p / n \to 0$$
- The first condition implies that a wrong model is always worse than a correct one
- Among all wrong $M$, minimizing $\Gamma_{n,\lambda_n}(M)$ is asymptotically the same as minimizing $L_n(M)$
- Hence, the GIC is loss consistent when all models are wrong
- The GIC selects the best wrong model, i.e., the best approximation to a correct model in terms of $\Delta_n(M)$, the leading term in the loss $L_n(M)$
- For correct models, however, $\Delta_n(M) = 0$ and $L_n(M) = \varepsilon^\tau H_M \varepsilon / n$
- Correct models are nested, $A_*$ has the smallest dimension, and
$$\varepsilon^\tau H_{A_*} \varepsilon = \min_{M \text{ is correct}} \varepsilon^\tau H_M \varepsilon$$
- For a correct model $M$,
$$\Gamma_{n,\lambda_n}(M) = \frac{\|\varepsilon\|^2}{n} - \frac{\varepsilon^\tau H_M \varepsilon}{n} + \frac{\lambda_n \hat\sigma^2 \dim(M)}{n} = \frac{\|\varepsilon\|^2}{n} + L_n(M) + \frac{\lambda_n \hat\sigma^2 \dim(M) - 2\varepsilon^\tau H_M \varepsilon}{n}$$
If $\lambda_n \to \infty$, the dominating term in $\Gamma_{n,\lambda_n}(M) - \|\varepsilon\|^2/n$ is $\lambda_n \hat\sigma^2 \dim(M)/n$. Among correct models, the GIC selects a model by minimizing $\dim(M)$, i.e., it selects $A_*$.
Combining the results, we have shown that the GIC is selection consistent.
On the other hand, if $\lambda_n = 2$ (the $C_p$ method, AIC), the term $[2\hat\sigma^2 \dim(M) - 2\varepsilon^\tau H_M \varepsilon]/n$ is of the same order as $L_n(M) = \varepsilon^\tau H_M \varepsilon / n$ unless $\dim(M) \to \infty$ for all but one correct model.
Under some conditions, the GIC with $\lambda_n = 2$ is loss consistent if and only if there do not exist two correct models with fixed dimensions.

Conclusion
1. The GIC with a bounded $\lambda_n$ ($C_p$, AIC) is loss consistent when there is at most one fixed-dimension correct model; otherwise it is inconsistent.
2. The GIC with $\lambda_n \to \infty$ and $\lambda_n p / n \to 0$ (BIC) is selection consistent or loss consistent.
Example 2. 1-mean vs p-mean ($A_1$ vs $A_p$; $A_p$ is always correct)
- $p$ groups, each with $r$ observations
- $\Delta_n(A_1) = \sum_{j=1}^p (\mu_j - \bar\mu)^2 / p$, $\bar\mu = \sum_{j=1}^p \mu_j / p$
- $n = p_n r_n \to \infty$ means that either $p_n \to \infty$ or $r_n \to \infty$

1. $p = p_n$ is fixed and $r_n \to \infty$
- The dimensions of correct models are fixed
- The GIC with $\lambda_n \to \infty$ and $\lambda_n / n \to 0$ is selection consistent
- The GIC with $\lambda_n = 2$ is inconsistent

2. $p_n \to \infty$ and $r = r_n$ is fixed
- Only one correct model has a fixed dimension
- The GIC with $\lambda_n = 2$ is loss consistent
- The GIC with $\lambda_n \to \infty$ is inconsistent, because $\lambda_n p_n / n = \lambda_n / r \not\to 0$

3. $p_n \to \infty$ and $r_n \to \infty$
- Only one correct model has a fixed dimension
- The GIC is selection consistent, provided that $\lambda_n \to \infty$ and $\lambda_n / r_n \to 0$
More on the case where $p_n \to \infty$ and $r = r_n$ is fixed
- $\hat\sigma^2 = S(A_p)/(n - p)$, where $S(A) = \|X - Z_A \hat\beta_A\|^2$
- It can be shown that
$$L_n(A_1) = \Delta_n(A_1) + \bar e^2, \qquad L_n(A_p) = \frac{1}{p}\sum_{i=1}^p \bar e_i^2 \to_p \frac{\sigma^2}{r},$$
$$\Delta = \lim_p \Delta_n(A_1) = \lim_p \frac{1}{p}\sum_{j=1}^p \left( \mu_j - \frac{1}{p}\sum_{i=1}^p \mu_i \right)^2,$$
where the $e_{ij}$'s are iid, $E e_{ij} = 0$, $E e_{ij}^2 = \sigma^2$, $\bar e_i = r^{-1}\sum_{j=1}^r e_{ij}$, and $\bar e = p^{-1}\sum_{i=1}^p \bar e_i$.
- Then
$$L_n(A_1) - L_n(A_p) \to_p \Delta - \frac{\sigma^2}{r}$$
- The one-mean model is better if and only if $\Delta r < \sigma^2$. The wrong model may be better!
- The GIC with $\lambda_n \to \infty$ minimizes
$$\frac{S(A_1)}{n} + \frac{\lambda_n}{n} \cdot \frac{S(A_p)}{n - p} \quad \text{and} \quad \frac{S(A_p)}{n} + \frac{\lambda_n}{r} \cdot \frac{S(A_p)}{n - p}$$
Because
$$\frac{S(A_1)}{n} = \Delta_n(A_1) + \frac{1}{n}\sum_{i=1}^p \sum_{j=1}^r (e_{ij} - \bar e)^2 + o_P(1) \to_p \Delta + \sigma^2,$$
$$\frac{S(A_p)}{n} = \frac{1}{n}\sum_{i=1}^p \sum_{j=1}^r (e_{ij} - \bar e_i)^2 \to_p \frac{(r-1)\sigma^2}{r},$$
and $\lambda_n / r \to \infty$,
$$P\{\text{the GIC with } \lambda_n \to \infty \text{ selects } A_1\} \to 1$$
On the other hand, the $C_p$ (GIC with $\lambda_n = 2$) is loss consistent, because the $C_p$ minimizes
$$\frac{S(A_1)}{n} + \frac{2}{n} \cdot \frac{S(A_p)}{n - p} \to_p \Delta + \sigma^2 \quad \text{and} \quad \frac{S(A_p)}{n} + \frac{2}{r} \cdot \frac{S(A_p)}{n - p} \to_p \frac{(r-1)\sigma^2}{r} + \frac{2\sigma^2}{r} = \sigma^2 + \frac{\sigma^2}{r}$$
Asymptotically, the $C_p$ selects $A_1$ iff $\Delta < \sigma^2 / r$, which is the same as saying the one-mean model is better.
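The comparison above can be reproduced numerically. The sketch below is illustrative, not from the lecture (the name `gic_compare` and the simulated data are assumptions); it evaluates $\Gamma_{n,\lambda_n}(A_1)$ and $\Gamma_{n,\lambda_n}(A_p)$ with $\hat\sigma^2 = S(A_p)/(n-p)$ for a balanced one-way layout:

```python
import numpy as np

def gic_compare(X, p, r, lam):
    """Compare the 1-mean and p-mean models via the GIC.
    X is a p-by-r array of observations; sigma^2 is estimated by S(A_p)/(n-p)."""
    n = p * r
    s_a1 = float(np.sum((X - X.mean()) ** 2))                         # S(A_1)
    s_ap = float(np.sum((X - X.mean(axis=1, keepdims=True)) ** 2))    # S(A_p)
    sigma2_hat = s_ap / (n - p)
    gamma_a1 = (s_a1 + lam * sigma2_hat * 1) / n   # dim(A_1) = 1
    gamma_ap = (s_ap + lam * sigma2_hat * p) / n   # dim(A_p) = p
    return "A_1" if gamma_a1 <= gamma_ap else "A_p"
```

For example, with $p = 1000$ groups of size $r = 5$, $\Delta = 0.5$ and $\sigma^2 = 1$ (so $\Delta > \sigma^2/r$ and the $p$-mean model is better), $\lambda_n = 2$ selects $A_p$ while $\lambda_n = \log n$ selects the inferior $A_1$, matching the discussion above.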
Variable selection by thresholding
Can we do variable selection using p-values? Or, can we simply select variables by using the values $|\hat\beta_j|$, $j = 1, \dots, p$? Here $\hat\beta_j$ is the $j$th component of $\hat\beta$, the least squares estimator of $\beta$.
For simplicity, assume that $X | Z \sim N(Z\beta, \sigma^2 I_n)$. Then
$$\hat\beta_j - \beta_j = \sum_{i=1}^n l_{ij} \varepsilon_i \,\Big|\, Z \sim N\!\left(0, \sigma^2 \sum_{i=1}^n l_{ij}^2\right)$$
where $\varepsilon_i$ is the $i$th component of $\varepsilon = X - Z\beta$, $l_{ij} = J_j^\tau (Z^\tau Z)^{-1} z_i$, and $z_i^\tau$ is the $i$th row of $Z$.
Because
$$1 - \Phi(t) \le \frac{e^{-t^2/2}}{\sqrt{2\pi}\, t}, \quad t > 0,$$
where $\Phi$ is the standard normal cdf,
$$P\left( |\hat\beta_j - \beta_j| > t \sqrt{\mathrm{Var}(\hat\beta_j \,|\, Z)} \,\Big|\, Z \right) \le \frac{2 e^{-t^2/2}}{\sqrt{2\pi}\, t}, \quad t > 0$$
Let $J_j$ be the $p$-vector whose $j$th component is 1 and other components are 0. By the Cauchy-Schwarz inequality,
$$l_{ij}^2 = [J_j^\tau (Z^\tau Z)^{-1} z_i]^2 \le J_j^\tau (Z^\tau Z)^{-1} J_j \cdot z_i^\tau (Z^\tau Z)^{-1} z_i$$
$$\sum_{i=1}^n l_{ij}^2 \le c_j \sum_{i=1}^n z_i^\tau (Z^\tau Z)^{-1} z_i = p\, c_j \le p / \eta_n$$
where $c_j$ is the $j$th diagonal element of $(Z^\tau Z)^{-1}$ and $\eta_n$ is the smallest eigenvalue of $Z^\tau Z$.
Thus, for any $j$,
$$P\left( |\hat\beta_j - \beta_j| > t \sigma \sqrt{p/\eta_n} \,\Big|\, Z \right) \le \frac{2 e^{-t^2/2}}{\sqrt{2\pi}\, t}, \quad t > 0,$$
and letting $t = a_n / (\sigma \sqrt{p/\eta_n})$,
$$P\left( |\hat\beta_j - \beta_j| > a_n \,\big|\, Z \right) \le C e^{-a_n^2 \eta_n / (2\sigma^2 p)}$$
for some constant $C > 0$, so
$$P\left( \max_{j=1,\dots,p} |\hat\beta_j - \beta_j| > a_n \,\Big|\, Z \right) \le p\, C e^{-a_n^2 \eta_n / (2\sigma^2 p)}$$
Suppose that $p/n \to 0$ and $(p/\eta_n)\log n \to 0$ (typically, $\eta_n$ is of the exact order $n$). Then we can choose $a_n$ such that $a_n \to 0$ and $a_n^2 \eta_n / p$ grows faster than $2\log n$, so that
$$P\left( \max_{j=1,\dots,p} |\hat\beta_j - \beta_j| > c\, a_n \,\Big|\, Z \right) = O(n^{-s})$$
for any $c > 0$ and some $s \ge 1$; e.g.,
$$a_n = M \sqrt{\frac{p}{\eta_n} \log n}$$
for some constant $M > 0$, which is $O(n^{-\alpha})$ for some $\alpha \in (0, \tfrac12)$ when $\eta_n$ is of the exact order $n$.
What can we conclude from this?
Let $A_* = \{j : \beta_j \ne 0\}$ and $\hat A = \{j : |\hat\beta_j| > a_n\}$. That is, $\hat A$ contains the indices of the variables we select by thresholding $|\hat\beta_j|$ at $a_n$.
Selection consistency:
$$P(\hat A \ne A_* \,|\, Z) \le P\left( |\hat\beta_j| > a_n, \text{ some } j \notin A_* \,\big|\, Z \right) + P\left( |\hat\beta_j| \le a_n, \text{ some } j \in A_* \,\big|\, Z \right)$$
The first term on the right-hand side is bounded by
$$P\left( \max_{j=1,\dots,p} |\hat\beta_j - \beta_j| > a_n \,\Big|\, Z \right) = O(n^{-s})$$
On the other hand, if we assume that $\min_{j \in A_*} |\beta_j| \ge c_0 a_n$ for some $c_0 > 1$, then
$$P\left( |\hat\beta_j| \le a_n, \text{ some } j \in A_* \,\big|\, Z \right) \le P\left( |\beta_j| - |\hat\beta_j - \beta_j| \le a_n, \text{ some } j \in A_* \,\big|\, Z \right)$$
$$\le P\left( c_0 a_n - |\hat\beta_j - \beta_j| \le a_n, \text{ some } j \in A_* \,\big|\, Z \right) \le P\left( \max_{j=1,\dots,p} |\hat\beta_j - \beta_j| \ge (c_0 - 1) a_n \,\Big|\, Z \right) = O(n^{-s})$$
Hence, we have consistency; in fact, the convergence rate is $O(n^{-s})$.
We can also obtain similar results by thresholding $|\hat\beta_j| / \sqrt{\sum_{i=1}^n l_{ij}^2}$.
This approach may not work if $p/n \not\to 0$. If $p > n$, then $Z^\tau Z$ is not of full rank. There exist several other approaches for the case where $p > n$; e.g., we can replace $(Z^\tau Z)^{-1}$ by some other matrix, or use ridge regression instead of the LSE.
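A direct implementation of the thresholding rule with $a_n = M\sqrt{(p/\eta_n)\log n}$ is sketched below. This is illustrative, not from the lecture: the scaling of the threshold by $\sigma$ and the constant $M$ are choices, and the rule requires $p < n$ with $Z$ of full rank, matching the caveat above.

```python
import numpy as np

def threshold_select(X, Z, sigma, M=1.0):
    """Select A_hat = {j : |beta_hat_j| > a_n}, a_n = M*sigma*sqrt((p/eta_n)*log n),
    where eta_n is the smallest eigenvalue of Z^T Z (requires p < n, full rank)."""
    n, p = Z.shape
    beta_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)   # LSE of beta
    eta_n = np.linalg.eigvalsh(Z.T @ Z)[0]             # smallest eigenvalue
    a_n = M * sigma * np.sqrt((p / eta_n) * np.log(n))
    return {j for j in range(p) if abs(beta_hat[j]) > a_n}, a_n
```

In practice $\sigma$ would be replaced by an estimate, and thresholding the studentized values $|\hat\beta_j| / \sqrt{\sum_i l_{ij}^2}$ (as noted above) avoids choosing the scale by hand.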
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet
More informationLECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK)
LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) Everythig marked by is ot required by the course syllabus I this lecture, all vector spaces is over the real umber R. All vectors i R is viewed as a colum
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationMATH 472 / SPRING 2013 ASSIGNMENT 2: DUE FEBRUARY 4 FINALIZED
MATH 47 / SPRING 013 ASSIGNMENT : DUE FEBRUARY 4 FINALIZED Please iclude a cover sheet that provides a complete setece aswer to each the followig three questios: (a) I your opiio, what were the mai ideas
More information( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2
82 CHAPTER 4. MAXIMUM IKEIHOOD ESTIMATION Defiitio: et X be a radom sample with joit p.m/d.f. f X x θ. The geeralised likelihood ratio test g.l.r.t. of the NH : θ H 0 agaist the alterative AH : θ H 1,
More informationExpectation and Variance of a random variable
Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio
More informationLinear Regression Models
Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect
More informationEconomics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator
Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters
More informationLocal Polynomial Regression
Local Polyomial Regressio Joh Hughes October 2, 2013 Recall that the oparametric regressio model is Y i f x i ) + ε i, where f is the regressio fuctio ad the ε i are errors such that Eε i 0. The Nadaraya-Watso
More informationEconomics 326 Methods of Empirical Research in Economics. Lecture 18: The asymptotic variance of OLS and heteroskedasticity
Ecoomics 326 Methods of Empirical Research i Ecoomics Lecture 8: The asymptotic variace of OLS ad heteroskedasticity Hiro Kasahara Uiversity of British Columbia December 24, 204 Asymptotic ormality I I
More information2.2. Central limit theorem.
36.. Cetral limit theorem. The most ideal case of the CLT is that the radom variables are iid with fiite variace. Although it is a special case of the more geeral Lideberg-Feller CLT, it is most stadard
More information