Lecture 24: Variable selection in linear models

Consider the linear model $X = Z\beta + \varepsilon$, with $\beta \in \mathbb{R}^p$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$.

Like the LSE, the ridge regression estimator does not give a 0 estimate to a component of $\beta$ even if that component is 0.

Variable (or model) selection refers to eliminating covariates (columns of $Z$) corresponding to zero components of $\beta$.

Example 1. Linear regression models
$A$ = a subset of $\{1,\dots,p\}$, the indices of nonzero components of $\beta$
The dimension of $A$ is $\dim(A) = q \le p$
$\beta_A$: sub-vector of $\beta$ with indices in $A$
$Z_A$: the corresponding sub-matrix of $Z$
The number of models could be as large as $2^p$

Approximation to a response surface
The $i$th row of $Z_A$ is $(1, t_i, t_i^2, \dots, t_i^h)$, $t_i \in \mathbb{R}$
$A = \{1,\dots,h\}$: a polynomial of order $h$, $h = 0,1,\dots,p$

UW-Madison Statistics, Stat 709, Lecture 24

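Example 2. 1-mean vs p-mean

$n = pr$, $p = p_n$, $r = r_n$
There are $p$ groups, each with $r$ identically distributed observations.
Select one model from two models:
1-mean model: all groups have the same mean $\mu_1$
p-mean model: the $p$ groups have different means $\mu_1,\dots,\mu_p$
$A = A_1$ or $A_p$

$$Z = \begin{pmatrix} 1_r & 0 & \cdots & 0 \\ 1_r & 1_r & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1_r & 0 & \cdots & 1_r \end{pmatrix}, \qquad \beta = \begin{pmatrix} \mu_1 \\ \mu_2 - \mu_1 \\ \vdots \\ \mu_p - \mu_1 \end{pmatrix}$$

$Z_{A_p} = Z$, $\beta_{A_p} = \beta$, $Z_{A_1} = 1_n$, $\beta_{A_1} = \mu_1$

In traditional studies, $p$ is fixed and $n$ is large, or $p/n$ is small.
In modern applications, both $p$ and $n$ are large, and in some cases $p > n$, $p/n \to \infty$.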

Methods for variable selection

Generalized Information Criterion (GIC)
Put a penalty on the dimension of the parameter: we minimize
$$\|X - Z_A\hat\beta_A\|^2 + \lambda\hat\sigma^2 \dim(\beta_A)$$
over $A$ to obtain a suitable $A$, and then estimate $\beta_A$. Here $\hat\beta_A$ is the LSE under model $A$ and $\hat\sigma^2$ is a suitable estimator of the error variance $\sigma^2$.
The term $\|X - Z_A\hat\beta_A\|^2$ measures goodness-of-fit of model $A$, whereas the term $\lambda\hat\sigma^2\dim(\beta_A)$ controls the "size" of $A$.
If $\lambda = 2$, this is the $C_p$ method, and close to the AIC.
If $\lambda = \log n$, this is close to the BIC.

Regularization (or penalized optimization)
Simultaneously select variables and estimate $\beta$ by minimizing
$$\|X - Z\beta\|^2 + p_\lambda(\beta),$$
where $p_\lambda$ is a penalty function indexed by the penalty parameter $\lambda \ge 0$, which may depend on $n$ and the data.
Zero components of $\beta$ are estimated as zeros and automatically eliminated.
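The GIC described above can be sketched as an exhaustive search over subsets, which is only feasible for small $p$. This is a minimal illustration, not the lecture's own code: the function name `gic_select`, the simulated data, and the choice of estimating $\sigma^2$ from the full model are all assumptions made for the example.

```python
import itertools
import numpy as np

def gic_select(X, Z, lam):
    """Minimize RSS(A) + lam * sigma2_hat * dim(A) over all column subsets A."""
    n, p = Z.shape
    # Assumption: sigma^2 is estimated from the full model's residuals.
    full_coef = np.linalg.lstsq(Z, X, rcond=None)[0]
    sigma2_hat = np.sum((X - Z @ full_coef) ** 2) / (n - p)
    best_subset, best_value = (), np.inf
    for q in range(p + 1):
        for subset in itertools.combinations(range(p), q):
            if q == 0:
                rss = np.sum(X ** 2)  # empty model: fitted values are zero
            else:
                Z_A = Z[:, list(subset)]
                coef_A = np.linalg.lstsq(Z_A, X, rcond=None)[0]
                rss = np.sum((X - Z_A @ coef_A) ** 2)
            value = rss + lam * sigma2_hat * q
            if value < best_value:
                best_subset, best_value = subset, value
    return best_subset

# Hypothetical simulated data with three nonzero coefficients.
rng = np.random.default_rng(0)
n, p = 200, 6
Z = rng.standard_normal((n, p))
beta = np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.5])
X = Z @ beta + rng.standard_normal(n)

selected_cp = gic_select(X, Z, lam=2.0)         # C_p / AIC-like penalty
selected_bic = gic_select(X, Z, lam=np.log(n))  # BIC-like penalty
```

Because the penalty grows with $\lambda$, the BIC-like choice never selects more variables than the $C_p$-like choice on the same data.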

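Examples of penalty functions

Ridge regression: $p_\lambda(\beta) = \lambda\|\beta\|^2$;
LASSO (least absolute shrinkage and selection operator): $p_\lambda(\beta) = \lambda\|\beta\|_1 = \lambda\sum_{j=1}^p |\beta_j|$, where $\beta_j$ is the $j$th component of $\beta$;
Adaptive LASSO: $p_\lambda(\beta) = \lambda\sum_{j=1}^p \tau_j|\beta_j|$, where the $\tau_j$'s are non-negative leverage factors chosen adaptively such that large penalties are used for unimportant $\beta_j$'s and small penalties for important ones;
Elastic net: $p_\lambda(\beta) = \lambda_1\|\beta\|_1 + \lambda_2\|\beta\|^2$;
Minimax concave penalty: $p_\lambda(\beta) = \sum_{j=1}^p (a\lambda - |\beta_j|)_+/a$ for some $a > 0$;
SCAD (smoothly clipped absolute deviation): $p_\lambda(\beta) = \sum_{j=1}^p \lambda\left\{ I(|\beta_j| \le \lambda) + \frac{(a\lambda - |\beta_j|)_+}{(a-1)\lambda} I(|\beta_j| > \lambda) \right\}$ for some $a > 2$.

As written, the minimax concave and SCAD expressions match what the literature usually gives as the derivative of the per-coordinate penalty with respect to $|\beta_j|$.

There are also many modified versions of the previously listed methods.

Resampling methods
Cross validation, bootstrap

Thresholding
Compare $\hat\beta_j$ with a threshold (which may depend on $n$ and the data) and eliminate estimates that are smaller than the threshold.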
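The penalties listed above are simple to evaluate numerically. The sketch below uses hypothetical function names; following the usual convention, the minimax concave and SCAD expressions are coded as per-coordinate derivative functions of $t = |\beta_j| \ge 0$, which is an interpretation rather than something the slide states.

```python
import numpy as np

def ridge_penalty(beta, lam):
    return lam * np.sum(beta ** 2)

def lasso_penalty(beta, lam):
    return lam * np.sum(np.abs(beta))

def adaptive_lasso_penalty(beta, lam, tau):
    # tau holds the non-negative weights tau_j, one per coefficient
    return lam * np.sum(tau * np.abs(beta))

def elastic_net_penalty(beta, lam1, lam2):
    return lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta ** 2)

def mcp_derivative(t, lam, a):
    """(a*lam - t)_+ / a, evaluated at t = |beta_j| >= 0, with a > 0."""
    t = np.asarray(t, dtype=float)
    return np.maximum(a * lam - t, 0.0) / a

def scad_derivative(t, lam, a):
    """lam * { I(t <= lam) + (a*lam - t)_+ / ((a-1)*lam) * I(t > lam) }, a > 2."""
    t = np.asarray(t, dtype=float)
    tail = np.maximum(a * lam - t, 0.0) / ((a - 1) * lam)
    return lam * ((t <= lam) + tail * (t > lam))

beta = np.array([0.0, 0.5, -2.0])
tau = np.array([1.0, 2.0, 0.5])
```

For example, `scad_derivative` equals `lam` for small $t$ and vanishes once $t \ge a\lambda$, so large coefficients are not shrunk, unlike under the LASSO.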

Assessment of variable/model selection procedures

$A$ = the set containing exactly the indices of nonzero components of $\beta$
$\hat A$: a set of variables (a model) selected by a selection procedure

The selection procedure is selection consistent if
$$\lim_{n\to\infty} P(\hat A = A) = 1.$$

Sometimes the following weaker version of consistency is desired.
Under model $A$, $\mu = E(X \mid Z)$ is estimated by $\hat\mu_A = Z_A\hat\beta_A$.
We want to minimize the squared error loss
$$L_n(A) = \frac{1}{n}\|\mu - \hat\mu_A\|^2$$
over $A$, which is equivalent to minimizing the average prediction error
$$\frac{1}{n} E\left[\|X^* - \hat\mu_A\|^2 \,\middle|\, X, Z\right]$$
over $A$, where $X^*$ is a future independent copy of $X$.

The selection procedure is loss consistent if
$$L_n(\hat A)/L_n(A) \to_p 1.$$
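The equivalence between the loss and the average prediction error can be spelled out in one line. The derivation below is a sketch under the assumption that $X^* = \mu + \varepsilon^*$, where $\varepsilon^*$ is independent of $X$ given $Z$, with mean 0 and variance $\sigma^2 I_n$; the symbol $\varepsilon^*$ is introduced here for illustration.

```latex
\begin{align*}
\frac{1}{n} E\left[\|X^* - \hat\mu_A\|^2 \,\middle|\, X, Z\right]
  &= \frac{1}{n} E\left[\|\mu - \hat\mu_A + \varepsilon^*\|^2 \,\middle|\, X, Z\right] \\
  &= \frac{1}{n}\|\mu - \hat\mu_A\|^2
     + \frac{2}{n}(\mu - \hat\mu_A)^\tau E(\varepsilon^* \mid X, Z)
     + \frac{1}{n} E\left(\|\varepsilon^*\|^2 \mid X, Z\right) \\
  &= L_n(A) + \sigma^2 .
\end{align*}
```

Since $\sigma^2$ does not depend on $A$, minimizing the average prediction error over $A$ is the same as minimizing $L_n(A)$.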

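Consistency of the GIC

Let $M$ denote a set of indices (a model). If $A \subset M$, then $M$ is a correct model; otherwise, $M$ is a wrong model.

The loss under model $M$ is equal to
$$L_n(M) = \Delta_n(M) + \varepsilon^\tau H_M\varepsilon/n,$$
where $H_M = Z_M(Z_M^\tau Z_M)^{-1}Z_M^\tau$ and $\Delta_n(M) = \|\mu - H_M\mu\|^2/n$ ($= 0$ if $M$ is correct).

Let
$$\Gamma_{n,\lambda}(M) = \frac{1}{n}\left[\|X - Z_M\hat\beta_M\|^2 + \lambda\hat\sigma^2\dim(\beta_M)\right]$$
be the quantity to be minimized. Then
$$\|X - Z_M\hat\beta_M\|^2 = \|X - H_M X\|^2 = \|\mu - H_M\mu + \varepsilon - H_M\varepsilon\|^2 = n\Delta_n(M) + \|\varepsilon\|^2 - \varepsilon^\tau H_M\varepsilon + 2\varepsilon^\tau(I - H_M)\mu.$$

When $M$ is a wrong model,
$$\Gamma_{n,\lambda}(M) = \frac{\|\varepsilon\|^2}{n} + \Delta_n(M) - \frac{\varepsilon^\tau H_M\varepsilon}{n} + \frac{\lambda\hat\sigma^2\dim(M)}{n} + O_P\left(\sqrt{\frac{\Delta_n(M)}{n}}\right)$$
$$= \frac{\|\varepsilon\|^2}{n} + L_n(M) + O_P\left(\frac{\lambda\dim(M)}{n}\right) + O_P\left(\sqrt{\frac{\Delta_n(M)}{n}}\right)$$
$$= \frac{\|\varepsilon\|^2}{n} + L_n(M) + o_P(L_n(M))$$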

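provided that
$$\liminf_{n} \min_{M \text{ is wrong}} \Delta_n(M) > 0 \quad\text{and}\quad \frac{\lambda p}{n} \to 0.$$

The first condition implies that a wrong model is always worse than a correct model.
Among all wrong $M$, minimizing $\Gamma_{n,\lambda}(M)$ is asymptotically the same as minimizing $L_n(M)$.
Hence, the GIC is loss consistent when all models are wrong.
The GIC selects the best wrong model, i.e., the best approximation to a correct model in terms of $\Delta_n(M)$, the leading term in the loss $L_n(M)$.

For correct models, however, $\Delta_n(M) = 0$ and $L_n(M) = \varepsilon^\tau H_M\varepsilon/n$.
Correct models are nested, $A$ has the smallest dimension, and
$$\varepsilon^\tau H_A\varepsilon = \min_{M \text{ is correct}} \varepsilon^\tau H_M\varepsilon,$$
$$\Gamma_{n,\lambda}(M) = \frac{\|\varepsilon\|^2}{n} - \frac{\varepsilon^\tau H_M\varepsilon}{n} + \frac{\lambda\hat\sigma^2\dim(M)}{n} = \frac{\|\varepsilon\|^2}{n} + L_n(M) + \frac{\lambda\hat\sigma^2\dim(M)}{n} - \frac{2\varepsilon^\tau H_M\varepsilon}{n}.$$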
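The two algebraic identities behind this argument, the formula for the loss $L_n(M)$ and the expansion of the residual sum of squares, can be checked numerically. The sketch below uses simulated data and a hypothetical wrong model `M`; none of the specific values come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
Z = rng.standard_normal((n, p))
beta = np.array([1.0, -1.0, 0.5, 0.0])
mu = Z @ beta                      # mu = E(X | Z)
eps = rng.standard_normal(n)
X = mu + eps

M = [0, 3]                         # a wrong model: omits the nonzero columns 1 and 2
Z_M = Z[:, M]
H_M = Z_M @ np.linalg.solve(Z_M.T @ Z_M, Z_M.T)   # projection matrix H_M

resid_mu = mu - H_M @ mu                          # (I - H_M) mu
delta = np.sum(resid_mu ** 2) / n                 # Delta_n(M)
loss = np.sum((mu - H_M @ X) ** 2) / n            # L_n(M) = ||mu - mu_hat_M||^2 / n
rss = np.sum((X - H_M @ X) ** 2)                  # ||X - Z_M beta_hat_M||^2

# Right-hand sides of the two identities
loss_formula = delta + eps @ H_M @ eps / n
rss_formula = n * delta + eps @ eps - eps @ H_M @ eps + 2 * eps @ resid_mu
```

Both pairs, `loss` with `loss_formula` and `rss` with `rss_formula`, agree up to floating-point error for any choice of `M`, since the identities are exact.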

If $\lambda \to \infty$, the dominating term in $\Gamma_{n,\lambda}(M)$ is $\lambda\hat\sigma^2\dim(M)/n$.
Among correct models, the GIC selects a model by minimizing $\dim(M)$, i.e., it selects $A$.
Combining the results, we have shown that the GIC is selection consistent.

On the other hand, if $\lambda = 2$ (the $C_p$ method, AIC), the term
$$\frac{2\hat\sigma^2\dim(M)}{n} - \frac{2\varepsilon^\tau H_M\varepsilon}{n}$$
is of the same order as $L_n(M) = \varepsilon^\tau H_M\varepsilon/n$ unless $\dim(M) \to \infty$ for all but one correct model.
Under some conditions, the GIC with $\lambda = 2$ is loss consistent if and only if there do not exist two correct models with fixed dimensions.

Conclusion
1. The GIC with a bounded $\lambda$ ($C_p$, AIC) is loss consistent when there is at most one fixed-dimension correct model; otherwise it is inconsistent.
2. The GIC with $\lambda \to \infty$ and $\lambda p/n \to 0$ (BIC) is selection consistent or loss consistent.

Example 2. 1-mean vs p-mean

$A_1$ vs $A_p$ (always correct)
$p$ groups, each with $r$ observations
$$\Delta_n(A_1) = \sum_{j=1}^p (\mu_j - \bar\mu)^2/p, \qquad \bar\mu = \sum_{j=1}^p \mu_j/p$$
$n = pr \to \infty$ means that either $p \to \infty$ or $r \to \infty$.

1. $p = p_n$ is fixed and $r \to \infty$
The dimensions of correct models are fixed.
The GIC with $\lambda \to \infty$ and $\lambda/n \to 0$ is selection consistent.
The GIC with $\lambda = 2$ is inconsistent.

2. $p \to \infty$ and $r = r_n$ is fixed
Only one correct model has a fixed dimension.
The GIC with $\lambda = 2$ is loss consistent.
The GIC with $\lambda \to \infty$ is inconsistent, because $\lambda p/n = \lambda/r \to \infty$.

3. $p \to \infty$ and $r \to \infty$
Only one correct model has a fixed dimension.
The GIC is selection consistent, provided that $\lambda/r \to 0$.

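More on the case where $p \to \infty$ and $r = r_n$ is fixed

$\hat\sigma^2 = S(A_p)/(n - p)$, where $S(A) = \|X - Z_A\hat\beta_A\|^2$.
It can be shown that
$$L_n(A_1) = \Delta_n(A_1) + \bar e^2 \to_p \Delta = \lim_{p\to\infty} \frac{1}{p}\sum_{j=1}^p \Big(\mu_j - \frac{1}{p}\sum_{i=1}^p \mu_i\Big)^2,$$
$$L_n(A_p) = \frac{1}{p}\sum_{i=1}^p \bar e_i^2 \to_p \frac{\sigma^2}{r},$$
where the $e_{ij}$'s are iid, $E e_{ij} = 0$, $E e_{ij}^2 = \sigma^2$, $\bar e_i = r^{-1}\sum_{j=1}^r e_{ij}$, and $\bar e = p^{-1}\sum_{i=1}^p \bar e_i$.
Then
$$\frac{L_n(A_1)}{L_n(A_p)} \to_p \frac{r\Delta}{\sigma^2}.$$
The one-mean model is better if and only if $r\Delta < \sigma^2$.
The wrong model may be better!

The GIC with $\lambda \to \infty$ minimizes
$$\frac{S(A_1)}{n} + \frac{\lambda}{n}\frac{S(A_p)}{n-p} \quad\text{and}\quad \frac{S(A_p)}{n} + \frac{\lambda}{r}\frac{S(A_p)}{n-p}.$$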

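Because
$$\frac{S(A_1)}{n} = \Delta_n(A_1) + \frac{1}{n}\sum_{i=1}^p\sum_{j=1}^r (e_{ij} - \bar e)^2 + o_P(1) \to_p \Delta + \sigma^2,$$
$$\frac{S(A_p)}{n} = \frac{1}{n}\sum_{i=1}^p\sum_{j=1}^r (e_{ij} - \bar e_i)^2 \to_p \frac{(r-1)\sigma^2}{r},$$
and $\lambda/r \to \infty$,
$$P\{\text{GIC with } \lambda \to \infty \text{ selects } A_1\} \to 1.$$
(The $o_P(1)$ term in $S(A_1)/n$ is the cross product between $\mu_i - \bar\mu$ and $e_{ij} - \bar e$.)

On the other hand, the $C_p$ (GIC with $\lambda = 2$) is loss consistent, because the $C_p$ minimizes
$$\frac{S(A_1)}{n} + \frac{2}{n}\frac{S(A_p)}{n-p} \quad\text{and}\quad \frac{S(A_p)}{n} + \frac{2}{r}\frac{S(A_p)}{n-p},$$
and
$$\frac{S(A_1)}{n} + \frac{2}{n}\frac{S(A_p)}{n-p} \to_p \Delta + \sigma^2, \qquad \frac{S(A_p)}{n} + \frac{2}{r}\frac{S(A_p)}{n-p} \to_p \frac{(r-1)\sigma^2}{r} + \frac{2\sigma^2}{r} = \sigma^2 + \frac{\sigma^2}{r}.$$
Asymptotically, the $C_p$ selects $A_1$ if and only if $\Delta < \sigma^2/r$, which is the same as the one-mean model being better.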
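A small simulation can illustrate this comparison for fixed $r$ and large $p$. This is a sketch under stated assumptions: the function `one_vs_p_mean`, the alternating group means, normal errors, and the use of $\lambda = \log n$ as a stand-in for "$\lambda \to \infty$" are all choices made for the example, not part of the lecture.

```python
import numpy as np

def one_vs_p_mean(mu, r, lam, sigma=1.0, seed=0):
    """Return (model chosen by the GIC, model with the smaller loss L_n)."""
    rng = np.random.default_rng(seed)
    p = mu.size
    n = p * r
    X = mu[:, None] + sigma * rng.standard_normal((p, r))  # row i = group i
    grand_mean = X.mean()
    group_means = X.mean(axis=1)
    S1 = np.sum((X - grand_mean) ** 2)             # S(A_1)
    Sp = np.sum((X - group_means[:, None]) ** 2)   # S(A_p)
    sigma2_hat = Sp / (n - p)
    crit_1 = (S1 + lam * sigma2_hat * 1) / n       # Gamma for the 1-mean model
    crit_p = (Sp + lam * sigma2_hat * p) / n       # Gamma for the p-mean model
    loss_1 = np.mean((mu - grand_mean) ** 2)       # L_n(A_1)
    loss_p = np.mean((mu - group_means) ** 2)      # L_n(A_p)
    chosen = "1-mean" if crit_1 < crit_p else "p-mean"
    better = "1-mean" if loss_1 < loss_p else "p-mean"
    return chosen, better

p, r = 2000, 2
mu_far = np.tile([1.0, -1.0], p // 2)    # Delta = 1.00 > sigma^2 / r = 0.5
mu_near = np.tile([0.3, -0.3], p // 2)   # Delta = 0.09 < sigma^2 / r = 0.5

cp_far = one_vs_p_mean(mu_far, r, lam=2.0)
cp_near = one_vs_p_mean(mu_near, r, lam=2.0)
bic_far = one_vs_p_mean(mu_far, r, lam=np.log(p * r))
```

With these settings the $C_p$ choice matches the better model in both scenarios, whereas the $\log n$ penalty picks the 1-mean model even when the group means are far apart and the p-mean model has the smaller loss.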

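Variable selection by thresholding

Can we do variable selection using p-values? Or, can we simply select variables by using the values $\hat\beta_j$, $j = 1,\dots,p$?
Here $\hat\beta_j$ is the $j$th component of $\hat\beta$, the least squares estimator of $\beta$.

For simplicity, assume that $X \mid Z \sim N(Z\beta, \sigma^2 I_n)$. Then
$$\hat\beta_j - \beta_j = \sum_{i=1}^n l_{ij}\varepsilon_i \,\Big|\, Z \sim N\Big(0, \sigma^2\sum_{i=1}^n l_{ij}^2\Big),$$
where $\varepsilon_i$ is the $i$th component of $\varepsilon = X - Z\beta$, $l_{ij}$ is the $j$th component of $(Z^\tau Z)^{-1}z_i$, and $z_i^\tau$ is the $i$th row of $Z$.

Because
$$1 - \Phi(t) \le \frac{1}{\sqrt{2\pi}\,t}e^{-t^2/2}, \quad t > 0,$$
where $\Phi$ is the standard normal cdf,
$$P\Big(|\hat\beta_j - \beta_j| > t\sqrt{\mathrm{var}(\hat\beta_j \mid Z)} \,\Big|\, Z\Big) \le \frac{2}{\sqrt{2\pi}\,t}e^{-t^2/2}, \quad t > 0.$$

Let $J_j$ be the $p$-vector whose $j$th component is 1 and other components are 0. By the Cauchy-Schwarz inequality,
$$l_{ij}^2 = [J_j^\tau(Z^\tau Z)^{-1}z_i]^2 \le J_j^\tau(Z^\tau Z)^{-1}J_j \; z_i^\tau(Z^\tau Z)^{-1}z_i.$$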

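$$\sum_{i=1}^n l_{ij}^2 \le c_j\sum_{i=1}^n z_i^\tau(Z^\tau Z)^{-1}z_i = pc_j \le p/\eta_n,$$
where $c_j$ is the $j$th diagonal element of $(Z^\tau Z)^{-1}$ and $\eta_n$ is the smallest eigenvalue of $Z^\tau Z$.

Thus, for any $j$,
$$P\Big(|\hat\beta_j - \beta_j| > t\sigma\sqrt{p/\eta_n} \,\Big|\, Z\Big) \le \frac{2}{\sqrt{2\pi}\,t}e^{-t^2/2}, \quad t > 0,$$
and, letting $t = a_n/(\sigma\sqrt{p/\eta_n})$,
$$P\big(|\hat\beta_j - \beta_j| > a_n \,\big|\, Z\big) \le Ce^{-a_n^2\eta_n/(2\sigma^2 p)}$$
for some constant $C > 0$, so that
$$P\Big(\max_{j=1,\dots,p}|\hat\beta_j - \beta_j| > a_n \,\Big|\, Z\Big) \le pCe^{-a_n^2\eta_n/(2\sigma^2 p)}.$$

Suppose that $p/n \to 0$ and $(p/\eta_n)\log n \to 0$ (typically, $\eta_n = O(n)$).
Then, we can choose $a_n$ such that $a_n \to 0$ and $a_n^2\eta_n/(p\log n) \to \infty$, such that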

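$$P\Big(\max_{j=1,\dots,p}|\hat\beta_j - \beta_j| > ca_n \,\Big|\, Z\Big) = O(n^{-s})$$
for any $c > 0$ and some $s \ge 1$; e.g.,
$$a_n = M\Big(\frac{p\log n}{\eta_n}\Big)^{\alpha}$$
for some constants $M > 0$ and $\alpha \in (0, \tfrac{1}{2})$.

What can we conclude from this?
Let $A = \{j : \beta_j \ne 0\}$ and $\hat A = \{j : |\hat\beta_j| > a_n\}$.
That is, $\hat A$ contains the indices of the variables we select by thresholding $|\hat\beta_j|$ at $a_n$.

Selection consistency:
$$P(\hat A \ne A \mid Z) \le P\big(|\hat\beta_j| > a_n \text{ for some } j \notin A \,\big|\, Z\big) + P\big(|\hat\beta_j| \le a_n \text{ for some } j \in A \,\big|\, Z\big).$$
Since $\beta_j = 0$ for $j \notin A$, the first term on the right-hand side is bounded by
$$P\Big(\max_{j=1,\dots,p}|\hat\beta_j - \beta_j| > a_n \,\Big|\, Z\Big) = O(n^{-s}).$$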

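On the other hand, if we assume that
$$\min_{j\in A}|\beta_j| \ge c_0a_n$$
for some $c_0 > 1$, then
$$P\big(|\hat\beta_j| \le a_n \text{ for some } j \in A \,\big|\, Z\big) \le P\big(|\beta_j| - |\hat\beta_j - \beta_j| \le a_n \text{ for some } j \in A \,\big|\, Z\big)$$
$$\le P\big(c_0a_n - |\hat\beta_j - \beta_j| \le a_n \text{ for some } j \in A \,\big|\, Z\big)$$
$$\le P\Big(\max_{j=1,\dots,p}|\hat\beta_j - \beta_j| \ge (c_0 - 1)a_n \,\Big|\, Z\Big) = O(n^{-s}).$$

Hence, we have consistency; in fact, the convergence rate is $O(n^{-s})$.
We can also obtain similar results by thresholding $|\hat\beta_j|/\sqrt{\sum_{i=1}^n l_{ij}^2}$.

This approach may not work if $p/n \not\to 0$.
If $p > n$, then $Z^\tau Z$ is not of full rank.
There exist several other approaches for the case where $p > n$; e.g., we replace $(Z^\tau Z)^{-1}$ by some matrix, or use ridge regression instead of the LSE.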
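The thresholding rule analysed above can be sketched in a few lines for the case $p < n$. The function name `threshold_select`, the default constants `M = 1` and `alpha = 0.25`, and the simulated data are assumptions made for illustration; in practice the constants would need to be chosen with care.

```python
import numpy as np

def threshold_select(X, Z, M=1.0, alpha=0.25):
    """Select {j : |beta_hat_j| > a_n} with a_n = M * (p log n / eta_n)^alpha."""
    n, p = Z.shape
    gram = Z.T @ Z
    beta_hat = np.linalg.solve(gram, Z.T @ X)   # least squares estimator
    eta_n = np.linalg.eigvalsh(gram)[0]         # smallest eigenvalue of Z'Z
    a_n = M * (p * np.log(n) / eta_n) ** alpha
    selected = np.flatnonzero(np.abs(beta_hat) > a_n)
    return selected, a_n

# Hypothetical example: 3 strong signals among p = 10 covariates.
rng = np.random.default_rng(2)
n, p = 500, 10
Z = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 3, 7]] = [2.0, -2.0, 3.0]
X = Z @ beta + rng.standard_normal(n)

selected, a_n = threshold_select(X, Z)
```

Here the nonzero coefficients are well above the threshold, in line with the condition $\min_{j\in A}|\beta_j| \ge c_0a_n$; signals close to $a_n$ would not be reliably recovered.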


More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Lecture 15: Learning Theory: Concentration Inequalities

Lecture 15: Learning Theory: Concentration Inequalities STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that

More information

x iu i E(x u) 0. In order to obtain a consistent estimator of β, we find the instrumental variable z which satisfies E(z u) = 0. z iu i E(z u) = 0.

x iu i E(x u) 0. In order to obtain a consistent estimator of β, we find the instrumental variable z which satisfies E(z u) = 0. z iu i E(z u) = 0. 27 However, β MM is icosistet whe E(x u) 0, i.e., β MM = (X X) X y = β + (X X) X u = β + ( X X ) ( X u ) \ β. Note as follows: X u = x iu i E(x u) 0. I order to obtai a cosistet estimator of β, we fid

More information

CEU Department of Economics Econometrics 1, Problem Set 1 - Solutions

CEU Department of Economics Econometrics 1, Problem Set 1 - Solutions CEU Departmet of Ecoomics Ecoometrics, Problem Set - Solutios Part A. Exogeeity - edogeeity The liear coditioal expectatio (CE) model has the followig form: We would like to estimate the effect of some

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

MA Advanced Econometrics: Properties of Least Squares Estimators

MA Advanced Econometrics: Properties of Least Squares Estimators MA Advaced Ecoometrics: Properties of Least Squares Estimators Karl Whela School of Ecoomics, UCD February 5, 20 Karl Whela UCD Least Squares Estimators February 5, 20 / 5 Part I Least Squares: Some Fiite-Sample

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca

More information

Lecture 18: Sampling distributions

Lecture 18: Sampling distributions Lecture 18: Samplig distributios I may applicatios, the populatio is oe or several ormal distributios (or approximately). We ow study properties of some importat statistics based o a radom sample from

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >

More information

Lecture 11 October 27

Lecture 11 October 27 STATS 300A: Theory of Statistics Fall 205 Lecture October 27 Lecturer: Lester Mackey Scribe: Viswajith Veugopal, Vivek Bagaria, Steve Yadlowsky Warig: These otes may cotai factual ad/or typographic errors..

More information

Math 778S Spectral Graph Theory Handout #3: Eigenvalues of Adjacency Matrix

Math 778S Spectral Graph Theory Handout #3: Eigenvalues of Adjacency Matrix Math 778S Spectral Graph Theory Hadout #3: Eigevalues of Adjacecy Matrix The Cartesia product (deoted by G H) of two simple graphs G ad H has the vertex-set V (G) V (H). For ay u, v V (G) ad x, y V (H),

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors 5 Eigevalues ad Eigevectors 5.3 DIAGONALIZATION DIAGONALIZATION Example 1: Let. Fid a formula for A k, give that P 1 1 = 1 2 ad, where Solutio: The stadard formula for the iverse of a 2 2 matrix yields

More information

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ. 2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For

More information

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients. Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet

More information

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK)

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) Everythig marked by is ot required by the course syllabus I this lecture, all vector spaces is over the real umber R. All vectors i R is viewed as a colum

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

MATH 472 / SPRING 2013 ASSIGNMENT 2: DUE FEBRUARY 4 FINALIZED

MATH 472 / SPRING 2013 ASSIGNMENT 2: DUE FEBRUARY 4 FINALIZED MATH 47 / SPRING 013 ASSIGNMENT : DUE FEBRUARY 4 FINALIZED Please iclude a cover sheet that provides a complete setece aswer to each the followig three questios: (a) I your opiio, what were the mai ideas

More information

( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2

( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2 82 CHAPTER 4. MAXIMUM IKEIHOOD ESTIMATION Defiitio: et X be a radom sample with joit p.m/d.f. f X x θ. The geeralised likelihood ratio test g.l.r.t. of the NH : θ H 0 agaist the alterative AH : θ H 1,

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

Linear Regression Models

Linear Regression Models Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

Local Polynomial Regression

Local Polynomial Regression Local Polyomial Regressio Joh Hughes October 2, 2013 Recall that the oparametric regressio model is Y i f x i ) + ε i, where f is the regressio fuctio ad the ε i are errors such that Eε i 0. The Nadaraya-Watso

More information

Economics 326 Methods of Empirical Research in Economics. Lecture 18: The asymptotic variance of OLS and heteroskedasticity

Economics 326 Methods of Empirical Research in Economics. Lecture 18: The asymptotic variance of OLS and heteroskedasticity Ecoomics 326 Methods of Empirical Research i Ecoomics Lecture 8: The asymptotic variace of OLS ad heteroskedasticity Hiro Kasahara Uiversity of British Columbia December 24, 204 Asymptotic ormality I I

More information

2.2. Central limit theorem.

2.2. Central limit theorem. 36.. Cetral limit theorem. The most ideal case of the CLT is that the radom variables are iid with fiite variace. Although it is a special case of the more geeral Lideberg-Feller CLT, it is most stadard

More information