High-Dimensional Sparse Econometric Models, an Introduction
1 High-Dimensional Sparse Econometric Models, an Introduction
ICE, July 2011
2 Outline
1. High-Dimensional Sparse Econometric Models (HDSM)
- Models
- Motivating examples for linear/nonparametric regression
2. Estimation Methods for linear/nonparametric regression
- $\ell_1$-penalized estimators: formulations, penalty parameter, computational methods
- post-$\ell_1$-penalized estimators
- summary of rates of convergence
3. Model selection applications in econometrics
- Results for IV estimators
- Results for quantile regression
- Results for variable length Markov chains
3 High-Dimensional Sparse Models
We have a large number of parameters p, potentially larger than the sample size n. The idea is that a low-dimensional submodel captures very accurately all essential features of the full-dimensional model. The key assumption is that the number of relevant parameters is smaller than the sample size: fundamentally, a model selection problem. One question is whether sparse models make sense; another is how to compute such models; another is what guarantees we can establish.
4 1. High-Dimensional Sparse Econometric Model
HDSM. A response variable $y_i$ obeys
$y_i = x_i'\beta_0 + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$, $i = 1, \ldots, n$,
where $x_i$ are fixed p-dimensional regressors, $x_i = (x_{ij},\ j = 1, \ldots, p)'$, normalized so that $E_n[x_{ij}^2] = 1$, and p is possibly much larger than n.
The key assumption is sparsity:
$s := \|\beta_0\|_0 = \sum_{j=1}^p 1\{\beta_{0j} \neq 0\} \ll n$.
This generalizes the standard large parametric econometric model by letting the identity $T := \mathrm{support}(\beta_0)$ of the s relevant regressors be unknown.
5 High p motivation
- transformations of basic regressors $z_i$: $x_i = (P_1(z_i), \ldots, P_p(z_i))'$; for example, in wage regressions, the $P_j$'s are polynomials or B-splines in education and experience;
- or simply a very large list of regressors:
  - a list of country characteristics in cross-country growth regressions (Barro & Lee);
  - housing characteristics in hedonic regressions (American Housing Survey);
  - price and product characteristics at the point of purchase (scanner data, TNS).
6 Sparseness
The key assumption is that the number of non-zero regression coefficients is smaller than the sample size:
$s := \|\beta_0\|_0 = \sum_{j=1}^p 1\{\beta_{0j} \neq 0\} \ll n$.
The idea is that a low s-dimensional submodel accurately approximates the full p-dimensional model; here the approximation error is in fact zero. We can easily relax this assumption and allow a non-zero approximation error. We can also substantially relax the Gaussianity of the error terms $\epsilon_i$, based on results from self-normalized moderate deviation theory.
7 High-Dimensional Sparse Econometric Model
HDSM. For $y_i$ the response variable and $z_i$ the fixed elementary regressors,
$y_i = g(z_i) + \epsilon_i$, $E[\epsilon_i] = 0$, $E[\epsilon_i^2] = \sigma^2$, $i = 1, \ldots, n$.
Consider a p-dimensional dictionary of series terms,
$x_i = (P_1(z_i), \ldots, P_p(z_i))'$,
for example B-splines, polynomials, and interactions, where p is possibly much larger than n. Expand via a generalized series:
$g(z_i) = x_i'\beta_0 + r_i$,
where $x_i'\beta_0$ is the best s-dimensional parametric approximation, with
$s := \|\beta_0\|_0 = \sum_{j=1}^p 1\{\beta_{0j} \neq 0\} \ll n$,
and $r_i$ is the approximation error.
8 High-Dimensional Sparse Econometric Model
HDSM. As a result we have an approximately sparse model:
$y_i = x_i'\beta_0 + r_i + \epsilon_i$, $i = 1, \ldots, n$.
The parametric part $x_i'\beta_0$ is the target, with s non-zero coefficients. We choose s to balance the approximation error with the oracle estimation error:
$E_n[r_i^2] \lesssim \sigma^2 \frac{s}{n}$.
This model generalizes the exact sparse model by letting in approximation error and non-Gaussian errors. It also generalizes the standard series regression model by letting the identity $T = \mathrm{support}(\beta_0)$ of the most important s series terms be unknown.
9 Example 1: Series Models of Wage Function
In this example, we abstract away from estimation questions. Consider a series expansion of the conditional expectation $E[y_i \mid z_i]$ of wage $y_i$ given education $z_i$. A conventional series approximation to the regression function is, for example,
$E[y_i \mid z_i] = \beta_1 P_1(z_i) + \beta_2 P_2(z_i) + \beta_3 P_3(z_i) + \beta_4 P_4(z_i) + r_i$,
where $P_1, \ldots, P_4$ are low-order polynomials or splines with a few knots.
10 Traditional Approximation of Expected Wage Function using Polynomials
[Figure: wage vs. education. The true regression function $E[y_i \mid z_i]$ is computed using U.S. Census data.]
11 With the same number of parameters, can we find a much better series approximation?
For this to be possible, some high-order series terms in the expansion
$E[y_i \mid z_i] = \sum_{k=1}^p \beta_{0k} P_k(z_i) + r_i$
must have large coefficients. This indeed happens in our example, since $E[y_i \mid z_i]$ exhibits sharp oscillatory behavior in some regions, which low-degree polynomials fail to capture. And we find the right series terms by using $\ell_1$-penalization methods.
12 Lasso Approximation of Expected Wage Function using Polynomials and Dummies
[Figure: wage vs. education.]
13 Traditional vs. Lasso Approximation of Expected Wage Functions with Equal Number of Parameters
[Figure: wage vs. education, both approximations overlaid.]
14 Errors of Conventional and Lasso-Based Sparse Approximations
[Table: $L^2$ error and $L^\infty$ error of the conventional series approximation vs. the lasso-based series approximation; numerical values lost in transcription.]
15 Example 2: Cross-Country Growth Regression
Relation between growth rate and initial per capita GDP:
$\mathrm{GrowthRate} = \alpha_0 + \alpha_1 \log(\mathrm{GDP}) + \epsilon$.
Test the convergence hypothesis, $\alpha_1 < 0$: poor countries catch up with richer countries. This is the prediction of the classical Solow growth model. A linear regression with log(GDP) alone yields
$\mathrm{GrowthRate} = \alpha_0 + \alpha_1 \log(\mathrm{GDP}) + \epsilon$ (estimated coefficients lost in transcription).
Test the convergence hypothesis conditional on covariates describing institutions and technological factors:
$\mathrm{GrowthRate} = \alpha_0 + \alpha_1 \log(\mathrm{GDP}) + \sum_{j=1}^p \beta_j X_j + \epsilon$.
16 Example 2: Cross-Country Growth Regression
In the Barro-Lee data, we have p = 60 covariates and n = 90 observations. We need to select the significant covariates among the 60.
[Table: variables selected at decreasing penalization levels $\lambda$; real GDP per capita (log) is included in all models. The additional selected variables are:
- Black Market Premium (log);
- Black Market Premium (log), Political Instability;
- Black Market Premium (log), Political Instability, Ratio of nominal government expenditure on defense to nominal GDP, Ratio of import to GDP.
Numerical values of $\lambda$ lost in transcription.]
17 Example 2: Cross-Country Growth Regression
We find a sparse model for the conditioning effects using $\ell_1$-penalization, and we re-fit the resulting model using least squares. We find that the coefficient on lagged GDP is negative, and the confidence intervals, adjusted for model selection, exclude zero.
[Table: coefficient and confidence set for lagged GDP; numerical values lost in transcription.]
Our findings fully support the conclusions reached in Barro and Lee and in Barro and Sala-i-Martin.
18 2. Estimation via $\ell_1$-penalty methods
The canonical least-squares estimator, which minimizes
$\hat Q(\beta) = E_n[(y_i - x_i'\beta)^2]$,
is not consistent when $\dim(\beta) = p \geq n$. We can try minimizing an AIC/BIC-type criterion function
$\hat Q(\beta) + \lambda \|\beta\|_0$, where $\|\beta\|_0 = \sum_{j=1}^p 1\{\beta_j \neq 0\}$,
but this is not computationally feasible when p is large: it is essentially a search over too many (non-nested) models. Technically, it is a combinatorial problem known to be NP-hard.
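To see the combinatorial cost concretely, here is a minimal Python sketch of exhaustive $\ell_0$-style best-subset search; the function name and loop structure are illustrative, not from the slides. The number of candidate models of size up to s is $\sum_{k \leq s} \binom{p}{k}$, which explodes even for moderate p.

```python
import itertools
import numpy as np

def best_subset(X, y, max_size):
    """Exhaustive l0-style search: fit OLS on every subset of regressors
    up to max_size and keep the best fit. Feasible only for tiny p."""
    n, p = X.shape
    best_sse, best_support = np.inf, ()
    for k in range(1, max_size + 1):
        for support in itertools.combinations(range(p), k):
            Xs = X[:, support]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            resid = y - Xs @ beta
            sse = resid @ resid
            if sse < best_sse:
                best_sse, best_support = sse, support
    return best_support, best_sse

# With p = 60 (the growth-regression example) and subsets of size 10 alone,
# there are already C(60, 10) on the order of 7.5e10 candidate models.
```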
19 2. Estimation via $\ell_1$-penalty methods
It was necessary to balance and understand simultaneously computational issues (a hard combinatorial search) and statistical issues (lack of global identification). Understanding that statistical aspects can help computational methods, and vice versa, was crucial. Further developments in model selection estimation will rely heavily on computational advances. (MCMC is an example of a computational method with a profound impact on the statistics/econometrics literature.)
20 2. Estimation via $\ell_1$-penalty methods
We want to avoid computing $\hat Q(\beta) + \lambda\|\beta\|_0$ but still get something meaningful. A solution (Tibshirani, JRSS B, 1996) is to replace the $\ell_0$-"norm" by the closest convex function, the $\ell_1$-norm:
$\|\beta\|_1 = \sum_{j=1}^p |\beta_j|$.
The lasso estimator $\hat\beta$ then minimizes $\hat Q(\beta) + \lambda\|\beta\|_1$. This change makes the regularized objective convex, so good computational methods exist for large-scale problems, and there are many ways to combine the fit and regularization functions.
21 Formulations
Idea: balance the fit $\hat Q(\beta)$ and the penalty $\|\beta\|_1$ using a parameter $\lambda$. Several ways to do it:
- $\min_\beta \hat Q(\beta) + \lambda\|\beta\|_1$ (lasso)
- $\min_\beta \hat Q(\beta)$ subject to $\|\beta\|_1 \leq \lambda$ (sieve)
- $\min_\beta \|\beta\|_1$ subject to $\hat Q(\beta) \leq \lambda$ ($\ell_1$-minimization method)
- $\min_\beta \|\beta\|_1$ subject to $\|\nabla\hat Q(\beta)\|_\infty \leq \lambda$ (Dantzig selector)
- $\min_\beta \sqrt{\hat Q(\beta)} + \lambda\|\beta\|_1$ ($\sqrt{\mathrm{lasso}}$)
22 Penalty parameter $\lambda$
The motivation/heuristic is to choose the parameter $\lambda$ so that $\beta_0$ is feasible for the problem, but not too many more points are:
- sieve: ideally set $\lambda = \|\beta_0\|_1$;
- $\ell_1$-minimization: ideally set $\lambda = \sigma^2$;
- Dantzig selector: ideally set $\lambda = \|\nabla\hat Q(\beta_0)\|_\infty = \|2E_n[\epsilon_i x_i]\|_\infty$.
Note that lasso and $\sqrt{\mathrm{lasso}}$ are unrestricted minimizations. Their penalties are chosen to majorize the score at the true value: respectively, $\lambda > \|\nabla\hat Q(\beta_0)\|_\infty$ and $\lambda > \|\nabla\sqrt{\hat Q}(\beta_0)\|_\infty$.
23 Heuristics for lasso via Convex Geometry
[Figure: the fit function $\hat Q(\beta)$ plotted in the direction $\beta_{T^c}$; the true value satisfies $\beta_{0T^c} = 0$.]
24 Heuristics for lasso via Convex Geometry
[Figure: $\hat Q(\beta)$ together with its linearization $\hat Q(\beta_0) + \nabla\hat Q(\beta_0)'\beta_{T^c}$ in the direction $\beta_{T^c}$; true value $\beta_{0T^c} = 0$.]
25 Heuristics for lasso via Convex Geometry
[Figure: the penalty $\lambda\|\beta\|_1$ with $\lambda = \|\nabla\hat Q(\beta_0)\|_\infty$ overlaid on $\hat Q(\beta)$; true value $\beta_{0T^c} = 0$.]
26 Heuristics for lasso via Convex Geometry
[Figure: the penalty $\lambda\|\beta\|_1$ with $\lambda > \|\nabla\hat Q(\beta_0)\|_\infty$ overlaid on $\hat Q(\beta)$; true value $\beta_{0T^c} = 0$.]
27 Heuristics via Convex Geometry
[Figure: the penalized objective $\hat Q(\beta) + \lambda\|\beta\|_1$ with $\lambda > \|\nabla\hat Q(\beta_0)\|_\infty$; its minimum in the direction $\beta_{T^c}$ sits at the true value $\beta_{0T^c} = 0$.]
28 Penalty parameters for lasso and $\sqrt{\mathrm{lasso}}$
In the case of lasso, the optimal choice of penalty level $\lambda$, guaranteeing near-oracle performance, is
$\lambda = 2\sigma\sqrt{2\log(p)/n} \geq_P \|2E_n[\epsilon_i x_i]\|_\infty = \|\nabla\hat Q(\beta_0)\|_\infty$.
This choice relies on knowing $\sigma$, which may be a priori hard to estimate when $p \geq n$. In the case of $\sqrt{\mathrm{lasso}}$, the score is a pivotal random variable, with distribution independent of unknown parameters:
$\|\nabla\sqrt{\hat Q}(\beta_0)\|_\infty = \frac{\|E_n[\epsilon_i x_i]\|_\infty}{\sqrt{E_n[\epsilon_i^2]}} = \frac{\|E_n[(\epsilon_i/\sigma) x_i]\|_\infty}{\sqrt{E_n[(\epsilon_i/\sigma)^2]}}$;
hence for $\sqrt{\mathrm{lasso}}$ the penalty $\lambda$ does not depend on $\sigma$.
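As a rough illustration, a minimal sketch of these penalty levels, assuming the normalization $E_n[x_{ij}^2] = 1$ from the model slide; the constants follow the display above, and the $\sqrt{\mathrm{lasso}}$ choice of the same $\sqrt{2\log(p)/n}$ order is our stand-in assumption (the paper's exact constant differs).

```python
import numpy as np

def lasso_penalty(n, p, sigma):
    """Theoretical lasso penalty level; requires knowing (or estimating) sigma."""
    return 2 * sigma * np.sqrt(2 * np.log(p) / n)

def sqrt_lasso_penalty(n, p):
    """sqrt-lasso penalty level: the score is pivotal, so sigma does not enter.
    Order-sqrt(2 log(p) / n) choice used here as an illustrative assumption."""
    return np.sqrt(2 * np.log(p) / n)
```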
29 Computational Methods for lasso
Big picture of computational methods:
- Componentwise: extremely large problems; convergence guaranteed, but no complexity bound (problem-size figure lost in transcription).
- First-order: very large problems; complexity of order $pn\sqrt{\log p}\,/\,\varepsilon$ (problem-size figure lost in transcription).
- Interior-point: large problems (up to about 5000); complexity of order $(p+n)^{3.5}\log\big((p+n)/\varepsilon\big)$.
Remark: Several implementations of these methods are available: SDPT3 or SeDuMi for interior-point methods; TFOCS for first-order methods; several downloadable codes for componentwise methods (easy to implement).
30 Computational Methods for lasso: Componentwise
Componentwise method. A common approach to solving multivariate optimization problems:
(i) pick a component and fix all remaining components;
(ii) improve the objective function along the chosen component;
(iii) loop over steps (i)-(ii) until convergence is achieved.
This has been widely used when the minimization over a single component can be done very efficiently, which is the case for lasso. Its simple implementation also contributed to the widespread use of this approach.
31 Computational Methods for lasso: Componentwise
For a current iterate $\beta$, let $\beta_{-j} = (\beta_1, \ldots, \beta_{j-1}, 0, \beta_{j+1}, \ldots, \beta_p)'$:
- if $2E_n[x_{ij}(y_i - x_i'\beta_{-j})] > \lambda$, the minimum over $\beta_j$ is $\beta_j = \big(2E_n[x_{ij}(y_i - x_i'\beta_{-j})] - \lambda\big)\big/\big(2E_n[x_{ij}^2]\big)$;
- if $2E_n[x_{ij}(y_i - x_i'\beta_{-j})] < -\lambda$, the minimum over $\beta_j$ is $\beta_j = \big(2E_n[x_{ij}(y_i - x_i'\beta_{-j})] + \lambda\big)\big/\big(2E_n[x_{ij}^2]\big)$;
- if $\big|2E_n[x_{ij}(y_i - x_i'\beta_{-j})]\big| \leq \lambda$, the minimum over $\beta_j$ is $\beta_j = 0$.
Remark: Similar formulas can be derived for other methods, such as $\sqrt{\mathrm{lasso}}$.
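A minimal numpy sketch of this componentwise (coordinate descent) loop for the lasso objective $E_n[(y_i - x_i'\beta)^2] + \lambda\|\beta\|_1$; the stopping rule and iteration cap are illustrative choices, not from the slides.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_iter=500, tol=1e-8):
    """Componentwise minimization of E_n[(y - x'beta)^2] + lam * ||beta||_1,
    cycling through coordinates with the closed-form update above."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = 2 * np.mean(X**2, axis=0)  # 2 * E_n[x_ij^2]
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            # residual excluding component j: y_i - x_i' beta_{-j}
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = 2 * np.mean(X[:, j] * r_j)  # 2 * E_n[x_ij (y_i - x_i' beta_{-j})]
            if z > lam:
                beta[j] = (z - lam) / col_ss[j]
            elif z < -lam:
                beta[j] = (z + lam) / col_ss[j]
            else:
                beta[j] = 0.0
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```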
32 Computational Methods for lasso: first-order methods
First-order methods. The new generation of first-order methods focuses on structured convex problems that can be cast as
$\min_w f(A(w) + b) + h(w)$ or $\min_w h(w)$ subject to $A(w) + b \in K$,
where
- f is a smooth function;
- h is a structured function that is possibly non-differentiable or takes extended values;
- K is a structured convex set (typically a convex cone).
33 Computational Methods for lasso: first-order methods
lasso is cast as $\min_w f(A(w) + b) + h(w)$ with
$f(\cdot) = \|\cdot\|^2/n$, $h(\cdot) = \lambda\|\cdot\|_1$, $A = X$, and $b = -Y$.
Thus,
$\min_\beta \frac{\|X\beta - Y\|^2}{n} + \lambda\|\beta\|_1$.
Remark: A number of different first-order methods have been proposed and analyzed; we briefly mention just one.
34 Computational Methods for lasso: first-order methods
Each iteration, for a given $\mu > 0$ and current point $\beta^k$, solves
$\beta(\beta^k) = \arg\min_\beta\ -2E_n[x_i(y_i - x_i'\beta^k)]'(\beta - \beta^k) + \frac{\mu}{2}\|\beta - \beta^k\|^2 + \lambda\|\beta\|_1$.
The minimization over $\beta$ above is separable and can be solved by soft-thresholding: letting
$m_j^k := \beta_j^k + 2E_n[x_{ij}(y_i - x_i'\beta^k)]/\mu$,
we update our iterate as
$\beta_j(\beta^k) = \mathrm{sign}(m_j^k)\,\max\{|m_j^k| - \lambda/\mu,\ 0\}$.
Remark: The additional regularization term $\frac{\mu}{2}\|\beta - \beta^k\|^2$ brings computational stability and allows the algorithm to enjoy better computational complexity properties.
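A minimal sketch of one such first-order scheme (plain proximal gradient with the soft-thresholding update above). Taking $\mu$ to be a Lipschitz constant of the gradient is our assumption here, and the fixed iteration count is illustrative.

```python
import numpy as np

def lasso_prox_gradient(X, y, lam, mu=None, n_iter=1000):
    """First-order (proximal gradient) sketch for
    min_beta E_n[(y_i - x_i'beta)^2] + lam * ||beta||_1."""
    n, p = X.shape
    if mu is None:
        # 2 * largest eigenvalue of E_n[x_i x_i'] is a Lipschitz
        # constant of the gradient of the fit function
        mu = 2 * np.linalg.norm(X.T @ X / n, 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        # m^k on the slide: gradient step of length 1/mu
        m = beta + 2 * (X.T @ (y - X @ beta) / n) / mu
        # separable soft-thresholding at level lam/mu
        beta = np.sign(m) * np.maximum(np.abs(m) - lam / mu, 0.0)
    return beta
```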
35 Computational Methods for lasso: ipm
Interior-point methods. Interior-point method (ipm) solvers typically focus on conic programming problems in standard form,
$\min_w c'w$ subject to $Aw = b$, $w \in K$.   (2.1)
The conic constraint will be binding at the optimal solution. These methods use a barrier function $F_K$ for the cone K that is strictly convex, smooth, and such that $F_K(w) \to \infty$ as $w \to \partial K$, and for some $\mu > 0$ proceed to compute
$w(\mu) = \arg\min_w c'w + \mu F_K(w)$ subject to $Aw = b$.   (2.2)
As $\mu \to 0$, $w(\mu)$ converges to the solution of (2.1). We can recover an approximate solution of (2.1) efficiently by using an approximate solution for $w(\mu_k)$ as a warm start for approximating $w(\mu_{k+1})$, with $\mu_k > \mu_{k+1}$.
36 Computational Methods for lasso: ipm
Not all barriers are created equal: solving the problems
$w(\mu) = \arg\min_w c'w + \mu F_K(w)$ subject to $Aw = b$   (2.3)
is tractable only if $F_K$ is a special barrier (which depends on K). Three cones play a major role in optimization theory:
- Non-negative orthant: $\mathbb{R}^p_+ = \{w \in \mathbb{R}^p : w_j \geq 0,\ j = 1, \ldots, p\}$, with $F_K(w) = -\sum_{j=1}^p \log(w_j)$;
- Second-order cone: $Q^{n+1} = \{(w, t) \in \mathbb{R}^n \times \mathbb{R}_+ : t \geq \|w\|\}$, with $F_K(w, t) = -\log(t^2 - \|w\|^2)$;
- Semidefinite positive matrices: $S^{k \times k}_+ = \{W \in \mathbb{R}^{k \times k} : W = W',\ v'Wv \geq 0 \ \text{for all } v \in \mathbb{R}^k\}$, with $F_K(W) = -\log\det(W)$.
37 Computational Methods for lasso: ipm
These cones (and three others) are called self-scaled cones, and their Cartesian products are also self-scaled. Therefore, when using ipms it is VERY important computationally to arrive at a formulation that is not only convex but also relies on self-scaled cones. It turns out that many tractable penalization methods (including $\ell_1$ but others as well) can be formulated with these three cones.
38 Computational Methods for lasso: ipm
lasso: $\min_\beta \hat Q(\beta) + \lambda\|\beta\|_1$.
To formulate the $\ell_1$-norm, let $\beta = \beta^+ - \beta^-$, where $\beta^+, \beta^- \in \mathbb{R}^p_+$. For any vector $v \in \mathbb{R}^n$ and any scalar $t \geq 0$, we have that $v'v \leq t$ is equivalent to $\|(v, \underbrace{(t-1)/2}_{=:z})\| \leq \underbrace{(t+1)/2}_{=:u}$. The conic formulation is then
$\min_{t, \beta^+, \beta^-, u, z, v}\ \frac{t}{n} + \lambda\sum_{j=1}^p(\beta_j^+ + \beta_j^-)$
subject to
$v = Y - X\beta^+ + X\beta^-$, $t = 2z + 1$, $t = 2u - 1$,
$(v, z, u) \in Q^{n+2}$, $t \geq 0$, $\beta^+ \in \mathbb{R}^p_+$, $\beta^- \in \mathbb{R}^p_+$.
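In practice one rarely writes the conic form by hand; a modeling layer performs the reduction to self-scaled cones. A minimal sketch with cvxpy (our tooling choice, not from the slides), assuming the ECOS interior-point SOCP solver is installed; both lasso and $\sqrt{\mathrm{lasso}}$ are SOCP-representable.

```python
import cvxpy as cp
import numpy as np

def lasso_ipm(X, y, lam):
    """Solve lasso; cvxpy reduces it to a conic (SOCP) problem."""
    n, p = X.shape
    beta = cp.Variable(p)
    obj = cp.sum_squares(y - X @ beta) / n + lam * cp.norm1(beta)
    cp.Problem(cp.Minimize(obj)).solve(solver=cp.ECOS)
    return beta.value

def sqrt_lasso_ipm(X, y, lam):
    """Solve sqrt-lasso: ||y - X beta|| / sqrt(n) + lam * ||beta||_1,
    also an SOCP, hence suitable for interior-point solvers."""
    n, p = X.shape
    beta = cp.Variable(p)
    obj = cp.norm(y - X @ beta, 2) / np.sqrt(n) + lam * cp.norm1(beta)
    cp.Problem(cp.Minimize(obj)).solve(solver=cp.ECOS)
    return beta.value
```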
39 Quick comparison on a very sparse model: s = 5, $\beta_{0j} = 1$ for $j = 1, \ldots, s$, $\beta_{0j} = 0$ for $j > s$.
[Table: average running time over 100 simulations (different precisions across methods) for componentwise, first-order, and interior-point implementations of lasso and $\sqrt{\mathrm{lasso}}$, at sizes (n = 100, p = 500), (n = 200, p = 1000), and (n = 400, p = 2000); timing values lost in transcription.]
40 Post-Model Selection Estimator
Define the post-lasso estimator as follows:
Step 1: select the model using lasso, $\hat T := \mathrm{support}(\hat\beta)$.
Step 2: apply ordinary least squares (OLS) to the selected model $\hat T$.
Remark: We could use $\sqrt{\mathrm{lasso}}$, the Dantzig selector, sieve, or $\ell_1$-minimization instead of lasso.
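A minimal sketch of this two-step estimator. It reuses the `lasso_coordinate_descent` function from the earlier sketch; the assumption is only that some lasso solver returning a coefficient vector is available, and the support-threshold `tol` is an illustrative choice.

```python
import numpy as np

def post_lasso(X, y, lam, tol=1e-10):
    """Two-step post-lasso: lasso for selection, OLS for estimation."""
    n, p = X.shape
    beta_lasso = lasso_coordinate_descent(X, y, lam)      # step 1: selection
    support = np.flatnonzero(np.abs(beta_lasso) > tol)    # T-hat
    beta_post = np.zeros(p)
    if support.size > 0:
        # step 2: OLS refit on the selected columns removes the shrinkage bias
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        beta_post[support] = coef
    return beta_post, support
```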
41 Post-Model Selection Estimator Geometry
[Figure: in the $(\beta_1, \beta_2)$ plane, the solution of $\min \|\beta\|_1$ subject to $\hat Q(\beta) \leq \gamma$, $\beta \in \mathbb{R}^p$, with $\hat\beta$ and $\beta_0$ marked.] In this example the correct model is selected, but note the shrinkage bias. The shrinkage bias can be removed by the post-model-selection estimator, i.e., by applying OLS to the selected model.
42 Post-Model Selection Estimator Geometry
[Figure: same construction, different configuration.] In this example the correct model was not selected: the selected model contains a junk regressor, which can impact the post-model-selection estimator.
43 Issues with the post-model-selection estimator
- lasso and other $\ell_1$-penalized methods will successfully zero out lots of irrelevant regressors;
- however, these procedures are not perfect at model selection: they might include junk regressors and exclude some relevant ones;
- and these procedures bias/shrink the non-zero coefficient estimates towards zero.
Post-model-selection estimators are motivated by removing the bias towards zero, but the results need to account for possible misspecification in the selected model.
44 Regularity Condition
A simple sufficient condition is as follows.
Condition RSE. Take any C > 1. Then for all n large enough, the sC-sparse eigenvalues of the empirical design matrix $E_n[x_i x_i']$ are bounded away from zero and from above; namely,
$0 < \kappa' \leq \min_{1 \leq \|\delta\|_0 \leq sC} \frac{E_n[(x_i'\delta)^2]}{\|\delta\|^2} \leq \max_{1 \leq \|\delta\|_0 \leq sC} \frac{E_n[(x_i'\delta)^2]}{\|\delta\|^2} \leq \kappa'' < \infty.$   (2.4)
This holds under i.i.d. sampling if $E[x_i x_i']$ has eigenvalues bounded away from zero and from above, and either:
- $x_i$ has light tails (i.e., is log-concave) and $s\log p = o(n)$; or
- the regressors are bounded, $\max_{ij} |x_{ij}| \leq K$, and $s\log^4 p = o(n)$.
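Exact sC-sparse eigenvalues are combinatorial to compute, but a crude Monte Carlo scan over random supports can flag obvious violations. A minimal sketch; the sampling scheme is our illustrative choice, and note it yields an inner approximation, so it can only refute, not certify, Condition RSE.

```python
import numpy as np

def sparse_eigenvalue_range(X, m, n_draws=2000, seed=0):
    """Approximate the m-sparse eigenvalue range of E_n[x_i x_i'] by
    scanning random size-m supports. The returned (lo, hi) satisfies
    lo >= true sparse min and hi <= true sparse max."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gram = X.T @ X / n  # E_n[x_i x_i']
    lo, hi = np.inf, 0.0
    for _ in range(n_draws):
        support = rng.choice(p, size=m, replace=False)
        eigvals = np.linalg.eigvalsh(gram[np.ix_(support, support)])
        lo, hi = min(lo, eigvals[0]), max(hi, eigvals[-1])
    return lo, hi
```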
45 Result 1: Rates for $\ell_1$-penalized methods
Theorem (Rates). Under Condition RSE, with probability approaching 1,
$\sqrt{E_n[(x_i'\hat\beta - x_i'\beta_0)^2]} \lesssim \sigma\sqrt{\frac{s\log(n \vee p)}{n}}.$
- The rate is very close to the oracle rate $\sqrt{s/n}$, obtainable when we know the true model T;
- p shows up only through the logarithmic term log p.
References: for the Dantzig selector, Candes and Tao (Annals of Statistics, 2007); for lasso, Meinshausen and Yu (Annals of Statistics, 2009) and Bickel, Ritov, Tsybakov (Annals of Statistics, 2009); for $\sqrt{\mathrm{lasso}}$, B., Chernozhukov and Wang (Biometrika, 2011).
46 Result 2: Post-Model Selection Estimator
Theorem (Rate for the post-model-selection estimator). Under the conditions of Result 1, with probability approaching 1,
$\sqrt{E_n[(x_i'\hat\beta_{PL} - x_i'\beta_0)^2]} \lesssim \sigma\sqrt{\frac{s\log(n \vee p)}{n}},$
and in some exceptional cases the rate is faster, up to $\sigma\sqrt{s/n}$.
- Even though lasso does not in general perfectly select the relevant regressors, post-lasso performs at least as well;
- the analysis accounts for the fact that the selected model can be misspecified.
This result was first derived in the context of median regression by B. and Chernozhukov (Annals of Statistics, 2011), and extended to least squares by B. and Chernozhukov (R&R Bernoulli, 2011).
47 Monte Carlo
In this simulation we used s = 6, p = 500, n = 100:
$y_i = x_i'\beta_0 + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$,
$\beta_0 = (1,\ 1,\ 1/2,\ 1/3,\ 1/4,\ 1/5,\ 0, \ldots, 0)'$,
$x_i \sim N(0, \Sigma)$, $\Sigma_{ij} = (1/2)^{|i-j|}$, $\sigma^2 = 1$.
Ideal benchmark: the oracle estimator, which runs OLS of $y_i$ on $x_{i1}, \ldots, x_{i6}$. This estimator is not feasible outside the Monte Carlo.
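For reference, a minimal sketch of this data-generating process (the function name and seed handling are illustrative):

```python
import numpy as np

def simulate_design(n=100, p=500, s=6, sigma=1.0, seed=0):
    """One draw from the Monte Carlo design on the slide: Toeplitz
    correlation Sigma_ij = 0.5^|i-j| and decaying true coefficients."""
    rng = np.random.default_rng(seed)
    Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta0 = np.zeros(p)
    beta0[:s] = [1, 1, 1/2, 1/3, 1/4, 1/5]
    y = X @ beta0 + sigma * rng.standard_normal(n)
    return X, y, beta0
```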
48 Monte Carlo Results: Risk
[Figure: estimation risk $E[(x_i'\hat\beta - x_i'\beta_0)^2]$ for LASSO, Post-LASSO, and the oracle; n = 100, p = 500.]
49 Monte Carlo Results: Bias
[Figure: bias $\|E[\hat\beta] - \beta_0\|_2$ for LASSO and Post-LASSO; n = 100, p = 500.]
50 Applications of model selection
51 Gaussian Instrumental Variable
Gaussian IV model:
$y_i = d_i'\alpha + \epsilon_i$,
$d_i = D(z_i) + v_i$ (first stage),
$\begin{pmatrix} \epsilon_i \\ v_i \end{pmatrix} \sim N\left(0,\ \begin{pmatrix} \sigma^2_\epsilon & \sigma_{\epsilon v} \\ \sigma_{\epsilon v} & \sigma^2_v \end{pmatrix}\right)$.
(We can have additional controls $w_i$ in both equations; they are suppressed here, see the references for details.)
We want to estimate the optimal instrument $D(z_i) = E[d_i \mid z_i]$ using a large list of technical instruments,
$x_i := (x_{i1}, \ldots, x_{ip})' := (P_1(z_i), \ldots, P_p(z_i))'$,
and we can write $D(z_i) = x_i'\beta_0 + r_i$, with $\|\beta_0\|_0 = s$ and $E_n[r_i^2] \lesssim \sigma^2_v\, s/n$, where p is large, possibly much larger than n. We then use the estimated optimal instrument to estimate $\alpha$.
52 2SLS with LASSO/Post-LASSO estimated Optimal IV
Step 1: estimate $\hat D(z_i) = x_i'\hat\beta$ using the LASSO or Post-LASSO estimator.
Step 2: $\hat\alpha = \big(E_n[d_i \hat D(z_i)']\big)^{-1} E_n[\hat D(z_i)\, y_i]$.
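A minimal sketch of this two-step IV procedure for a scalar endogenous regressor (our simplification; the slide allows vector $d_i$). It reuses the `post_lasso` function from the earlier sketch; any lasso-type first-stage fit would do.

```python
import numpy as np

def iv_lasso_2sls(X, d, y, lam):
    """Step 1: post-lasso first stage for the optimal instrument D(z).
    Step 2: the IV formula E_n[D d]^{-1} E_n[D y] (scalar case)."""
    beta_fs, _ = post_lasso(X, d, lam)     # first-stage coefficients
    D_hat = X @ beta_fs                    # estimated optimal instrument
    alpha_hat = (D_hat @ y) / (D_hat @ d)  # 2SLS with estimated instrument
    return alpha_hat
```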
53 2SLS with LASSO-based IV
Theorem (2SLS with lasso-based IV). Under the conditions of Result 1, and $s^2\log^2 p = o(n)$, we have
$\Lambda^{-1/2}\sqrt{n}(\hat\alpha - \alpha) \to_d N(0, I)$, where $\Lambda := \sigma^2_\epsilon\,\{E_n[\hat D(z_i)\hat D(z_i)']\}^{-1}$.
For more results, see B., Chernozhukov, and Hansen (submitted, ArXiv 2010). The requirement $s^2\log^2 p = o(n)$ can be relaxed to $s\log p = o(n)$ if a sample-split IV estimator is used. The extension to non-Gaussian and heteroskedastic errors in B., Chen, Chernozhukov and Hansen (submitted, ArXiv 2010) required adjustments to the penalty function.
54 Application of IV: Eminent Domain
Estimate the economic consequences of government take-over of property rights from individuals:
- $y_i$ = economic outcome in a region i, e.g., a housing price index;
- $d_i$ = number of property take-overs decided in a court of law, by panels of 3 judges;
- $x_i$ = demographic characteristics of judges, who are randomly assigned to panels: education, political affiliation, age, experience, etc.;
- $f_i$ = $x_i$ plus various interactions of components of $x_i$: a very large list, $p = p(f_i) = 344$.
55 Application of IV: Eminent Domain, Continued
The outcome is the log of a housing price index; the endogenous variable is government take-over. We can use 2 elementary instruments, suggested by real lawyers (Chen and Yeh, 2010), or we can use all 344 instruments and select approximately the right set using LASSO.
[Table: price effect and robust standard error for 2SLS with the 2 elementary instruments vs. 2SLS with LASSO-selected IVs; numerical values lost in transcription.]
56 $\ell_1$-Penalized Quantile Regression
- a response variable y and p-dimensional covariates x;
- the u-th conditional quantile function of y given x is $x'\beta(u)$, $\beta(u) \in \mathbb{R}^p$;
- the dimension p is large, possibly much larger than the sample size n;
- the true coefficient vector $\beta(u)$ is sparse, having only s < n non-zero components; we don't know which.
In the population, $\beta(u)$ minimizes $Q_u(\beta) = E[\rho_u(y - x'\beta)]$, where $\rho_u(t) = (u - 1\{t \leq 0\})t = u\,t^+ + (1-u)\,t^-$.
57 $\ell_1$-Penalized Quantile Regression
The canonical estimator, which minimizes
$\hat Q_u(\beta) = n^{-1}\sum_{i=1}^n \rho_u(y_i - x_i'\beta)$,
is not consistent when $p \geq n$. The $\ell_1$-penalized quantile regression estimator $\hat\beta(u)$ solves
$\min_{\beta \in \mathbb{R}^p}\ \hat Q_u(\beta) + \lambda_u \|\beta\|_1$.
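Since the check function can be written $\rho_u(t) = \max\{u\,t,\ (u-1)\,t\}$, this problem is a linear program. A minimal sketch using cvxpy as the modeling layer (our illustrative tooling choice, not from the slides):

```python
import cvxpy as cp

def l1_quantile_regression(X, y, u, lam):
    """l1-penalized quantile regression at quantile index u:
    minimize the empirical check-function loss plus lam * ||beta||_1."""
    n, p = X.shape
    beta = cp.Variable(p)
    resid = y - X @ beta
    # rho_u(t) = max(u*t, (u-1)*t), applied elementwise and averaged
    check = cp.sum(cp.maximum(u * resid, (u - 1) * resid)) / n
    cp.Problem(cp.Minimize(check + lam * cp.norm1(beta))).solve()
    return beta.value
```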
58 $\ell_1$-QR Rate Result
Theorem (Rates). Under standard conditions, letting $U \subset (0,1)$ be a compact set, with probability approaching 1,
$\sup_{u \in U}\ \|\hat\beta(u) - \beta(u)\|_2 \lesssim_P \sqrt{\frac{s\log(n \vee p)}{n}}.$
- very similar to the uniform rate $\sqrt{s\log n / n}$ when the support is known;
- rates similar to lasso, but they also hold uniformly over $u \in U$;
- although the rates are similar, the analysis is substantially different.
59 Application of $\ell_1$-QR: Conditional Independence
Let $X \in \mathbb{R}^m$ denote a random vector. We are interested in determining the conditional independence structure of X: for every pair of components (i, j), determine whether $X_i \perp X_j \mid X_{-i,-j}$. Conditional independence between two components of a Gaussian random vector $X \sim N(0, \Sigma)$ is well understood: for $\Sigma = \mathrm{Var}(X)$, $X_i$ and $X_j$ are independent conditional on the other components if and only if $(\Sigma^{-1})_{ij} = 0$.
60 Application of $\ell_1$-QR: Conditional Independence
We consider a non-Gaussian, high-dimensional setting. Here
$X_i \perp X_j \mid X_{-i,-j}$ if and only if $Q_{X_i}(u \mid X_{-i}) = Q_{X_i}(u \mid X_{-i,-j})$ for all $u \in (0,1)$.
61 Application of $\ell_1$-QR: Conditional Independence
To approximate conditional independence, consider the following notion: we say that $X_i$ and $X_j$ are U-quantile uncorrelated if
$\beta^i(u) \in \arg\min_\beta E[\rho_u(X_i - \beta'X_{-i})]$ and $\beta^j(u) \in \arg\min_\beta E[\rho_u(X_j - \beta'X_{-j})]$
are such that $\beta^i_j(u) = 0$ and $\beta^j_i(u) = 0$ for all $u \in U$.
62 Application of $\ell_1$-QR: Conditional Independence
To approximate conditional independence, consider:
- $U \subset (0,1)$, a compact set of quantile indices;
- a class of functions $F$.
We will say that $X_i \perp_{U,F} X_j \mid X_{-i,-j}$ if and only if, for every $u \in U$,
$f^{i,j}_u(0, X_{-i,-j}) \in \arg\min_{f \in F} E[\rho_u(X_i - f(X_j, X_{-i,-j}))]$ and
$f^{j,i}_u(0, X_{-i,-j}) \in \arg\min_{f \in F} E[\rho_u(X_j - f(X_i, X_{-i,-j}))].$
This approach requires: a series approximation for F; uniform estimation across all quantile indices $u \in U$; and model selection.
63 Application of $\ell_1$-QR: Conditional Independence
Let X be a vector of log returns. Draw an edge (i, j) if $X_i$ and $X_j$ are not conditionally independent given $X_{-i,-j}$.
[Figure: estimated conditional-dependence graph over 20 stock tickers: FUND, KDE, TW, OSIP, HGR, DJCO, RMCF, IBM, CAU, EWST, MSFT, DGSE, SUNW, ORCL, BTFG, AEPI, JJSF, PLXS, LLTC, DELL.]
64 Application of $\ell_1$-QR: Conditional Independence
Consider IBM returns first. [Figure: the graph with edges involving IBM highlighted.]
65 Application of $\ell_1$-QR: Conditional Independence
Next consider all returns. [Figure: the full estimated graph.]
66 Application of $\ell_1$-QR: Conditional Independence
Next we investigate how this dependence structure is affected by downside movements of the market. This is particularly relevant for hedging, since downside movements of the market are exactly when a hedge is most needed. We quantify downside movement by conditioning on the observations with the largest 25%, 50%, 75%, and 100% downside movements of the market. (The 100% case corresponds to the overall dependence.)
67 Application of $\ell_1$-QR: Conditional Independence
100% (no downside-movement restriction). [Figure: estimated graph.]
68 Application of $\ell_1$-QR: Conditional Independence
75%. [Figure: estimated graph conditioning on the 75% largest downside movements.]
69 Application of $\ell_1$-QR: Conditional Independence
50%. [Figure: estimated graph conditioning on the 50% largest downside movements.]
70 Application of $\ell_1$-QR: Conditional Independence
25%. [Figure: estimated graph conditioning on the 25% largest downside movements.]
71 Application of $\ell_1$-QR: Conditional Independence
100% vs. 25%. [Figure: the full-sample graph (left) next to the graph conditioned on the 25% largest downside movements (right).] The right graph is much sparser than the left graph. Thus, when markets are moving down, several variables become (nearly) conditionally independent. Therefore, hedging strategies that do not account for downside movements might not offer the desired protection exactly when it is needed most.
72 Variable Length Markov Chains
Consider a dynamic model, e.g., dynamic discrete choice models or discrete stochastic dynamic programming. The main feature we are interested in: the history of outcomes potentially impacts the probability distribution of the next outcome.
73 Variable Length Markov Chains
At each time t, agents choose $s_t \in A = \{\mathrm{Low}, \mathrm{High}\}$. Typical assumption: a Markov chain of order 1 (2 conditional probabilities to estimate). [Figure: depth-1 history tree with nodes L, H.]
74 Variable Length Markov Chains
At each time t, agents choose $s_t \in A = \{\mathrm{Low}, \mathrm{High}\}$. A Markov chain of order 2 requires estimating 4 conditional probabilities. [Figure: depth-2 history tree with nodes LL, LH, HL, HH.]
75 Variable Length Markov Chains
At each time t, agents choose $s_t \in A = \{\mathrm{Low}, \mathrm{High}\}$. A Markov chain of order 3 requires estimating 8 conditional probabilities. [Figure: depth-3 history tree with nodes LLL through HHH.]
76 Variable Length Markov Chains
At each time t, agents choose $s_t \in A = \{\mathrm{Low}, \mathrm{High}\}$. A Markov chain of order K requires estimating $2^K$ conditional probabilities. [Figure: depth-K history tree.]
77 Variable Length Markov Chains
Issues:
- K is unknown (potentially infinite);
- a full K-th order Markov chain becomes large quickly;
- similar histories may lead to similar transition probabilities.
Motivation: bias (small model) vs. variance (large model). Make the order of the Markov chain depend on the history; model selection then balances misspecification against the length of confidence intervals.
78 Variable Length Markov Chains
A variable length Markov chain is very flexible, but one needs to select the context tree. [Figure: an irregular context tree in which some histories, e.g., those ending in LH, are split further into LHL and LHH, while other branches are pruned early.]
79 Variable Length Markov Chains
A penalty method based on (uniform) confidence intervals gives uniform estimation of the transition probabilities over all possible histories x:
$\sup_x \big\|\hat P(\cdot \mid x) - P(\cdot \mid x)\big\|$
(the bounds achieve the minimax rates for some classes). The results allow us to establish:
- in dynamic discrete choice models: uniform estimation of dynamic marginal effects;
- in discrete stochastic dynamic programming: uniform estimation of the value function.
80 Conclusion
81 Conclusion
1. $\ell_1$- and post-$\ell_1$-penalization methods offer a good way to control for confounding factors, offer a good way to select series terms, and are especially useful when $p \geq n$.
2. $\ell_1$-penalized estimators achieve near-oracle rates.
3. Post-$\ell_1$-penalized estimators achieve at least near-oracle rates.
4. There are many applications of model selection to estimators in econometrics.
85 References
- Bickel, Ritov and Tsybakov, "Simultaneous analysis of Lasso and Dantzig selector," Annals of Statistics, 2009.
- Candes and Tao, "The Dantzig selector: statistical estimation when p is much larger than n," Annals of Statistics, 2007.
- Meinshausen and Yu, "Lasso-type recovery of sparse representations for high-dimensional data," Annals of Statistics, 2009.
- Tibshirani, "Regression shrinkage and selection via the Lasso," J. Roy. Statist. Soc. Ser. B, 1996.
86 References
- Belloni and Chernozhukov, "l1-Penalized Quantile Regression for High Dimensional Sparse Models," Annals of Statistics, 2011.
- Belloni and Chernozhukov, "Post-l1-Penalized Estimators in High-Dimensional Linear Regression Models," ArXiv.
- Belloni, Chernozhukov and Wang, "Square-Root LASSO: Pivotal Recovery of Sparse Signals via Conic Programming," Biometrika, 2011.
- Belloni and Chernozhukov, "High Dimensional Sparse Econometric Models: An Introduction," in Ill-Posed Inverse Problems and High-Dimensional Estimation, Springer Lecture Notes in Statistics.
- Belloni, Chernozhukov, and Hansen, "Estimation and Inference Methods for High-Dimensional Sparse Econometric Models: a Review," 2011, submitted.
- Belloni and Oliveira, "Approximate Group Context Tree: application to dynamic programming and dynamic choice models," 2011, submitted.