High-Dimensional Sparse Econometric Models, an Introduction


1 High-Dimensional Sparse Econometric Models, an Introduction ICE, July 2011

2 Outline
1. High-Dimensional Sparse Econometric Models (HDSM): models; motivating examples for linear/nonparametric regression.
2. Estimation methods for linear/nonparametric regression: $\ell_1$-penalized estimators (formulations, penalty parameter, computational methods); post-$\ell_1$-penalized estimators; summary of rates of convergence.
3. Model selection applications in econometrics: results for IV estimators; results for quantile regression; results for variable length Markov chains.

3 High-Dimensional Sparse Models. We have a large number of parameters p, potentially larger than the sample size n. The idea is that a low-dimensional submodel captures very accurately all essential features of the full-dimensional model. The key assumption is that the number of relevant parameters is smaller than the sample size; fundamentally, this is a model selection problem. Three questions arise: Do sparse models make sense? How can we compute such models? What guarantees can we establish?

4 1. High-Dimensional Sparse Econometric Model (HDSM). A response variable $y_i$ obeys
$$y_i = x_i'\beta_0 + \epsilon_i, \quad \epsilon_i \sim N(0,\sigma^2), \quad i = 1,\ldots,n,$$
where the $x_i$ are fixed $p$-dimensional regressors, $x_i = (x_{ij},\ j = 1,\ldots,p)'$, normalized so that $E_n[x_{ij}^2] = 1$, and $p$ is possibly much larger than $n$. The key assumption is sparsity:
$$s := \|\beta_0\|_0 = \sum_{j=1}^{p} 1\{\beta_{0j} \neq 0\} \ll n.$$
This generalizes the standard large parametric econometric model by letting the identity $T := \mathrm{support}(\beta_0)$ of the $s$ relevant regressors be unknown.

5 High p motivation:
- transformations of basic regressors $z_i$, i.e., $x_i = (P_1(z_i),\ldots,P_p(z_i))'$; for example, in wage regressions the $P_j$'s are polynomials or B-splines in education and experience;
- or simply a very large list of regressors: a list of country characteristics in cross-country growth regressions (Barro & Lee); housing characteristics in hedonic regressions (American Housing Survey); price and product characteristics at the point of purchase (scanner data, TNS).

6 Sparseness. The key assumption is that the number of non-zero regression coefficients is smaller than the sample size:
$$s := \|\beta_0\|_0 = \sum_{j=1}^{p} 1\{\beta_{0j} \neq 0\} \ll n.$$
The idea is that a low, $s$-dimensional submodel accurately approximates the full $p$-dimensional model; here the approximation error is in fact zero. We can easily relax this assumption and allow a non-zero approximation error, and we can substantially relax the Gaussianity of the error terms $\epsilon_i$ using results from self-normalized moderate deviation theory.

7 High-Dimensional Sparse Econometric Model (HDSM). For $y_i$ the response variable and $z_i$ the fixed elementary regressors,
$$y_i = g(z_i) + \epsilon_i, \quad E[\epsilon_i] = 0, \quad E[\epsilon_i^2] = \sigma^2, \quad i = 1,\ldots,n.$$
Consider a $p$-dimensional dictionary of series terms, $x_i = (P_1(z_i),\ldots,P_p(z_i))'$, for example B-splines, polynomials, and interactions, where $p$ is possibly much larger than $n$. Expand via generalized series:
$$g(z_i) = x_i'\beta_0 + r_i,$$
where $x_i'\beta_0$ is the best $s$-dimensional parametric approximation,
$$s := \|\beta_0\|_0 = \sum_{j=1}^{p} 1\{\beta_{0j} \neq 0\} \ll n,$$
and $r_i$ is the approximation error.

8 High-Dimensional Sparse Econometric Model (HDSM). As a result we have an approximately sparse model:
$$y_i = x_i'\beta_0 + r_i + \epsilon_i, \quad i = 1,\ldots,n.$$
The parametric part $x_i'\beta_0$ is the target, with $s$ non-zero coefficients. We choose $s$ to balance the approximation error against the oracle estimation error:
$$E_n[r_i^2] \lesssim \frac{\sigma^2 s}{n}.$$
This model generalizes the exact sparse model by allowing approximation error and non-Gaussian errors. It also generalizes the standard series regression model by letting the identity $T = \mathrm{support}(\beta_0)$ of the most important $s$ series terms be unknown.

9 Example 1: Series Models of Wage Functions. In this example, we abstract away from estimation questions. Consider a series expansion of the conditional expectation $E[y_i \mid z_i]$ of wage $y_i$ given education $z_i$. A conventional series approximation to the regression function is, for example,
$$E[y_i \mid z_i] = \beta_1 P_1(z_i) + \beta_2 P_2(z_i) + \beta_3 P_3(z_i) + \beta_4 P_4(z_i) + r_i,$$
where $P_1,\ldots,P_4$ are low-order polynomials or splines with a few knots.

10 Traditional Approximation of the Expected Wage Function using Polynomials. [Figure: wage versus education.] In the figure, the true regression function $E[y_i \mid z_i]$ is computed using U.S. Census data.

11 With the same number of parameters, can we find a much better series approximation? For this to be possible, some high-order series terms in the expansion
$$E[y_i \mid z_i] = \sum_{k=1}^{p} \beta_{0k} P_k(z_i) + r_i$$
must have large coefficients. This indeed happens in our example, since $E[y_i \mid z_i]$ exhibits sharp oscillatory behavior in some regions, which low-degree polynomials fail to capture. We find the right series terms by using $\ell_1$-penalization methods.

12 Lasso Approximation of the Expected Wage Function using Polynomials and Dummies. [Figure: wage versus education.]

13 Traditional vs. Lasso Approximation of Expected Wage Functions with an Equal Number of Parameters. [Figure: wage versus education, both fits overlaid.]

14 Errors of the Conventional and the Lasso-Based Sparse Approximations. [Table: $L_2$ and $L_\infty$ errors of the conventional series approximation versus the lasso-based series approximation.]

15 Example 2: Cross-Country Growth Regression. Relation between the growth rate and initial per capita GDP:
$$\text{GrowthRate} = \alpha_0 + \alpha_1 \log(\text{GDP}) + \epsilon.$$
Test the convergence hypothesis, $\alpha_1 < 0$: poor countries catch up with richer countries. This is the prediction of the classical Solow growth model. Linear regression on $\log(\text{GDP})$ alone yields a fitted equation of the form
$$\text{GrowthRate} = \hat\alpha_0 + \hat\alpha_1 \log(\text{GDP}) + \hat\epsilon.$$
Test the convergence hypothesis conditional on covariates describing institutions and technological factors:
$$\text{GrowthRate} = \alpha_0 + \alpha_1 \log(\text{GDP}) + \sum_{j=1}^{p} \beta_j X_j + \epsilon.$$

16 Example 2: Cross-Country Growth Regression. In the Barro–Lee data, we have p = 60 covariates and n = 90 observations, and we need to select the significant covariates among the 60. Real GDP per capita (log) is included in all models. As the penalization parameter $\lambda$ decreases, the additional selected variables are:
- Black Market Premium (log);
- Black Market Premium (log); Political Instability;
- Black Market Premium (log); Political Instability; Ratio of nominal government expenditure on defense to nominal GDP; Ratio of imports to GDP.

17 Example 2: Cross-Country Growth Regression. We find a sparse model for the conditioning effects using $\ell_1$-penalization, and we re-fit the resulting model using least squares. We find that the coefficient on lagged GDP is negative, and the confidence intervals, adjusted for model selection, exclude zero:

Coefficient | Confidence set
lagged GDP | [ , ]

Our findings fully support the conclusions reached in Barro and Lee and in Barro and Sala-i-Martin.

18 2. Estimation via $\ell_1$-penalty methods. The canonical least-squares estimator, which minimizes
$$\hat Q(\beta) = E_n[(y_i - x_i'\beta)^2],$$
is not consistent when $\dim(\beta) = p \geq n$. We can try minimizing an AIC/BIC-type criterion function
$$\hat Q(\beta) + \lambda \|\beta\|_0, \quad \|\beta\|_0 = \sum_{j=1}^{p} 1\{\beta_j \neq 0\},$$
but this is not computationally feasible when $p$ is large: it is essentially a search over too many (non-nested) models; technically, it is a combinatorial problem known to be NP-hard.

19 2. Estimation via $\ell_1$-penalty methods. One must balance and understand simultaneously the computational issues (a hard combinatorial search) and the statistical issues (lack of global identification). Understanding that statistical aspects can help computational methods, and vice versa, was crucial. Further developments in model selection estimation will rely heavily on computational advances. (MCMC is an example of a computational method with a profound impact on the statistics/econometrics literature.)

20 2. Estimation via $\ell_1$-penalty methods. Goal: avoid computing $\hat Q(\beta) + \lambda\|\beta\|_0$, but still get something meaningful. A solution (Tibshirani, JRSS B, 1996) is to replace the $\ell_0$-"norm" by the closest convex function, the $\ell_1$-norm:
$$\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|.$$
The lasso estimator $\hat\beta$ then minimizes $\hat Q(\beta) + \lambda\|\beta\|_1$. This change makes the regularization function convex, so good computational methods are available for large-scale problems, and there are many ways to combine the fit and regularization functions.
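
To fix ideas, here is a minimal sketch of the lasso in Python using scikit-learn — an implementation choice made here for illustration, not the software behind these slides. Note that scikit-learn's Lasso minimizes $\frac{1}{2n}\|y - X\beta\|^2 + \alpha\|\beta\|_1$, so under the normalization $\hat Q(\beta) = E_n[(y_i - x_i'\beta)^2]$ used above one sets $\alpha = \lambda/2$; the penalty level below anticipates the choice discussed later.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulated sparse design with p > n: only the first 5 coefficients are non-zero.
rng = np.random.default_rng(0)
n, p, s = 100, 500, 5
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s] = 1.0
y = X @ beta0 + rng.standard_normal(n)

# scikit-learn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1,
# so alpha = lambda/2 under the normalization Q(b) = E_n[(y_i - x_i'b)^2].
sigma = 1.0  # treated as known here, purely for illustration
lam = 2 * sigma * np.sqrt(2 * np.log(p) / n)
fit = Lasso(alpha=lam / 2, fit_intercept=False).fit(X, y)
print("selected regressors:", np.flatnonzero(fit.coef_))
```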

21 Formulations. Idea: balance the fit $\hat Q(\beta)$ and the penalty $\|\beta\|_1$ using a parameter $\lambda$. There are several ways to do it:
$$\min_\beta\ \hat Q(\beta) + \lambda\|\beta\|_1 \quad \text{(lasso)}$$
$$\min_\beta\ \hat Q(\beta)\ :\ \|\beta\|_1 \leq \lambda \quad \text{(sieve)}$$
$$\min_\beta\ \|\beta\|_1\ :\ \hat Q(\beta) \leq \lambda \quad (\ell_1\text{-minimization})$$
$$\min_\beta\ \|\beta\|_1\ :\ \|\nabla\hat Q(\beta)\|_\infty \leq \lambda \quad \text{(Dantzig selector)}$$
$$\min_\beta\ \sqrt{\hat Q(\beta)} + \lambda\|\beta\|_1 \quad (\sqrt{\text{lasso}})$$

22 Penalty parameter $\lambda$. The motivation/heuristic is to choose the parameter $\lambda$ so that $\beta_0$ is feasible for the problem, but not too many more points are:
- sieve: ideally set $\lambda = \|\beta_0\|_1$;
- $\ell_1$-minimization: ideally set $\lambda = \sigma^2$;
- Dantzig selector: ideally set $\lambda = \|\nabla \hat Q(\beta_0)\|_\infty = \|2E_n[\epsilon_i x_i]\|_\infty$.
Note that lasso and $\sqrt{\text{lasso}}$ are unrestricted minimizations. Their penalties are chosen to dominate the score at the true value, respectively $\lambda > \|\nabla \hat Q(\beta_0)\|_\infty$ and $\lambda > \|\nabla \sqrt{\hat Q}(\beta_0)\|_\infty$.

23 Heuristics for lasso via Convex Geometry. [Figure: the objective $\hat Q(\beta)$ along a direction $\beta_{T^c}$ away from the true value ($\beta_{0T^c} = 0$).]

24 Heuristics for lasso via Convex Geometry. [Figure: the linearization $\hat Q(\beta_0) + \nabla\hat Q(\beta_0)'\beta_{T^c}$ added to the previous plot.]

25 Heuristics for lasso via Convex Geometry. [Figure: the penalty $\lambda\|\beta\|_1$ with $\lambda = \|\nabla\hat Q(\beta_0)\|_\infty$, just matching the score at the true value.]

26 Heuristics for lasso via Convex Geometry. [Figure: the penalty $\lambda\|\beta\|_1$ with $\lambda > \|\nabla\hat Q(\beta_0)\|_\infty$, dominating the score at the true value.]

27 Heuristics via Convex Geometry. [Figure: the penalized objective $\hat Q(\beta) + \lambda\|\beta\|_1$ with $\lambda > \|\nabla\hat Q(\beta_0)\|_\infty$, minimized at the true value in the direction $\beta_{T^c}$.]

28 Penalty parameters for lasso and $\sqrt{\text{lasso}}$. In the case of lasso, the optimal choice of penalty level $\lambda$, guaranteeing near-oracle performance, is
$$\lambda = 2\sigma\sqrt{\frac{2\log p}{n}}\ \geq_P\ \|2E_n[\epsilon_i x_i]\|_\infty = \|\nabla\hat Q(\beta_0)\|_\infty.$$
This choice relies on knowing $\sigma$, which may be a priori hard to estimate when $p \gg n$. In the case of $\sqrt{\text{lasso}}$, the score is a pivotal random variable, with distribution independent of unknown parameters:
$$\|\nabla\sqrt{\hat Q}(\beta_0)\|_\infty = \frac{\|E_n[\epsilon_i x_i]\|_\infty}{\sqrt{E_n[\epsilon_i^2]}} = \frac{\|E_n[(\epsilon_i/\sigma)x_i]\|_\infty}{\sqrt{E_n[(\epsilon_i/\sigma)^2]}},$$
hence for $\sqrt{\text{lasso}}$ the penalty $\lambda$ does not depend on $\sigma$.
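
The contrast between the two penalty choices is easy to see numerically. The sketch below (our illustration, not the slides' code) computes the lasso level, which requires $\sigma$, and simulates the distribution of the pivotal $\sqrt{\text{lasso}}$ score to obtain a penalty that does not:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
X /= np.sqrt((X**2).mean(axis=0))    # enforce the normalization E_n[x_ij^2] = 1

# lasso: the penalty level needs sigma...
sigma = 1.0
lam_lasso = 2 * sigma * np.sqrt(2 * np.log(p) / n)

# ...whereas the sqrt-lasso score max_j |E_n[e_i x_ij]| / sqrt(E_n[e_i^2])
# is pivotal: simulate it with standard normal errors and take a high quantile.
R = 1000
scores = np.empty(R)
for r in range(R):
    e = rng.standard_normal(n)
    scores[r] = np.abs(X.T @ e / n).max() / np.sqrt((e**2).mean())
lam_sqrt_lasso = np.quantile(scores, 0.95)   # no sigma anywhere
print(lam_lasso, lam_sqrt_lasso)
```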

29 Computational Methods for lasso. Big picture of computational methods:

Method | Problem size | Computational complexity
Componentwise | extremely large | convergence guaranteed
First-order | very large | $pn \log p \cdot \varepsilon^{-1}$
Interior-point | large (up to 5000) | $(p+n)^{3.5} \log((p+n)/\varepsilon)$

Remark: several implementations of these methods are available: SDPT3 or SeDuMi for interior-point methods; TFOCS for first-order methods; several downloadable codes for componentwise methods (which are also easy to implement).

30 Computational Methods for lasso: Componentwise. The componentwise method is a common approach to multivariate optimization: (i) pick a component and fix all remaining components; (ii) improve the objective function along the chosen component; (iii) loop over steps (i)–(ii) until convergence is achieved. This approach has been widely used when the minimization over a single component can be done very efficiently, which is the case for lasso. Its simple implementation has also contributed to its widespread use.

31 Computational Methods for lasso: Componentwise. For a current iterate $\beta$, let $\beta_{-j} = (\beta_1,\ldots,\beta_{j-1},0,\beta_{j+1},\ldots,\beta_p)'$:
- if $2E_n[x_{ij}(y_i - x_i'\beta_{-j})] > \lambda$, the minimum over $\beta_j$ is $\beta_j = \big(2E_n[x_{ij}(y_i - x_i'\beta_{-j})] - \lambda\big)\big/\big(2E_n[x_{ij}^2]\big)$;
- if $2E_n[x_{ij}(y_i - x_i'\beta_{-j})] < -\lambda$, the minimum over $\beta_j$ is $\beta_j = \big(2E_n[x_{ij}(y_i - x_i'\beta_{-j})] + \lambda\big)\big/\big(2E_n[x_{ij}^2]\big)$;
- if $\big|2E_n[x_{ij}(y_i - x_i'\beta_{-j})]\big| \leq \lambda$, the minimum over $\beta_j$ is $\beta_j = 0$.
Remark: similar formulas can be derived for other methods, such as $\sqrt{\text{lasso}}$.
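
The update rules above translate directly into code. Below is a minimal, unoptimized Python sketch of the componentwise (coordinate descent) method; a serious implementation would cache residuals rather than recompute $X\beta$ for every coordinate.

```python
import numpy as np

def lasso_componentwise(X, y, lam, n_sweeps=200):
    """Componentwise lasso following the update rules above.
    Unoptimized sketch: recomputes the partial residual for every coordinate."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            beta[j] = 0.0                           # this makes beta equal beta_{-j}
            g = 2 * X[:, j] @ (y - X @ beta) / n    # 2 E_n[x_ij (y_i - x_i' beta_{-j})]
            d = 2 * (X[:, j] ** 2).mean()           # 2 E_n[x_ij^2]
            if g > lam:
                beta[j] = (g - lam) / d
            elif g < -lam:
                beta[j] = (g + lam) / d
            # otherwise |g| <= lam and beta[j] stays at zero
    return beta
```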

32 Computational Methods for lasso: first-order methods. The new generation of first-order methods focuses on structured convex problems that can be cast as
$$\min_w\ f(A(w) + b) + h(w) \quad \text{or} \quad \min_w\ h(w)\ :\ A(w) + b \in K,$$
where $f$ is a smooth function, $h$ is a structured function that is possibly non-differentiable or takes extended values, and $K$ is a structured convex set (typically a convex cone).

33 Computational Methods for lasso: first-order methods. Lasso is cast as $\min_w f(A(w)+b) + h(w)$ with
$$f(\cdot) = \frac{\|\cdot\|^2}{n}, \quad h(\cdot) = \lambda\|\cdot\|_1, \quad A = X, \quad b = -Y.$$
Thus,
$$\min_\beta\ \frac{\|X\beta - Y\|^2}{n} + \lambda\|\beta\|_1.$$
Remark: a number of different first-order methods have been proposed and analyzed; we briefly mention just one.

34 Computational Methods for lasso: first-order methods. On every iteration, for a given $\mu > 0$ and current point $\beta^k$, compute
$$\beta(\beta^k) = \arg\min_\beta\ -2E_n[x_i(y_i - x_i'\beta^k)]'\beta + \frac{\mu}{2}\|\beta - \beta^k\|^2 + \lambda\|\beta\|_1.$$
The minimization in $\beta$ above is separable and can be solved by soft-thresholding: letting
$$m_j^k := \beta_j^k + 2E_n[x_{ij}(y_i - x_i'\beta^k)]/\mu,$$
we update the iterate as
$$\beta_j(\beta^k) = \operatorname{sign}(m_j^k)\,\max\{|m_j^k| - \lambda/\mu,\ 0\}.$$
Remark: the additional regularization term $\frac{\mu}{2}\|\beta - \beta^k\|^2$ brings computational stability, which allows the algorithm to enjoy better computational complexity properties.
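
In code, the iteration is a gradient step followed by componentwise soft-thresholding (the ISTA scheme). The sketch below is our illustration of the formulas above; choosing $\mu$ as twice the largest eigenvalue of $X'X/n$ is one simple way to dominate the curvature of $\hat Q$, not the slides' prescription.

```python
import numpy as np

def lasso_ista(X, y, lam, mu=None, n_iter=500):
    """First-order sketch of the iteration above: a gradient step on
    Q(b) = E_n[(y_i - x_i'b)^2] followed by soft-thresholding at lam/mu."""
    n, p = X.shape
    if mu is None:
        # mu must dominate the curvature of Q; twice the largest
        # eigenvalue of X'X/n is one simple valid choice.
        mu = 2 * np.linalg.eigvalsh(X.T @ X / n).max()
    beta = np.zeros(p)
    for _ in range(n_iter):
        m = beta + 2 * X.T @ (y - X @ beta) / (n * mu)   # m_j^k in the slides
        beta = np.sign(m) * np.maximum(np.abs(m) - lam / mu, 0.0)
    return beta
```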

35 Computational Methods for lasso: interior-point methods. Interior-point method (IPM) solvers typically focus on solving conic programming problems in standard form,
$$\min_w\ c'w\ :\ Aw = b,\ w \in K. \tag{2.1}$$
The conic constraint will be binding at the optimal solution. These methods use a barrier function $F_K$ for the cone $K$ that is strictly convex, smooth, and satisfies $F_K(w) \to \infty$ as $w$ approaches the boundary of $K$, and for some $\mu > 0$ proceed to compute
$$w(\mu) = \arg\min_w\ c'w + \mu F_K(w)\ :\ Aw = b. \tag{2.2}$$
As $\mu \to 0$, $w(\mu)$ converges to the solution of (2.1). We can recover an approximate solution of (2.1) efficiently by using an approximate solution for $w(\mu_k)$ as a warm start to approximate $w(\mu_{k+1})$, $\mu_k > \mu_{k+1}$.

36 Computational Methods for lasso: interior-point methods. Not all barriers are created equal: solving the problems
$$w(\mu) = \arg\min_w\ c'w + \mu F_K(w)\ :\ Aw = b \tag{2.3}$$
is tractable only if $F_K$ is a special barrier (which depends on $K$). Three cones play a major role in optimization theory:
- the non-negative orthant: $\mathbb{R}^p_+ = \{w \in \mathbb{R}^p : w_j \geq 0,\ j = 1,\ldots,p\}$, with $F_K(w) = -\sum_{j=1}^{p}\log(w_j)$;
- the second-order cone: $Q^{n+1} = \{(w,t) \in \mathbb{R}^n \times \mathbb{R}_+ : t \geq \|w\|\}$, with $F_K(w,t) = -\log(t^2 - \|w\|^2)$;
- the semidefinite positive matrices: $S^{k\times k}_+ = \{W \in \mathbb{R}^{k\times k} : W = W',\ v'Wv \geq 0 \text{ for all } v \in \mathbb{R}^k\}$, with $F_K(W) = -\log\det(W)$.

37 Computational Methods for lasso: interior-point methods. These cones (and three others) are called self-scaled cones, and their Cartesian products are also self-scaled. Therefore, when using IPMs it is VERY important computationally to arrive at a formulation that is not only convex but also relies on self-scaled cones. It turns out that many tractable penalization methods (including $\ell_1$, but others as well) can be formulated with these three cones.

38 Computational Methods for lasso: interior-point methods. Lasso: $\min_\beta \hat Q(\beta) + \lambda\|\beta\|_1$.
- To formulate the $\ell_1$-norm, let $\beta = \beta^+ - \beta^-$, where $\beta^+, \beta^- \in \mathbb{R}^p_+$.
- For any vector $v \in \mathbb{R}^n$ and any scalar $t \geq 0$, we have that $v'v \leq t$ is equivalent to $\|(v, (t-1)/2)\| \leq (t+1)/2$; set $u := (t-1)/2$ and $z := (t+1)/2$.
This yields the conic program
$$\min_{t,\beta^+,\beta^-,u,z,v}\ \frac{t}{n} + \lambda\sum_{j=1}^{p}(\beta^+_j + \beta^-_j)$$
subject to
$$v = Y - X\beta^+ + X\beta^-, \quad t = 1 + 2u, \quad t = -1 + 2z,$$
$$(v,u,z) \in Q^{n+2}, \quad t \geq 0, \quad \beta^+ \in \mathbb{R}^p_+, \quad \beta^- \in \mathbb{R}^p_+.$$
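
In practice one rarely writes the conic form by hand: a modeling layer performs this reformulation automatically and passes the result to an interior-point solver. Here is a minimal sketch using the cvxpy package — our choice for illustration; the slides mention SDPT3 and SeDuMi:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.ones(5) + rng.standard_normal(n)
lam = 2 * np.sqrt(2 * np.log(p) / n)

beta = cp.Variable(p)
# cvxpy rewrites this objective into a conic (second-order cone) program,
# much like the formulation above, and hands it to the solver.
objective = cp.Minimize(cp.sum_squares(X @ beta - y) / n + lam * cp.norm1(beta))
prob = cp.Problem(objective)
prob.solve(solver=cp.ECOS)   # ECOS: an interior-point SOCP solver
print("non-zero coefficients:", int(np.sum(np.abs(beta.value) > 1e-6)))
```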

39 Quick comparison on a very sparse model: $s = 5$, $\beta_{0j} = 1$ for $j = 1,\ldots,s$, $\beta_{0j} = 0$ for $j > s$. [Table: average running time over 100 simulations (with different precisions across methods) of the componentwise, first-order, and interior-point implementations of lasso and $\sqrt{\text{lasso}}$, for $(n,p) = (100, 500)$, $(200, 1000)$, and $(400, 2000)$.]

40 Post-Model-Selection Estimator. Define the post-lasso estimator as follows:
Step 1: Select the model using lasso, $\hat T := \mathrm{support}(\hat\beta)$.
Step 2: Apply ordinary least squares (OLS) to the selected model $\hat T$.
Remark: we could use $\sqrt{\text{lasso}}$, the Dantzig selector, sieve, or $\ell_1$-minimization instead of lasso in Step 1.
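
A minimal sketch of the two steps in Python, reusing scikit-learn as above (the $\alpha = \lambda/2$ reparametrization is explained earlier):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def post_lasso(X, y, lam):
    """Step 1: select T = support(beta_hat) by lasso.
    Step 2: refit the selected columns by OLS to undo the shrinkage bias."""
    n, p = X.shape
    step1 = Lasso(alpha=lam / 2, fit_intercept=False).fit(X, y)
    T = np.flatnonzero(step1.coef_)
    beta = np.zeros(p)
    if T.size > 0:
        beta[T] = LinearRegression(fit_intercept=False).fit(X[:, T], y).coef_
    return beta, T
```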

41 Post-Model-Selection Estimator: Geometry. [Figure: in the $(\beta_1,\beta_2)$ plane, the estimator $\hat\beta$ from $\min\ \|\beta\|_1 : \hat Q(\beta) \leq \gamma$, $\beta \in \mathbb{R}^p$, together with the true value $\beta_0$.] In this example the correct model is selected, but note the shrinkage bias. The shrinkage bias can be removed by the post-model-selection estimator, i.e., by applying OLS to the selected model.

42 Post-Model-Selection Estimator: Geometry. [Figure: the same construction, with the estimator $\hat\beta$ landing off the axes.] In this example the correct model was not selected: the selected model contains a junk regressor, which can impact the post-model-selection estimator.

43 Issues for the post-model-selection estimator. Lasso and other $\ell_1$-penalized methods will successfully zero out many irrelevant regressors. However, these procedures are not perfect at model selection: they might include junk regressors or exclude some relevant ones, and they bias/shrink the non-zero coefficient estimates towards zero. Post-model-selection estimators are motivated by the need to remove this bias towards zero, but the results need to account for possible misspecification of the selected model.

44 Regularity Condition. A simple sufficient condition is as follows.
Condition RSE. Take any $C > 1$. Then, for all $n$ large enough, the $sC$-sparse eigenvalues of the empirical design matrix $E_n[x_i x_i']$ are bounded away from zero and from above; namely,
$$0 < \kappa' \leq \min_{\|\delta\|_0 \leq sC}\frac{E_n[(x_i'\delta)^2]}{\|\delta\|^2} \leq \max_{\|\delta\|_0 \leq sC}\frac{E_n[(x_i'\delta)^2]}{\|\delta\|^2} \leq \kappa'' < \infty. \tag{2.4}$$
This holds under i.i.d. sampling if $E[x_i x_i']$ has eigenvalues bounded away from zero and from above, and either: $x_i$ has light tails (e.g., log-concave) and $s\log p = o(n)$; or the regressors are bounded, $\max_{ij}|x_{ij}| \leq K$, and $s\log^4 p = o(n)$.

45 Result 1: Rates for $\ell_1$-penalized methods.
Theorem (Rates). Under Condition RSE, with probability approaching 1,
$$\sqrt{E_n[(x_i'\hat\beta - x_i'\beta_0)^2]}\ \lesssim\ \sigma\sqrt{\frac{s\log(n \vee p)}{n}}.$$
- The rate is very close to the oracle rate $\sqrt{s/n}$, obtainable when we know the true model $T$.
- $p$ shows up only through the logarithmic term $\log p$.
References: for the Dantzig selector, Candès and Tao (Annals of Statistics, 2007); for lasso, Meinshausen and Yu (Annals of Statistics, 2009) and Bickel, Ritov, and Tsybakov (Annals of Statistics, 2009); for $\sqrt{\text{lasso}}$, B., Chernozhukov, and Wang (Biometrika, 2011).

46 Result 2: Post-Model-Selection Estimator.
Theorem (Rate for the Post-Model-Selection Estimator). Under the conditions of Result 1, with probability approaching 1,
$$\sqrt{E_n[(x_i'\hat\beta_{PL} - x_i'\beta_0)^2]}\ \lesssim\ \sigma\sqrt{\frac{s\log(n \vee p)}{n}},$$
and in some exceptional cases the rate is faster, up to $\sigma\sqrt{s/n}$.
- Even though lasso does not in general perfectly select the relevant regressors, post-lasso performs at least as well.
- The analysis accounts for a selected model that can be misspecified.
This result was first derived in the context of median regression by B. and Chernozhukov (Annals of Statistics, 2011) and extended to least squares by B. and Chernozhukov (R&R at Bernoulli, 2011).

47 Monte Carlo. In this simulation we used $s = 6$, $p = 500$, $n = 100$:
$$y_i = x_i'\beta_0 + \epsilon_i, \quad \epsilon_i \sim N(0,\sigma^2), \quad \beta_0 = (1,\ 1,\ 1/2,\ 1/3,\ 1/4,\ 1/5,\ 0,\ldots,0)',$$
$$x_i \sim N(0,\Sigma), \quad \Sigma_{ij} = (1/2)^{|i-j|}, \quad \sigma^2 = 1.$$
Ideal benchmark: the oracle estimator, which runs OLS of $y_i$ on $x_{i1},\ldots,x_{i6}$. This estimator is not feasible outside the Monte Carlo.
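
A sketch of this design in Python, for replicating the setup (not the exact figures):

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
n, p = 100, 500
beta0 = np.zeros(p)
beta0[:6] = [1, 1, 1/2, 1/3, 1/4, 1/5]
Sigma = toeplitz(0.5 ** np.arange(p))          # Sigma_ij = (1/2)^|i-j|
L = np.linalg.cholesky(Sigma)

X = rng.standard_normal((n, p)) @ L.T          # rows x_i ~ N(0, Sigma)
y = X @ beta0 + rng.standard_normal(n)         # sigma^2 = 1

# Oracle benchmark (infeasible outside the simulation): OLS on the true support.
beta_oracle = np.zeros(p)
beta_oracle[:6] = np.linalg.lstsq(X[:, :6], y, rcond=None)[0]
```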

48 Monte Carlo Results: Risk. [Figure: estimation risk $E[(x_i'\hat\beta - x_i'\beta_0)^2]$ of LASSO, Post-LASSO, and the Oracle; $n = 100$, $p = 500$.]

49 Monte Carlo Results: Bias. [Figure: bias $\|E[\hat\beta] - \beta_0\|_2$ of LASSO and Post-LASSO; $n = 100$, $p = 500$.]

50 Applications of model selection

51 Gaussian Instrumental Variable Model. The Gaussian IV model is
$$y_i = d_i'\alpha + \epsilon_i, \qquad d_i = D(z_i) + v_i \ \ \text{(first stage)},$$
$$\begin{pmatrix} \epsilon_i \\ v_i \end{pmatrix} \sim N\left(0,\ \begin{pmatrix} \sigma^2_\epsilon & \sigma_{\epsilon v} \\ \sigma_{\epsilon v} & \sigma^2_v \end{pmatrix}\right).$$
We can have additional controls $w_i$ in both equations; these are suppressed here (see the references for details). We want to estimate the optimal instrument $D(z_i) = E[d_i \mid z_i]$ using a large list of technical instruments,
$$x_i := (x_{i1},\ldots,x_{ip})' := (P_1(z_i),\ldots,P_p(z_i))',$$
and we can write $D(z_i) = x_i'\beta_0 + r_i$, with $\|\beta_0\|_0 = s$ and $E_n[r_i^2] \lesssim \sigma^2_v\, s/n$, where $p$ is large, possibly much larger than $n$. We then use the estimated optimal instrument to estimate $\alpha$.

52 2SLS with LASSO/Post-LASSO-Estimated Optimal IV.
Step 1: Estimate $\hat D(z_i) = x_i'\hat\beta$ using the LASSO or Post-LASSO estimator.
Step 2: Compute $\hat\alpha = \big\{E_n[\hat D(z_i)\, d_i']\big\}^{-1} E_n[\hat D(z_i)\, y_i]$.
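
A minimal sketch of the two steps for a scalar endogenous regressor, with a post-lasso first stage (our illustration under the model above):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def iv_lasso_2sls(y, d, Z, lam):
    """Step 1: fit the optimal instrument D(z) by post-lasso of d on the
    technical instruments Z. Step 2: use D_hat as the instrument for d."""
    step1 = Lasso(alpha=lam / 2, fit_intercept=False).fit(Z, d)
    T = np.flatnonzero(step1.coef_)
    if T.size == 0:
        raise ValueError("no instruments selected")
    D_hat = LinearRegression(fit_intercept=False).fit(Z[:, T], d).predict(Z[:, T])
    # alpha_hat = {E_n[D_hat(z_i) d_i]}^{-1} E_n[D_hat(z_i) y_i]
    return (D_hat @ y) / (D_hat @ d)
```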

53 2SLS with LASSO-Based IV.
Theorem (2SLS with lasso-based IV). Under the conditions of Result 1, and $s^2\log^2 p = o(n)$, we have
$$(\Lambda^*)^{-1/2}\sqrt{n}(\hat\alpha - \alpha) \to_d N(0, I), \quad \text{where } \Lambda^* := \sigma^2_\epsilon \big\{E_n[\hat D(z_i)\hat D(z_i)']\big\}^{-1}.$$
For more results, see B., Chernozhukov, and Hansen (submitted, ArXiv 2010).
- The requirement $s^2\log^2 p = o(n)$ can be relaxed to $s\log p = o(n)$ if a split-sample IV estimator is used.
- The extension to non-Gaussian and heteroskedastic errors, in B., Chen, Chernozhukov, and Hansen (submitted, ArXiv 2010), required adjustments to the penalty function.

54 Application of IV: Eminent Domain. We estimate the economic consequences of government take-over of property rights from individuals:
- $y_i$ = economic outcome in a region $i$, e.g., a housing price index;
- $d_i$ = number of property take-overs decided in a court of law, by panels of 3 judges;
- $x_i$ = demographic characteristics of the judges, who are randomly assigned to panels: education, political affiliation, age, experience, etc.;
- $f_i$ = $x_i$ plus various interactions of the components of $x_i$; a very large list, $p = p(f_i) = 344$.

55 Application of IV: Eminent Domain, Continued. The outcome is the log of a housing price index; the endogenous variable is the government take-over. One can use 2 elementary instruments, suggested by real lawyers (Chen and Yeh, 2010), or use all 344 instruments and select approximately the right set using LASSO. [Table: price effect and robust standard error for 2SLS with the 2 elementary instruments versus 2SLS with LASSO-selected instruments.]

56 $\ell_1$-Penalized Quantile Regression. Consider a response variable $y$ and $p$-dimensional covariates $x$. The $u$-th conditional quantile function of $y$ given $x$ is $x'\beta(u)$, $\beta(u) \in \mathbb{R}^p$. The dimension $p$ is large, possibly much larger than the sample size $n$. The true coefficient vector $\beta(u)$ is sparse, having only $s < n$ non-zero components; we do not know which. In the population, $\beta(u)$ minimizes
$$Q_u(\beta) = E[\rho_u(y - x'\beta)], \quad \text{where } \rho_u(t) = (u - 1\{t \leq 0\})\,t = u\,t^+ + (1-u)\,t^-.$$

57 $\ell_1$-Penalized Quantile Regression. The canonical estimator, which minimizes
$$\hat Q_u(\beta) = n^{-1}\sum_{i=1}^{n} \rho_u(y_i - x_i'\beta),$$
is not consistent when $p \geq n$. The $\ell_1$-penalized quantile regression estimator $\hat\beta(u)$ solves
$$\min_{\beta \in \mathbb{R}^p}\ \hat Q_u(\beta) + \lambda_u \|\beta\|_1.$$
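
A minimal sketch in Python using scikit-learn's QuantileRegressor, which minimizes $n^{-1}\sum_i \rho_u(y_i - x_i'\beta) + \alpha\|\beta\|_1$ — the same objective with $\alpha$ playing the role of $\lambda_u$. The penalty level below is a rough heuristic of ours, not the theoretically calibrated choice from the literature.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
y = X[:, 0] - X[:, 1] + rng.standard_normal(n)

u = 0.5
lam_u = np.sqrt(u * (1 - u) * np.log(p) / n)   # heuristic level, an assumption
fit = QuantileRegressor(quantile=u, alpha=lam_u, fit_intercept=False,
                        solver="highs").fit(X, y)
print("selected regressors:", np.flatnonzero(fit.coef_))
```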

58 $\ell_1$-QR Rate Result.
Theorem (Rates). Under standard conditions, for $\mathcal{U} \subset (0,1)$ a compact set, with probability approaching 1,
$$\sup_{u \in \mathcal{U}} \|\hat\beta(u) - \beta(u)\|_2\ \lesssim_P\ \sqrt{\frac{s\log(n \vee p)}{n}}.$$
- This is very similar to the uniform rate $\sqrt{s\log n / n}$ attainable when the support is known.
- The rates are similar to those for lasso, but they also hold uniformly over $u \in \mathcal{U}$.
- Although the rates are similar, the analysis is substantially different.

59 Application of $\ell_1$-QR: Conditional Independence. Let $X \in \mathbb{R}^m$ denote a random vector. We are interested in determining the conditional independence structure of $X$: for every pair of components $(i,j)$, determine whether $X_i \perp X_j \mid X_{-i,-j}$. Conditional independence between two components of a Gaussian random vector $X \sim N(0,\Sigma)$ is well understood: for $\Sigma = \mathrm{Var}(X)$, $X_i$ and $X_j$ are independent conditional on the other components if and only if $(\Sigma^{-1})_{ij} = 0$.

60 Application of $\ell_1$-QR: Conditional Independence. We consider a non-Gaussian, high-dimensional setting. Here,
$$X_i \perp X_j \mid X_{-i,-j} \quad \text{if and only if} \quad Q_{X_i}(u \mid X_{-i}) = Q_{X_i}(u \mid X_{-i,-j}) \ \text{for all } u \in (0,1).$$

61 Application of $\ell_1$-QR: Conditional Independence. To approximate conditional independence, consider the following: we say that $X_i$ and $X_j$ are $\mathcal{U}$-quantile uncorrelated if
$$\beta^i(u) \in \arg\min_\beta E[\rho_u(X_i - \beta'X_{-i})] \quad \text{and} \quad \beta^j(u) \in \arg\min_\beta E[\rho_u(X_j - \beta'X_{-j})]$$
are such that $\beta^i_j(u) = 0$ and $\beta^j_i(u) = 0$ for all $u \in \mathcal{U}$.

62 Application of $\ell_1$-QR: Conditional Independence. To approximate conditional independence, consider $\mathcal{U} \subset (0,1)$, a compact set of quantile indices, and a class of functions $\mathcal{F}$. We will say that $X_i \perp_{\mathcal{U},\mathcal{F}} X_j \mid X_{-i,-j}$ if and only if, for every $u \in \mathcal{U}$,
$$f^{i,j}_u(0, X_{-i,-j}) \in \arg\min_{f \in \mathcal{F}} E[\rho_u(X_i - f(X_j, X_{-i,-j}))]$$
and
$$f^{j,i}_u(0, X_{-i,-j}) \in \arg\min_{f \in \mathcal{F}} E[\rho_u(X_j - f(X_i, X_{-i,-j}))].$$
This approach requires: a series approximation for $\mathcal{F}$; uniform estimation across all quantile indices $u \in \mathcal{U}$; and model selection. A rough code sketch of the resulting pairwise procedure follows.
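
Putting the pieces together, here is a rough sketch of the pairwise graph-building procedure with a linear class $\mathcal{F}$ and the $\ell_1$-penalized quantile regression above (illustrative only; the penalty level and the zero-tolerance are assumptions):

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def quantile_graph(X, u_grid=(0.1, 0.25, 0.5, 0.75, 0.9), alpha=0.1, tol=1e-6):
    """Regress each X_i on the remaining components by l1-penalized quantile
    regression over a grid of quantile indices; draw edge (i, j) unless the
    coefficient on X_j is zero in both directions at every u."""
    n, m = X.shape
    strength = np.zeros((m, m))
    for i in range(m):
        others = [k for k in range(m) if k != i]
        for u in u_grid:
            qr = QuantileRegressor(quantile=u, alpha=alpha,
                                   fit_intercept=False, solver="highs")
            b = qr.fit(X[:, others], X[:, i]).coef_
            strength[i, others] = np.maximum(strength[i, others], np.abs(b))
    return np.maximum(strength, strength.T) > tol   # adjacency matrix
```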

63 Application of $\ell_1$-QR: Conditional Independence. Let $X$ be a vector of log returns; draw edge $(i,j)$ if $X_i$ and $X_j$ are not conditionally independent given $X_{-i,-j}$. [Figure: estimated graph over 20 stocks — FUND, KDE, TW, OSIP, HGR, DJCO, RMCF, IBM, CAU, EWST, MSFT, DGSE, SUNW, ORCL, BTFG, AEPI, JJSF, PLXS, LLTC, DELL.]

64 Application of $\ell_1$-QR: Conditional Independence. Consider the IBM returns first. [Figure: edges involving IBM in the estimated graph over the 20 stocks.]

65 Application of $\ell_1$-QR: Conditional Independence. Next consider all returns. [Figure: the full estimated graph over the 20 stocks.]

66 Application of $\ell_1$-QR: Conditional Independence. Next we investigate how this dependence structure is affected by downside movements of the market. This is particularly relevant for hedging, since downside movements of the market are precisely when hedging is needed most. We quantify downside movement by conditioning on the observations with the largest 25%, 50%, 75%, and 100% downside movements of the market (100% corresponds to the overall dependence).

67 Application of $\ell_1$-QR: Conditional Independence. 100% (no downside-movement conditioning). [Figure: estimated graph over the 20 stocks.]

68 Application of $\ell_1$-QR: Conditional Independence. 75%. [Figure: estimated graph.]

69 Application of $\ell_1$-QR: Conditional Independence. 50%. [Figure: estimated graph.]

70 Application of $\ell_1$-QR: Conditional Independence. 25%. [Figure: estimated graph.]

71 Application of $\ell_1$-QR: Conditional Independence. 100% vs. 25%. [Figure: the 100% graph (left) next to the 25% graph (right).] The right graph is much sparser than the left graph. Thus, when markets are moving down, several variables become (nearly) conditionally independent. Therefore, hedging strategies that do not account for downside movements might not offer the desired protection exactly when it is needed most.

72 Variable Length Markov Chains. Consider a dynamic model, such as a dynamic discrete choice model or discrete stochastic dynamic programming. The main feature we are interested in: the history of outcomes potentially impacts the probability distribution of the next outcome.

73 Variable Length Markov Chains. Suppose at each time $t$ agents choose $s_t \in A = \{\text{Low}, \text{High}\}$. The typical assumption is a Markov chain of order 1, which requires estimating 2 conditional probabilities. [Figure: depth-1 history tree with nodes L, H.]

74 Variable Length Markov Chains. A Markov chain of order 2 requires estimating 4 conditional probabilities. [Figure: depth-2 history tree with leaves LL, LH, HL, HH.]

75 Variable Length Markov Chains. A Markov chain of order 3 requires estimating 8 conditional probabilities. [Figure: depth-3 history tree with leaves LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH.]

76 Variable Length Markov Chains. A Markov chain of order $K$ requires estimating $2^K$ conditional probabilities. [Figure: depth-$K$ history tree.]

77 Variable Length Markov Chains. Issues:
- $K$ is unknown (and potentially infinite);
- the full $K$-th order Markov chain becomes large very quickly;
- similar histories may lead to similar transition probabilities.
Motivation:
- bias (small model) vs. variance (large model);
- make the order of the Markov chain depend on the history;
- model selection balances misspecification against the length of confidence intervals.

78 Variable Length Markov Chains. The variable length Markov chain is very flexible, but one needs to select the context tree. [Figure: a context tree with leaves LL, LHL, LHH, HL, HH — the history LH is refined one level deeper than the others.]

79 Variable Length Markov Chains. A penalty method based on (uniform) confidence intervals: uniform estimation of the transition probabilities over all possible histories $x$,
$$\sup_{x}\ \big\|\hat P(\cdot \mid x) - P(\cdot \mid x)\big\|,$$
with bounds that achieve the minimax rates for some classes. The results allow us to establish: in dynamic discrete choice models, uniform estimation of dynamic marginal effects; in discrete stochastic dynamic programming, uniform estimation of the value function. A code sketch of the context-selection idea follows.
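
Here is a minimal sketch of that idea for a binary chain: keep a longer context only when its estimated transition probability escapes a confidence band around its parent context. The band $c\sqrt{\log n/\text{count}}$ is an assumed Hoeffding-style rule standing in for the paper's exact penalty.

```python
import numpy as np
from collections import Counter

def context_tree(seq, K=5, c=1.0):
    """Context selection sketch for a binary chain s_t in {0, 1}: estimate
    P(s_t = 1 | last k outcomes) for k <= K, then keep a longer context only
    when its estimate escapes the confidence band around its parent context."""
    n = len(seq)
    counts, ones = Counter(), Counter()
    for t in range(K, n):
        for k in range(1, K + 1):
            ctx = tuple(seq[t - k:t])          # the k most recent outcomes
            counts[ctx] += 1
            ones[ctx] += seq[t]
    kept = {}
    for ctx in sorted(counts, key=len):        # parents before children
        p_hat = ones[ctx] / counts[ctx]
        if len(ctx) == 1:
            kept[ctx] = p_hat                  # always keep order-1 contexts
            continue
        parent = ctx[1:]                       # same history, one symbol shorter
        band = c * np.sqrt(np.log(n) / counts[ctx])
        if parent in kept and abs(p_hat - kept[parent]) > band:
            kept[ctx] = p_hat                  # refining this history is justified
    return kept
```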

80 Conclusion

81 Conclusion
1. $\ell_1$- and post-$\ell_1$-penalization methods offer a good way to control for confounding factors, offer a good way to select series terms, and are especially useful when $p \gg n$.
2. $\ell_1$-penalized estimators achieve near-oracle rates.
3. Post-$\ell_1$-penalized estimators achieve at least near-oracle rates.
4. There are many applications of model selection in econometrics.

85 References
Bickel, Ritov, and Tsybakov, "Simultaneous analysis of Lasso and Dantzig selector," Annals of Statistics, 2009.
Candès and Tao, "The Dantzig selector: statistical estimation when p is much larger than n," Annals of Statistics, 2007.
Meinshausen and Yu, "Lasso-type recovery of sparse representations for high-dimensional data," Annals of Statistics, 2009.
Tibshirani, "Regression shrinkage and selection via the Lasso," J. Roy. Statist. Soc. Ser. B, 1996.

86 References
Belloni and Chernozhukov, "$\ell_1$-Penalized Quantile Regression for High-Dimensional Sparse Models," Annals of Statistics, 2011.
Belloni and Chernozhukov, "Post-$\ell_1$-Penalized Estimators in High-Dimensional Linear Regression Models," ArXiv.
Belloni, Chernozhukov, and Wang, "Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming," Biometrika, 2011.
Belloni and Chernozhukov, "High Dimensional Sparse Econometric Models: An Introduction," in Ill-Posed Inverse Problems and High-Dimensional Estimation, Springer Lecture Notes in Statistics.
Belloni, Chernozhukov, and Hansen, "Estimation and Inference Methods for High-Dimensional Sparse Econometric Models: A Review," submitted, 2011.
Belloni and Oliveira, "Approximate Group Context Tree: Application to Dynamic Programming and Dynamic Choice Models," submitted, 2011.


More information

Lecture 9: Quantile Methods 2

Lecture 9: Quantile Methods 2 Lecture 9: Quantile Methods 2 1. Equivariance. 2. GMM for Quantiles. 3. Endogenous Models 4. Empirical Examples 1 1. Equivariance to Monotone Transformations. Theorem (Equivariance of Quantiles under Monotone

More information

Motivation Non-linear Rational Expectations The Permanent Income Hypothesis The Log of Gravity Non-linear IV Estimation Summary.

Motivation Non-linear Rational Expectations The Permanent Income Hypothesis The Log of Gravity Non-linear IV Estimation Summary. Econometrics I Department of Economics Universidad Carlos III de Madrid Master in Industrial Economics and Markets Outline Motivation 1 Motivation 2 3 4 5 Motivation Hansen's contributions GMM was developed

More information

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory

More information

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008 Regularized Estimation of High Dimensional Covariance Matrices Peter Bickel Cambridge January, 2008 With Thanks to E. Levina (Joint collaboration, slides) I. M. Johnstone (Slides) Choongsoon Bae (Slides)

More information

PhD/MA Econometrics Examination January 2012 PART A

PhD/MA Econometrics Examination January 2012 PART A PhD/MA Econometrics Examination January 2012 PART A ANSWER ANY TWO QUESTIONS IN THIS SECTION NOTE: (1) The indicator function has the properties: (2) Question 1 Let, [defined as if using the indicator

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity

Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity Zhengyu Zhang School of Economics Shanghai University of Finance and Economics zy.zhang@mail.shufe.edu.cn

More information

Adaptive Sampling Under Low Noise Conditions 1

Adaptive Sampling Under Low Noise Conditions 1 Manuscrit auteur, publié dans "41èmes Journées de Statistique, SFdS, Bordeaux (2009)" Adaptive Sampling Under Low Noise Conditions 1 Nicolò Cesa-Bianchi Dipartimento di Scienze dell Informazione Università

More information

Recap from previous lecture

Recap from previous lecture Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience

More information

Topic 4: Model Specifications

Topic 4: Model Specifications Topic 4: Model Specifications Advanced Econometrics (I) Dong Chen School of Economics, Peking University 1 Functional Forms 1.1 Redefining Variables Change the unit of measurement of the variables will

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Optimization for Compressed Sensing

Optimization for Compressed Sensing Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve

More information