Lasso estimation and Poissonian interactions on the circle

Lasso estimation and Poissonian interactions on the circle

Laure Sansonnet (Université catholique de Louvain, Belgium; J-3: AgroParisTech), in collaboration with Patricia Reynaud-Bouret (Université Nice Sophia-Antipolis), Vincent Rivoirard (Université Paris-Dauphine) and Rebecca Willett (University of Wisconsin-Madison).
Friday, August 29, 2014, Journées MAS 2014

Motivation

Estimation in the Hawkes model (Reynaud-Bouret and Schbath 2010, Hansen et al. 2012, ...) and for Poissonian interactions (Sansonnet 2014) requires an a priori bound on the support of the interaction functions.

Aim: dispense with this bound in a high-dimensional discrete setting.
- A discrete version of the model of Poissonian interactions, which is at the heart of my PhD thesis.
- A circular model: to avoid boundary effects and also to reflect a certain biological reality.

Poissonian discrete model

Parents: $n$ i.i.d. uniform random variables $U_1, \dots, U_n$ on the set $\{0, \dots, p-1\}$, seen as points on a circle (we work modulo $p$).

Children: each $U_i$ gives birth, independently, to Poisson variables: if $x^\star = (x^\star_0, \dots, x^\star_{p-1})^H \in \mathbb{R}_+^p$, then $N^i_{U_i+j} \sim \mathcal{P}(x^\star_j)$, independently of everything else. The variable $N^i_{U_i+j}$ represents the number of children that the individual $U_i$ has at distance $j$.

We set $Y_k = \sum_{i=1}^n N^i_k$, the total number of children at position $k$, whose distribution conditioned on the $U_i$'s is given by
$$Y_k \sim \mathcal{P}\Big(\sum_{i=1}^n x^\star_{k-U_i}\Big).$$

Aim: estimate $x^\star$ from the observation of the $U_i$'s and the $Y_k$'s.
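As an illustration, here is a minimal Python sketch of this generative model; the sizes $n$, $p$ and the sparse vector `x_star` are illustrative choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 70, 500                          # illustrative sizes
x_star = np.zeros(p)
x_star[[1, 3, 7]] = [2.0, 1.0, 0.5]     # sparse interaction function on the circle

U = rng.integers(0, p, size=n)          # parents: i.i.d. uniform on {0, ..., p-1}

# Children: parent U_i has Poisson(x_star[j]) children at distance j (positions mod p).
Y = np.zeros(p, dtype=int)
for u in U:
    children = rng.poisson(x_star)                  # one Poisson draw per distance j
    np.add.at(Y, (u + np.arange(p)) % p, children)

# Conditionally on the U_i's, Y[k] ~ Poisson(sum_i x_star[(k - U_i) % p]).
```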

Another translation

$$Y = (Y_0, \dots, Y_{p-1})^H \sim \mathcal{P}(Ax^\star),$$
where
$$A = \begin{pmatrix} N(0) & N(p-1) & \cdots & N(1)\\ N(1) & N(0) & \ddots & \vdots\\ \vdots & \ddots & \ddots & N(p-1)\\ N(p-1) & \cdots & N(1) & N(0) \end{pmatrix}$$
is a $p \times p$ circulant matrix, with $N(k) = \mathrm{card}\,\{i : U_i = k\,[p]\}$ the number of parents at position $k$ on the circle.

Aim: recovering $x^\star$ amounts to solving a (potentially ill-posed) inverse problem in which the operator $A$ is random and depends on the $U_i$'s.
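Continuing the simulation sketch above, the circulant matrix $A$ can be built directly from the parent counts $N(k)$; this is a sketch of the construction, not code from the talk.

```python
# N[k] = number of parents at position k; column j of A is N shifted by j,
# so that A[k, j] = N[(k - j) mod p] and E[Y | U] = A @ x_star componentwise.
N = np.bincount(U, minlength=p)

A = np.empty((p, p))
for j in range(p):
    A[:, j] = np.roll(N, j)

intensity = A @ x_star          # conditional Poisson mean of Y given the U_i's
```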

RIP property I

The eigenvalues of $A$ are given by $\sigma_k = \sum_{i=1}^n e^{2i\pi k U_i/p}$, for $k$ in $\{0, \dots, p-1\}$. In particular,
$$\mathbb{E}(\sigma_k) = \begin{cases} n & \text{if } k = 0\,[p],\\ 0 & \text{if } k \neq 0\,[p]. \end{cases}$$

So, with high probability, many eigenvalues of $A$ may be close to zero. For these reasons, our problem is potentially ill-posed, which justifies the use of nonparametric procedures, such as the Lasso, even if $p$ is not large with respect to $n$.

We first focus on proving a Restricted Isometry Property (Candès and Tao, 2005): there exist positive constants $r$ and $R$ such that, with high probability, for any $K$-sparse vector $x$ (i.e. with at most $K$ non-zero coordinates),
$$r\,\|x\|_2 \leq \|Ax\|_2 \leq R\,\|x\|_2,$$
with $R$ as close to $r$ as possible.
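Since $A$ is circulant, these eigenvalues are the discrete Fourier transform of its first column, so the sketch above can check them in one line (the sign convention of the exponent only reorders them).

```python
# Eigenvalues of the circulant matrix A: sigma_k = sum_i exp(-2i*pi*k*U_i/p),
# i.e. the DFT of the count vector N.
sigma = np.fft.fft(N)
# sigma[0] = n exactly; the other sigma_k have mean 0, so many can be close
# to zero in modulus, which is the source of the (potential) ill-posedness.
print(abs(sigma[0]), np.abs(sigma[1:]).min())
```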

RIP property II

Under conditions, a RIP is satisfied by $\tilde A = A - \frac{n}{p}\mathbf{1}\mathbf{1}^H$:
$$\frac{n}{2}\,\|x\|_2^2 \leq \|\tilde A x\|_2^2 \leq \frac{3n}{2}\,\|x\|_2^2.$$

We also obtain a classical Restricted Eigenvalue (RE) type condition (see Bickel et al. 2009) on an event of probability larger than $1 - 5.54\,p\,e^{-\theta}$, on which, for all $d \in \{0, \dots, p-1\}$,
$$|U(d)| \leq \kappa\left(n\sqrt{\frac{\theta}{p}} + \theta^2\right) =: n\,\xi(\theta),$$
with
$$U(d) = \sum_{u=0}^{p-1} \sum_{\substack{i,j=1\\ j\neq i}}^{n} \left(\mathbf{1}_{U_i = u} - \frac{1}{p}\right)\left(\mathbf{1}_{U_j = u+d\,[p]} - \frac{1}{p}\right),$$
for $p$ and $n$ fixed integers larger than 1 and for all $\theta > 1$.
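A quick empirical look at this restricted-isometry behaviour, continuing the sketch above; the sparsity $K$ and the number of trials are arbitrary, and nothing here reproduces the proof.

```python
# Centred matrix A_tilde = A - (n/p) * 1 1^T and empirical range of
# ||A_tilde z||^2 / (n ||z||^2) over random K-sparse vectors z.
A_tilde = A - (n / p) * np.ones((p, p))

K, ratios = 3, []
for _ in range(1000):
    z = np.zeros(p)
    z[rng.choice(p, size=K, replace=False)] = rng.normal(size=K)
    ratios.append(np.sum((A_tilde @ z) ** 2) / (n * np.sum(z ** 2)))

# The talk's RIP asserts that this ratio stays in [1/2, 3/2] with high
# probability, under the regime on n, p and theta discussed later.
print(min(ratios), max(ratios))
```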

Lasso estimator using a weighted penalty I

Set $\tilde A = A - \frac{n}{p}\mathbf{1}\mathbf{1}^H$ and $\tilde Y_k = Y_k - \frac{n}{p}\bar Y$, with $\bar Y = \frac{1}{n}\sum_{k=0}^{p-1} Y_k$.

Then $\mathbb{E}(\bar Y) = \|x^\star\|_1$ and, conditionally on the $U_i$'s, $\tilde Y$ is an unbiased estimate of $\tilde A x^\star$: $\mathbb{E}(\tilde Y \mid U_1, \dots, U_n) = \tilde A x^\star$.

We first introduce the following Lasso estimate, for some $\gamma > 0$:
$$\hat x := \operatorname*{argmin}_{x \in \mathbb{R}^p}\left\{\|\tilde Y - \tilde A x\|_2^2 + \gamma\sum_{k=0}^{p-1} d_k |x_k|\right\},$$
which is based on random, data-dependent weights $d_k$.
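The talk does not specify an optimization algorithm; as a minimal sketch, the weighted Lasso above can be computed by proximal gradient descent (ISTA). The value of `gamma`, the weights and the iteration count below are illustrative choices.

```python
def weighted_lasso(A_tilde, Y_tilde, d, gamma=2.5, n_iter=3000):
    """Sketch: argmin_x ||Y_tilde - A_tilde x||_2^2 + gamma * sum_k d[k]*|x[k]|, via ISTA."""
    L = 2.0 * np.linalg.norm(A_tilde, 2) ** 2        # Lipschitz constant of the smooth part
    x = np.zeros(A_tilde.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A_tilde.T @ (A_tilde @ x - Y_tilde)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - gamma * d / L, 0.0)   # weighted soft-thresholding
    return x

Y_tilde = Y - (n / p) * (Y.sum() / n)                # Y_k - (n/p) * Ybar
# Example call with constant unit weights:
# x_hat = weighted_lasso(A_tilde, Y_tilde, np.ones(p))
```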

Lasso estimator using a weighted penalty II

The Lasso estimate satisfies
$$\begin{cases} \big(\tilde A^H(\tilde Y - \tilde A\hat x)\big)_k = \dfrac{\gamma d_k}{2}\,\operatorname{sign}(\hat x_k) & \text{for } k \text{ s.t. } \hat x_k \neq 0,\\[1ex] \big|\big(\tilde A^H(\tilde Y - \tilde A\hat x)\big)_k\big| \leq \dfrac{\gamma d_k}{2} & \text{for } k \text{ s.t. } \hat x_k = 0, \end{cases}$$
and in particular, for any $k$, $\big|\big(\tilde A^H(\tilde Y - \tilde A\hat x)\big)_k\big| \leq \frac{\gamma d_k}{2}$.

Our approach, in which the weights are random, is similar to Zou 2006, Bertin et al. 2011, Hansen et al. 2012, ...: in some sense, the weights play the same role as the thresholds in the estimation procedures proposed in Donoho and Johnstone 1994, Reynaud-Bouret and Rivoirard 2010, Sansonnet 2014, ...

Lasso estimator using a weighted penalty III

The double role of the weights:
- They have to control the random fluctuations of $\tilde A^H\tilde Y$ around its mean, conditionally on the $U_i$'s, due to the Poisson setting:
$$\big|\big(\tilde A^H(\tilde Y - \tilde A x^\star)\big)_k\big| \leq d_k.$$
Remark: if $\gamma \geq 2$, $x^\star$ will then also satisfy $\big|\big(\tilde A^H(\tilde Y - \tilde A x^\star)\big)_k\big| \leq \frac{\gamma d_k}{2}$. This is a key technical point to prove optimality of our approach.
- They need to be large enough, so that the procedure works even if $A$ is not invertible enough.

Proposition 1

How to choose the $d_k$'s? For any $\theta > 0$, there exists an event $\Omega_v(\theta)$ of probability larger than $1 - 2p\,e^{-\theta}$ on which, for all $k$ in $\{0, \dots, p-1\}$,
$$\big|\big(\tilde A^H(\tilde Y - \tilde A x^\star)\big)_k\big| \leq \sqrt{2 v_k \theta} + \frac{B\theta}{3},$$
where
$$B = \max_{u \in \{0, \dots, p-1\}} \left|N(u) - \frac{n}{p}\right| \quad\text{and}\quad v_k = \sum_{u=0}^{p-1} w(k-u)\,x^\star_u,$$
with $w(d) = \sum_{u=0}^{p-1}\left(N(u) - \frac{n}{p}\right)^2 N(u+d)$ for all $d$ in $\{0, \dots, p-1\}$.
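Continuing the sketch, these quantities are directly computable from the counts ($v_k$ involves the unknown $x^\star$ and is evaluated here only to illustrate the definitions).

```python
centered = N - n / p                                   # N(u) - n/p
B = np.abs(centered).max()

# w[d] = sum_u (N(u) - n/p)^2 * N(u + d)   (indices mod p)
w = np.array([(centered ** 2 * np.roll(N, -d_)).sum() for d_ in range(p)])

# v[k] = sum_u w(k - u) * x_star[u]  -- a circular convolution with x_star
v = np.array([np.dot(w[(k - np.arange(p)) % p], x_star) for k in range(p)])
```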

Derivation of constant weights

We have $v_k \leq W\,\|x^\star\|_1$, where
$$W := \max_d w(d) = \max_{u \in \{0, \dots, p-1\}} \sum_{l=0}^{p-1}\left(N(l+u) - \frac{n}{p}\right)^2 N(l)$$
is observable, and $\|x^\star\|_1$ is unbiasedly estimated by $\bar Y$.

Constant weights: the choice
$$d := \sqrt{2W\theta\left[\bar Y + \frac{5\theta}{6n} + \sqrt{\frac{\theta\bar Y}{2n}}\right]} + \frac{B\theta}{3}$$
satisfies $\big|\big(\tilde A^H(\tilde Y - \tilde A x^\star)\big)_k\big| \leq d_k = d$ on an event of probability larger than $1 - (2p+1)e^{-\theta}$.
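In the sketch, the constant weight can be evaluated from $W$, the empirical mean and $B$; only the two dominant terms of the display above are kept here for readability, and $\theta = \log p$ is the choice suggested later in the talk.

```python
theta = np.log(p)
Ybar = Y.sum() / n                                      # unbiased estimate of ||x_star||_1
W = w.max()                                             # max_d w(d), observable

# Dominant terms of the constant weight (lower-order corrections in Ybar omitted).
d_const = np.sqrt(2 * W * theta * Ybar) + B * theta / 3
x_hat_const = weighted_lasso(A_tilde, Y_tilde, d_const * np.ones(p))
```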

Derivation of non-constant weights

We re-estimate the $v_k$'s by
$$\hat v_k = \sum_{l=0}^{p-1}\left(N(l-k) - \frac{n}{p}\right)^2 Y_l.$$

Non-constant weights: the choice
$$d_k := \sqrt{2\theta\left[\hat v_k + \frac{5\theta B^2}{6} + \sqrt{\frac{\theta B^2 \hat v_k}{2}}\right]} + \frac{B\theta}{3}$$
satisfies $\big|\big(\tilde A^H(\tilde Y - \tilde A x^\star)\big)_k\big| \leq d_k$ on an event of probability larger than $1 - 3p\,e^{-\theta}$.
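The data-driven weights are again directly computable in the sketch; as before, only the dominant terms of the display are kept.

```python
# v_hat[k] = sum_l (N(l - k) - n/p)^2 * Y[l]
v_hat = np.array([(np.roll(centered, k) ** 2 * Y).sum() for k in range(p)])

# Dominant terms of the non-constant weights (lower-order corrections omitted).
d_k = np.sqrt(2 * theta * v_hat) + B * theta / 3
x_hat = weighted_lasso(A_tilde, Y_tilde, d_k)
```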

Control of the weights I

We consider the following special relationship between $n$, $p$ and $\theta$: for any integers $n$, $p$ and for any $\theta > 1$,
$$\kappa^\star\sqrt{p\theta} \leq n \leq (\kappa^\star)^{-1}\, p\,\theta^{-1},$$
where $\kappa^\star := \max(2\kappa, 1)$ and $\kappa$ is a small enough absolute constant. In particular, if we choose $\theta$ proportional to $\log p$ (which is natural to obtain results on events with large probability), then the regime becomes
$$\sqrt{p\log(p)} \ll n \ll p\,\log^{-1}(p).$$

Control of the weights II

Theorem. On an event of probability larger than $1 - (2 + 8.54\,p)e^{-\theta}$,
$$c\left(n\theta\,\|x^\star\|_{\ell_1} + \theta^2\right) \leq d^2 \leq c'\left(n\theta\,\|x^\star\|_{\ell_1} + \theta^4\right).$$
On an event of probability larger than $1 - 10.54\,p\,e^{-\theta}$, for any $k \in \{1, \dots, p\}$,
$$c\left(n\theta\,x^\star_k + n^2\theta\,p^{-1}\sum_{u\neq k}x^\star_u + \theta^2\right) \leq d_k^2 \leq c'\left(n\theta\,x^\star_k + n^2\theta^2\,p^{-1}\sum_{u\neq k}x^\star_u + \theta^4\right).$$

For instance, in the asymptotic regime $n\theta = o(p)$ and $\theta^2 = o(n^2\|x^\star\|_1/p)$: if $x^\star_k = 0$, then
$$d_k^2 = O\!\left(n^2\theta^2 p^{-1}\|x^\star\|_1\right) = o\!\left(n\theta\,\|x^\star\|_1\right).$$
Therefore $d_k^2 = o(d^2)$, i.e. $d_k$ can be much smaller than $d$.

Oracle inequalities for the Lasso estimate with $d$

Theorem. Let $\gamma > 2$ and $0 < \varepsilon < 1$. Let $s$ be a positive integer satisfying
$$\frac{3\gamma+2}{\gamma-2}\, s\,\xi(\theta) \leq 1 - \varepsilon.$$
Then there exists a constant $C_{\gamma,\varepsilon}$ depending on $\gamma$ and $\varepsilon$ such that, on an event of probability larger than $1 - (1 + 7.54\,p)e^{-\theta}$, the Lasso estimate $\hat x$ satisfies
$$\|\tilde A\hat x - \tilde A x^\star\|_2^2 \leq C_{\gamma,\varepsilon}\inf_{x:\ |\mathrm{supp}(x)| \leq s}\left\{\|\tilde A x - \tilde A x^\star\|_2^2 + \frac{s\,d(\theta)^2}{n}\right\}.$$
And if $x^\star$ is $s$-sparse, under the same assumptions, we get that
$$\|\tilde A\hat x - \tilde A x^\star\|_2^2 \leq C_{\gamma,\varepsilon}\,\frac{s\,d(\theta)^2}{n}.$$
Remark: we also obtain oracle inequalities for the $\ell_1$ and $\ell_\infty$ losses.

Oracle inequalities for the Lasso estimate with $d_k$

Theorem. Let $\gamma > 2$ and $0 < \varepsilon < 1$. Let $s$ be a positive integer, and assume that we are on an event such that
$$\left(1 + \frac{2(\gamma+2)}{\gamma-2}\,\frac{\max_{1\leq k\leq p} d_k(\theta)}{\min_{1\leq k\leq p} d_k(\theta)}\right) s\,\xi(\theta) \leq 1 - \varepsilon.$$
Then there exists a constant $C_{\gamma,\varepsilon}$ depending on $\gamma$ and $\varepsilon$ such that, on an event of probability larger than $1 - 8.54\,p\,e^{-\theta}$, the Lasso estimate $\hat x$ satisfies
$$\|\tilde A\hat x - \tilde A x^\star\|_2^2 \leq C_{\gamma,\varepsilon}\inf_{x:\ |\mathrm{supp}(x)| \leq s}\left\{\|\tilde A x - \tilde A x^\star\|_2^2 + \frac{1}{n}\sum_{k\in\mathrm{supp}(x)} d_k^2(\theta)\right\}.$$
And if $x^\star$ is $s$-sparse, under the same assumptions, we get that
$$\|\tilde A\hat x - \tilde A x^\star\|_2^2 \leq \frac{C_{\gamma,\varepsilon}}{n}\sum_{k\in\mathrm{supp}(x^\star)} d_k^2(\theta).$$

And what happens next?

We will consider a second alternative, which consists in assuming that $x^\star$ is supported by $S^\star$ with $|S^\star| = o(p)$. We then introduce the pseudo-estimate $\hat x(S^\star)$ defined by
$$\hat x(S^\star) \in \operatorname*{argmin}_{x\in\mathbb{R}^p:\ \mathrm{supp}(x)\subset S^\star}\left\{\|\tilde Y - \tilde A x\|_2^2 + \gamma\sum_{k=0}^{p-1} d_k|x_k|\right\}.$$
In this case, we shall see that under some conditions the support of $\hat x$ is included in $S^\star$ (support property), enabling us to derive oracle inequalities quite easily.

Simulations...

Thank you for your attention!