When does OMP achieve exact recovery with continuous dictionaries?

This paper presents new theoretical results on sparse recovery guarantees for a greedy algorithm, Orthogonal Matching Pursuit (OMP), in the context of continuous parametric dictionaries. Here, the continuous setting means that the dictionary is made up of an infinite uncountable number of atoms. In this work, we rely on the Hilbert structure of the observation space to express our recovery results as a property of the kernel defined by the inner product between two atoms. Using a continuous extension of Tropp's Exact Recovery Condition, we identify key assumptions that make it possible to analyze OMP in the continuous setting. Under these assumptions, OMP unambiguously identifies in exactly $k$ steps the atom parameters from any observed linear combination of $k$ atoms. These parameters play the role of the so-called support of a sparse representation in traditional sparse recovery. In our paper, any kernel and set of parameters that satisfy these conditions are said to be admissible. In the one-dimensional setting, we exhibit a family of kernels relying on completely monotone functions for which admissibility holds for any set of atom parameters. For higher dimensional parameter spaces, the analysis turns out to be more subtle. An additional assumption, so-called axis admissibility, is imposed to ensure a form of delayed recovery (in at most $k^D$ steps, where $D$ is the dimension of the parameter space). Furthermore, guarantees for recovery in exactly $k$ steps are derived under an additional algebraic condition involving a finite subset of atoms (built as an extension of the set of atoms to be recovered). We show that the latter technical conditions simplify in the case of Laplacian kernels, allowing us to derive simple conditions for $k$-step exact recovery, and to carry out a coherence-based analysis in terms of a minimum separation assumption between the atoms to be recovered.
Keywords: sparse representation, continuous dictionaries, Orthogonal Matching Pursuit, exact recovery

Preprint submitted to Elsevier, June 23, 2020. arXiv:1904.06311v3 [cs.IT] 22 Jun 2020

1. Introduction

Finding a sparse signal representation is a fundamental problem in signal processing. It consists in decomposing a signal $\boldsymbol{y}$ belonging to some vector space $\mathcal{H}$ as the linear combination of a few elements of some set $\mathcal{A} \subseteq \mathcal{H}$, that is

$$\boldsymbol{y} = \sum_{\ell=1}^{k} c_\ell \boldsymbol{a}_\ell \quad \text{where } c_\ell \in \mathbb{R}^*,\ \boldsymbol{a}_\ell \in \mathcal{A}. \tag{1.1}$$

Sparsity refers to the fact that the number of elements involved in the decomposition (1.1) should be much smaller than the ambient dimension, i.e., the dimension of $\mathcal{H}$. The set $\mathcal{A}$ is commonly referred to as a dictionary and its elements as atoms. In the sequel, we will assume that $\mathcal{A}$ is a parametric dictionary defined as

$$\mathcal{A} = \{\boldsymbol{a}(\theta) : \theta \in \Theta\} \tag{1.2}$$

where $\Theta = \mathbb{R}^D$ and $\boldsymbol{a} : \Theta \to \mathcal{H}$ is some continuous and injective function. In this setup, (1.1) implies that there exist $k$ parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$ such that $\boldsymbol{y}$ can be expressed as a linear combination of the atoms $\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k}$.

Over the past decade, sparse representations have proven to be of great interest in many application domains. As a consequence, numerous practical procedures, along with their theoretical analyses, have been proposed in the literature. Most contributions addressed the sparse-representation problem in the "discrete" setting, where the dictionary contains a finite number of elements, see [1]. Recently, several works tackled the problem of sparse representations in "continuous" dictionaries, where $\mathcal{A}$ is made up of an infinite uncountable number of atoms but $\boldsymbol{a} : \Theta \to \mathcal{H}$ enjoys some continuity property, see e.g., [2–4]. We review the contributions most related to the present work in Section 2. Before dwelling on the state of the art, we briefly describe the scope of our paper.

In this work, we focus on the continuous setting and assume that $\mathcal{H}$ is a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$.
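We derive exact recovery conditions for "Orthogonal Matching Pursuit" (OMP) [5], a natural adaptation to the continuous setting of a popular greedy procedure of the literature (see Algorithm 1). The main question addressed in this paper is as follows. Let $\{\theta_\ell^\star\}_{\ell=1}^{k}$ be $k$ pairwise distinct elements of $\Theta$ and assume that $\boldsymbol{y}$ obeys (1.1) with $\boldsymbol{a}_\ell = \boldsymbol{a}(\theta_\ell^\star)$ for some $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}^*$. Under which conditions does OMP achieve exact recovery (that is, correct unambiguous identification) of the parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$ and the coefficients $\{c_\ell\}_{\ell=1}^{k}$? In particular, is exact recovery possible in $k$ steps? This is of course only possible if the preimage of an atom $\boldsymbol{a}_\ell = \boldsymbol{a}(\theta_\ell^\star)$ is unique, hence the assumption that $\boldsymbol{a}(\cdot)$ is injective.

We note that, in the context of continuous dictionaries, the fact that OMP could correctly identify a set of $k$ atoms in exactly $k$ iterations may seem surprising in itself. Indeed, inspecting Algorithm 1, we see that this implies that OMP must identify one correct atom at each iteration $t$ of the algorithm, that is $\hat{\theta}_t \in \{\theta_\ell^\star\}_{\ell=1}^{k}$ $\forall t \in \llbracket 1, k \rrbracket$. The following simple example suggests that such a requirement may never be met for continuous dictionaries:

Algorithm 1: Orthogonal Matching Pursuit (OMP)
Input: observation $\boldsymbol{y} \in \mathcal{H}$, normalized dictionary $\mathcal{A} = \{\boldsymbol{a}(\theta) : \theta \in \Theta\}$.
1  $\boldsymbol{r} \leftarrow \boldsymbol{y}$  // residual vector
2  $\widehat{\mathcal{S}} \leftarrow \emptyset$  // estimated support
3  $t \leftarrow 0$
4  while $\boldsymbol{r} \neq \mathbf{0}_{\mathcal{H}}$ do
5      $t \leftarrow t + 1$
6      $\hat{\theta}_t \in \arg\max_{\theta \in \Theta} |\langle \boldsymbol{a}(\theta), \boldsymbol{r} \rangle|$  // atom selection
7      $\widehat{\mathcal{S}} \leftarrow \widehat{\mathcal{S}} \cup \{\hat{\theta}_t\}$  // support update
8      $(\hat{c}_1, \ldots, \hat{c}_t) \leftarrow \arg\min_{(c_1, \ldots, c_t) \in \mathbb{R}^t} \big\| \boldsymbol{y} - \sum_{\ell=1}^{t} c_\ell \boldsymbol{a}(\hat{\theta}_\ell) \big\|$  // least-squares update
9      $\boldsymbol{r} \leftarrow \boldsymbol{y} - \sum_{\ell=1}^{t} \hat{c}_\ell \boldsymbol{a}(\hat{\theta}_\ell)$  // residual vector
10 end
11 $\hat{k} = t$
Output: estimated support $\widehat{\mathcal{S}} = \{\hat{\theta}_\ell\}_{\ell=1}^{\hat{k}}$ and coefficients $\{\hat{c}_\ell\}_{\ell=1}^{\hat{k}}$.

Example 1 (The Gaussian deconvolution problem). Consider $\Theta = \mathbb{R}$ and let $\mathcal{H} = L^2(\mathbb{R})$ be the space of square integrable functions on $\mathbb{R}$. Assume $\boldsymbol{a}(\theta)$ is defined as

$$\boldsymbol{a} : \mathbb{R} \to L^2(\mathbb{R}), \qquad \theta \mapsto \pi^{-\frac{1}{4}} e^{-\frac{1}{2}(\cdot - \theta)^2}. \tag{1.3}$$

Suppose $\boldsymbol{y}$ results from the positive linear combination of $k = 2$ distinct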
atoms, that is $\boldsymbol{y} = c_1 \boldsymbol{a}(\theta_1^\star) + c_2 \boldsymbol{a}(\theta_2^\star)$, $\theta_1^\star \neq \theta_2^\star$, $c_1 > 0$, $c_2 > 0$. Then, even in this very simple case, OMP never selects an atom in $\{\theta_1^\star, \theta_2^\star\}$ at the first iteration. Indeed, particularizing step 6 of Algorithm 1 to the present setup, we have that, at the first iteration, OMP will select the parameter $\theta$ maximizing

$$|\langle \boldsymbol{a}(\theta), \boldsymbol{y} \rangle| = c_1 e^{-\frac{1}{4}(\theta - \theta_1^\star)^2} + c_2 e^{-\frac{1}{4}(\theta - \theta_2^\star)^2}. \tag{1.4}$$

Now, since the right-hand side of (1.4) is continuously differentiable, first-order optimality conditions tell us that any maximizer of $\theta \mapsto |\langle \boldsymbol{a}(\theta), \boldsymbol{y} \rangle|$ must satisfy

$$(\theta_1^\star - \theta)\, c_1 e^{-\frac{1}{4}(\theta - \theta_1^\star)^2} + (\theta_2^\star - \theta)\, c_2 e^{-\frac{1}{4}(\theta - \theta_2^\star)^2} = 0. \tag{1.5}$$

Since $\theta_1^\star \neq \theta_2^\star$, $c_1 \neq 0$, $c_2 \neq 0$, this equality cannot be verified by either $\theta = \theta_1^\star$ or $\theta = \theta_2^\star$. As a consequence, OMP necessarily selects some $\hat{\theta}_1 \notin \{\theta_1^\star, \theta_2^\star\}$.

Nevertheless, we show in this paper that exact recovery in $k$ steps is possible with OMP for some particular families of dictionaries $\mathcal{A}$. Our recovery conditions are expressed in terms of the kernel function $\kappa(\theta, \theta')$ associated to the inner product between two atoms, i.e.,

$$\kappa(\theta, \theta') \triangleq \langle \boldsymbol{a}(\theta), \boldsymbol{a}(\theta') \rangle. \tag{1.6}$$

We show that if the kernel $\kappa(\cdot, \cdot)$ and the atom parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$ verify some particular conditions (see Section 3.2), then exact recovery in $k$ steps is possible with OMP. We emphasize moreover that these conditions are satisfied for a family of kernels of the form

$$\kappa(\theta, \theta') = \varphi\big(\|\theta - \theta'\|_p^p\big), \qquad 0 < p \le 1, \tag{1.7}$$

where $\|\cdot\|_p$ is the $\ell_p$ quasi-norm (norm for $p = 1$) and $\varphi$ is a completely monotone function (see Definition 3). This family encompasses the well-known Laplace kernel [6]. Hereafter, we will refer to kernels taking the form (1.7) as "CMF kernels".

A first (perhaps surprising) outcome of our analysis is as follows. If $\Theta = \mathbb{R}$ and the dictionary is defined by a CMF kernel (1.7), OMP correctly identifies any pairwise distinct atom parameters $\{\theta_\ell^\star\}_{\ell=1}^{k} \subset \Theta$ and coefficients $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}^*$ in exactly $k$ iterations for any $k \in \mathbb{N}$ (see Theorem 3). We emphasize that no
separation (i.e., minimal distance between the parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$) is needed. To our knowledge, this is the first recovery result of this kind in continuous dictionaries when no sign constraint is imposed on the coefficients. It turns out that this "universal" exact recovery result is valid for very particular families of dictionaries: CMF kernels exhibit a discontinuity in their derivatives (e.g., the partial derivative of $\kappa(\theta, \theta')$ with respect to $\theta$ when $\theta = \theta'$) and the space $\mathcal{H}$ in which the corresponding dictionary lives is necessarily infinite-dimensional (see Section 3.3).

When $\Theta = \mathbb{R}^D$ with $D > 1$ and the dictionary is defined by a CMF kernel (1.7), we show that such an exact recovery result no longer holds (see Example 4). Nevertheless, for dictionaries based on CMF kernels, under an additional hypothesis (referred to as "axis admissibility", see Definition 7), we demonstrate that a form of delayed exact recovery (that is, in more than $k$ iterations) holds. The number of iterations sufficient to identify a set of $k$ parameters is then upper-bounded by $k^D$ (see Theorem 4). Moreover, under the above-mentioned hypothesis of axis admissibility, sufficient and necessary conditions for exact recovery of a given subset $\{\theta_\ell^\star\}_{\ell=1}^{k}$ (see Footnote 1) in $k$ steps (irrespective of the choice of the coefficients $\{c_\ell\}_{\ell=1}^{k}$) can be written in terms of a finite number of atoms of the dictionary (smaller than $k^D$) including $\{\theta_\ell^\star\}_{\ell=1}^{k}$ (see Theorem 4). We leverage this result to prove that exact recovery in $k$ steps is possible as soon as the elements of the subset $\{\theta_\ell^\star\}_{\ell=1}^{k}$ obey some "minimum separation" condition (see Theorem 5).

Footnote 1: Here and in the sequel, when referring to a subset $\{\theta_\ell^\star\}_{\ell=1}^{k}$, we implicitly assume that the elements $\theta_\ell^\star$ are pairwise distinct.

The rest of this paper is organized as follows. Section 2 draws connections with the sparse recovery literature. Section 3.1 elaborates on the main ingredients of the "continuous" setup and defines the notions of recovery that are used in the statements of our results. In Section 3.2, we exhibit a sufficient condition on atom parameters and kernel such that exact recovery of a given set of atom parameters holds. We then present the family of CMF dictionaries in Section 3.3 and show in Section 3.4 that different forms of recovery can be achieved in these dictionaries. Concluding remarks are given in Section 4. The technical details of our results are contained in the appendices of the paper. The proofs of our main recovery results are given in Appendices A and B. Appendix C contains some auxiliary technical details. Finally, Appendices D and E are dedicated to some mathematical developments related to two examples discussed in the paper.

Notations

The following notations will be used in this paper. The symbols $\mathbb{R}$, $\mathbb{R}^*$, $\mathbb{R}_+$, $\mathbb{R}_+^*$ refer to the set of real, non-zero, non-negative and positive numbers, respectively. Boldface lower and upper cases (e.g., $\mathbf{g}$, $\mathbf{G}$) are used to denote (finite-dimensional) vectors and matrices, respectively. The notation $[i]$ refers to the $i$-th element of a vector, and $[i, j]$ to the element at the $i$-th row and $j$-th column of a matrix. Italic boldface letters (e.g., $\boldsymbol{y}$ or $\boldsymbol{a}$) denote elements of a Hilbert space $\mathcal{H}$. All-one and all-zero column vectors in $\mathbb{R}^k$ are denoted $\mathbf{1}_k$ and $\mathbf{0}_k$, respectively. The $\ell$-th vector of the canonical basis in $\mathbb{R}^k$ will be denoted $\mathbf{e}_\ell$. The notations $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ refer to the inner product and its induced norm on $\mathcal{H}$, while $\|\cdot\|_p$ with $p > 0$ refers to the classical $\ell_p$ (pseudo-)norm on finite-dimensional real vectors. Calligraphic letters (e.g., $\mathcal{S}$, $\mathcal{G}$) are used to describe finite subsets of the parameter space $\Theta$, while $\llbracket m, n \rrbracket$ denotes the set of integers $i$ such that $m \le i \le n$. Given $\mathcal{S} \subseteq \Theta$, we let $\mathcal{S}^c \triangleq \Theta \setminus \mathcal{S}$ be the complementary set of $\mathcal{S}$ in $\Theta$. The cardinality of a set is denoted $\mathrm{card}(\cdot)$. Finally, if $\varphi : \mathbb{R} \to \mathbb{R}$ is a function, the notation $\varphi^{(n)}$ refers to its $n$-th derivative. The main notations used in this paper are summarized in Appendix F.

2.
Related works and state of the art

Over the last decade, sparse representations have sparked a surge of interest in the signal processing, statistics, and machine learning communities. A question of broad interest which has been addressed by many scientists is the identification of the "sparsest" representation of an input signal $\boldsymbol{y}$ (that is, the representation involving the smallest number of elements of $\mathcal{A}$). Since this problem has been shown to be NP-hard [7], many sub-optimal procedures have been proposed to approximate its solution. Among the most popular, one can mention methodologies based on convex relaxation and greedy algorithms. The term "sub-optimal" has to be understood in the following sense: these procedures are heuristics that only find the sparsest solution of the input vector $\boldsymbol{y}$ under some restricted conditions. They can fail when these conditions do not hold.

Greedy procedures have a long history in the signal processing and statistical literature, which can be traced back to (at least) the 1960s [8]. In the signal processing community, the most popular instances of greedy algorithms are known under the names of Matching Pursuit (MP) [9], Orthogonal Matching Pursuit (OMP) [5] (also known as Orthogonal Greedy Algorithm (OGA) [10, 11]) and Orthogonal Least Squares (OLS) [12]. Although these algorithms were already known under different names in other communities [13], they have been "rediscovered" many times, see e.g., [14–16]. Extensions to more general cost functions and kernel dictionaries are discussed in [17].

Sparse representations based on the resolution of convex optimization problems were initially proposed in geophysics [18] for seismic exploration. These methods have been popularized in the signal processing community by the seminal work by Chen et al. [19] and by Tibshirani in Statistics [20].
Well-known instances of convex-relaxation approaches for sparse representations are Basis Pursuit (BP) [19] and Lasso [20], also known as Basis Pursuit Denoising, which correspond to different convex optimization formulations. Many algorithmic solutions to efficiently address these problems have been proposed, see e.g., [21–23].

All the early contributions mentioned above have been made in the discrete setting, where the dictionary contains a finite number of atoms. Although Mallat and Zhang [9] already defined MP for continuous dictionaries, MP is mostly used in practice in the discrete setting. Greedy sparse approximation in the context of dictionaries made up of an infinite (possibly uncountable) number of atoms has only been studied more recently [16, 24, 25]. Practical ways to implement greedy procedures in continuous dictionaries can be found in [26–28].

On the side of convex relaxation approaches, it was shown that a continuous version of Lasso can be expressed as a convex optimization problem over the space of Radon measures [29], later referred to as the Beurling Lasso (BLasso) [30]. A continuous version of BP was also proposed [3] for specific continuous dictionaries by exploiting similar ingredients. Motivated by an increasing demand for efficient solvers, different strategies to find the solution of this problem (to some accuracy) were proposed over the past few years. When dealing with dictionaries made up of complex exponentials that depend on a one-dimensional parameter (that is, $D = 1$), the BLasso problem can be reformulated as a semidefinite program (SDP) [3, 31]. These methods have been further extended to the multidimensional case by considering SDP approximations of the problem [32]. The conditional gradient method (CGM) has also proven to be applicable to the BLasso problem [29] and has been further enhanced with non-convex local optimization extra steps [33–35].
Interestingly, the CGM has been shown to be equivalent to the so-called exchange method in [36, 37]. More recently, gradient-flow methods on spaces of measures have also been investigated to address the BLasso problem [38, 39].

Finally, we also mention the existence of a vast literature on non-convex and non-variational procedures leveraging the celebrated Prony's method [40]. Among others, one may cite its extension to the multivariate case [41], the MUSIC [42] and ESPRIT [43] frameworks, as well as finite rate of innovation methods [44].

Because (most of) the approaches mentioned above (both in the discrete and continuous settings) are heuristics looking for the sparsest representation of some $\boldsymbol{y}$, many theoretical works have been carried out to analyze their performance. Hereafter, we review the contributions of the literature most related to the present work. In particular, we focus on the contributions dealing with exact recovery of some subset $\{\theta_\ell^\star\}_{\ell=1}^{k}$ for any choice of the coefficients $\{c_\ell\}_{\ell=1}^{k}$ (sometimes assuming some specific sign patterns). In our discussion, we will use the short-hand notation $\mathcal{S}^\star = \{\theta_\ell^\star\}_{\ell=1}^{k}$ and refer to the latter as "support". Since we always implicitly assume that the parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$ are pairwise distinct, we have $\mathrm{card}(\mathcal{S}^\star) = k$. The presentation is organized in two parts, dealing respectively with the discrete and the continuous cases. In the discrete setting, we restrict our attention to contributions addressing the performance of MP, OMP and OLS, i.e., the greedy procedures most closely connected to the framework of this paper. In the continuous setting, recovery analysis, including stability and robustness to noise, has only been addressed for convex-relaxation approaches. We review these conditions below and draw some similarities and differences with the guarantees derived for OMP.

2.1.
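Discrete setting

The discrete setting refers to the case where the dictionary contains a finite number of elements, that is $\mathrm{card}(\mathcal{A}) < \infty$. Hereafter, we will restrict our discussion to parametric dictionaries of the form (1.2) since they are the main focus of this paper. In this context, the discrete setting refers to $\mathrm{card}(\Theta) < +\infty$.

Exact Recovery Condition. The first thorough analysis of OMP exact "$k$-step" recovery of some $\mathcal{S}^\star \triangleq \{\theta_\ell^\star\}_{\ell=1}^{k}$ is due to Tropp in [45]. Introducing the notations

$$\mathbf{G}[\ell, \ell'] \triangleq \kappa(\theta_\ell^\star, \theta_{\ell'}^\star), \qquad \mathbf{g}_\theta[\ell] \triangleq \kappa(\theta, \theta_\ell^\star), \tag{2.1}$$

Tropp's result can be rephrased as follows:

Theorem 1 (Tropp's ERC). Consider $\mathcal{S}^\star = \{\theta_\ell^\star\}_{\ell=1}^{k}$ and assume that the atoms $\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k}$ are linearly independent. If

$$\forall \theta \in \Theta \setminus \mathcal{S}^\star, \quad \|\mathbf{G}^{-1} \mathbf{g}_\theta\|_1 < 1, \tag{2.2-ERC}$$

then OMP with $\boldsymbol{y} = \sum_{\ell=1}^{k} c_\ell \boldsymbol{a}(\theta_\ell^\star)$ as input unambiguously identifies $\mathcal{S}^\star$ and $\{c_\ell\}_{\ell=1}^{k}$ in $k$ iterations for any choice of the coefficients $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}^*$. Conversely, if (2.2-ERC) is not satisfied, there exist not all-zero coefficients $\{c_\ell\}_{\ell=1}^{k}$ such that OMP with $\boldsymbol{y} = \sum_{\ell=1}^{k} c_\ell \boldsymbol{a}(\theta_\ell^\star)$ as input selects some $\hat{\theta} \notin \mathcal{S}^\star$ at the first iteration.

A proof of the direct part of this result can be found in [45, Th. 3.1]. The converse part is a slight variation of Tropp's original statement [45, Th. 3.10] and a proof can be found in e.g., [1, Prop. 3.15].

Condition (2.2-ERC) is usually referred to as the "Exact Recovery Condition" in the literature, and simply denoted ERC. Assuming linear independence of the atoms $\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k}$, it can be reformulated in the following (and perhaps more interpretable) way:

$$\forall \boldsymbol{r} \in \mathcal{R}_{\mathcal{S}^\star} \setminus \{\mathbf{0}_{\mathcal{H}}\},\ \forall \theta \in \Theta \setminus \mathcal{S}^\star, \quad |\langle \boldsymbol{a}(\theta), \boldsymbol{r} \rangle| < \max_{\theta' \in \mathcal{S}^\star} |\langle \boldsymbol{a}(\theta'), \boldsymbol{r} \rangle| \tag{2.3}$$

where $\mathcal{R}_{\mathcal{S}^\star} \triangleq \mathrm{span}(\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k})$. In other words, it implies that OMP always selects a parameter in $\mathcal{S}^\star$ during the first $k$ iterations for any input vector $\boldsymbol{y}$ resulting from the linear combination of the atoms $\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k}$.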
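The quantity $\|\mathbf{G}^{-1}\mathbf{g}_\theta\|_1$ appearing in the ERC depends on the dictionary only through the kernel, so it can be evaluated numerically for a given support. The following is a minimal sketch under stated assumptions, not the authors' code: the function names (`laplace_kernel`, `gaussian_kernel`, `erc_value`), the unit-scale Laplace kernel $e^{-|\theta - \theta'|}$, the support $\{-0.5, 0.5\}$ and the finite set of candidate parameters are all illustrative choices. The Gaussian kernel is the one induced by the atoms of Example 1.

```python
import numpy as np

def laplace_kernel(t, s):
    # Assumed kernel exp(-|t - s|): inner product of two unit-norm atoms.
    return np.exp(-np.abs(t - s))

def gaussian_kernel(t, s):
    # Kernel induced by the Gaussian atoms of Example 1: exp(-(t - s)^2 / 4).
    return np.exp(-((t - s) ** 2) / 4.0)

def erc_value(kernel, support, theta):
    """Return ||G^{-1} g_theta||_1, with G and g_theta built as in (2.1)."""
    support = np.asarray(support, dtype=float)
    gram = kernel(support[:, None], support[None, :])  # G[l, l'] = kappa(theta*_l, theta*_l')
    g_theta = kernel(theta, support)                    # g_theta[l] = kappa(theta, theta*_l)
    return float(np.sum(np.abs(np.linalg.solve(gram, g_theta))))

# Hypothetical support and a finite set of off-support candidates.
support = [-0.5, 0.5]
candidates = [t for t in np.linspace(-3.0, 3.0, 121)
              if min(abs(t - s) for s in support) > 1e-9]

for name, kernel in (("laplace", laplace_kernel), ("gaussian", gaussian_kernel)):
    worst = max(erc_value(kernel, support, t) for t in candidates)
    print(f"{name}: largest value of ||G^-1 g_theta||_1 off the support = {worst:.4f}")
```

On this example, the values stay below 1 for the Laplace kernel at every sampled candidate, whereas they exceed 1 for the Gaussian kernel, which is consistent with the failure described in Example 1. Checking finitely many candidates is of course only an indication, not a proof that the condition holds on all of $\Theta \setminus \mathcal{S}^\star$.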
The converse part shows that (2.2-ERC) is worst-case necessary in the following sense: if (2.2-ERC) is not satisfied, there exists (at least) one non-trivial linear combination of the atoms $\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k}$ such that OMP selects an element $\hat{\theta} \notin \mathcal{S}^\star$ at the first iteration; in this case the correct identification of $\mathcal{S}^\star$ in $k$ iterations is obviously not possible.

Interestingly, condition (2.2-ERC) is also related (along with the linear independence of the atoms $\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k}$) to the success of MP, OLS and some convex relaxation procedures. In [46, Th. 2], the authors showed that (2.2-ERC) is also necessary and sufficient for exact $k$-step recovery of $\mathcal{S}^\star$ by OLS. Regarding MP, (2.2-ERC) ensures that the procedure only selects atoms in $\mathcal{S}^\star$ but does not imply exact recovery after $k$ iterations of the algorithm since the same atom can be selected many times (the least-squares update of the coefficients in Algorithm 1 is not carried out), see e.g., [47, Th. 1]. Finally, in [48, Th. 3] and [49, Th. 8], the authors show that (2.2-ERC) also ensures correct identification of $\mathcal{S}^\star$ by some convex relaxation procedures such as BP or Lasso.

Coherence. Tropp's condition is of limited practical interest to characterize the recovery of all supports of size $k$ since it requires verifying that (2.2-ERC) holds for any $\mathcal{S}^\star$ with $\mathrm{card}(\mathcal{S}^\star) = k$. In order to circumvent this issue, other sufficient conditions of success, weaker but easier to evaluate in practice, have been proposed in the literature. One of the most popular conditions is based on the coherence $\mu$ of a normalized dictionary. Assuming the atoms of the dictionary are of unit norm, this condition writes (with the convention that $\frac{1}{0} = +\infty$):

$$k < \frac{1}{2}\left(1 + \frac{1}{\mu}\right) \tag{2.4}$$

where

$$\mu \triangleq \sup_{\substack{\theta, \theta' \in \Theta \\ \theta \neq \theta'}} |\kappa(\theta, \theta')|. \tag{2.5}$$

Condition (2.4), together with the normalization of the dictionary, implies that (2.2-ERC) is verified for any $\mathcal{S}^\star$ with $\mathrm{card}(\mathcal{S}^\star) \le k$, and also implies the linear independence of any group of $k$ atoms of the dictionary. It therefore implies that
OMP and OLS correctly identify any $\mathcal{S}^\star$ with $\mathrm{card}(\mathcal{S}^\star) \le k$ in exactly $\mathrm{card}(\mathcal{S}^\star)$ iterations. It also ensures the correct identification of any $\mathcal{S}^\star$ with $\mathrm{card}(\mathcal{S}^\star) \le k$ by BP and Lasso. In [50], the authors emphasize that condition (2.4) can be slightly relaxed if the coefficients $\{c_\ell\}_{\ell=1}^{k}$ exhibit some decay.

The coherence of the dictionary can be seen as a particular measure of "proximity" between the atoms of the dictionary. Other exact recovery conditions, based on different proximity measures, have been proposed in the literature. In [45, Th. 3.5], the author derived recovery conditions based on "cumulative coherence", whereas in [11, 51–57], guarantees based on "restricted isometry constants" were proposed. Given that Tropp's condition is both necessary and sufficient, all such recovery conditions imply that the ERC holds for any support of size $k$.

2.2. Continuous setting

General setup. Sparse representations in continuous dictionaries are basically characterized by two main ingredients:
i) a parameter set $\Theta$, usually assumed to be a connected subset of $\mathbb{R}^D$ with non-empty interior or a torus in dimension $D$. We note that, in this paper, we restrict our attention to the case where $\Theta = \mathbb{R}^D$ for $D \ge 1$.
ii) an "atom" function $\boldsymbol{a} : \Theta \to \mathcal{H}$, assumed to be continuous and injective.
This type of dictionary appears in numerous signal processing tasks such as sparse spike deconvolution or super-resolution where one aims to recover fine-scale details from an under-resolved input signal [3, 18, 35, 59].

Irrelevance of existing analyses. The continuity of $\boldsymbol{a}(\cdot)$ does not allow most of the analyses performed in the context of discrete dictionaries to be extended to the continuous framework. In particular, all exact recovery conditions based on coherence or restricted isometry constants turn out to be violated whenever dealing with continuous dictionaries. As for the coherence condition (2.4), it is easy to see that the continuity of $\boldsymbol{a}(\cdot)$ implies the continuity of $\kappa(\cdot, \cdot)$ with respect to both its arguments.
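The effect of continuity on the coherence (2.5) can be observed numerically by sampling a continuous dictionary on finer and finer grids. The sketch below is only an illustration under assumptions that are not part of the paper: a one-dimensional parameter, a unit-scale Laplace kernel $e^{-|\theta - \theta'|}$, a regular grid on $[-5, 5]$, and hypothetical function names.

```python
import numpy as np

def laplace_kernel(theta, theta_prime):
    # Assumed kernel exp(-|theta - theta'|): inner product of two unit-norm atoms.
    return np.exp(-np.abs(theta - theta_prime))

def grid_coherence(spacing, half_width=5.0):
    """Coherence (2.5) of the finite dictionary obtained by sampling theta on a regular grid."""
    n = int(round(half_width / spacing))
    grid = spacing * np.arange(-n, n + 1)
    gram = laplace_kernel(grid[:, None], grid[None, :])
    np.fill_diagonal(gram, 0.0)  # the supremum in (2.5) excludes theta == theta'
    return float(np.max(np.abs(gram)))

for spacing in (1.0, 0.1, 0.01):
    mu = grid_coherence(spacing)
    bound = 0.5 * (1.0 + 1.0 / mu)  # right-hand side of condition (2.4)
    print(f"spacing={spacing}: coherence={mu:.4f}, (2.4) requires k < {bound:.4f}")
```

As the grid is refined, neighbouring atoms become more correlated, the coherence of the sampled dictionary approaches 1, and the right-hand side of (2.4) approaches 1 from above, so the condition leaves less and less room even for $k = 1$.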
The continuity of $\kappa$, in turn, implies that $\mu = 1$ (for normalized atoms) and the coherence-based condition (2.4) is never met, even for $k = 1$! In order to circumvent this issue, some specific exact recovery conditions for continuous dictionaries have been proposed in the literature, see e.g., [3, 4, 30, 60]. We review below the main ingredients grounding these conditions of recovery. In the context of convex-relaxation approaches, these conditions originate from the analysis of the associated optimality conditions.

A separation condition for BP for continuous dictionaries. In the context of BP for continuous dictionaries, the question of exact recovery can be rephrased as follows: if the atoms $\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k}$ are linearly independent and $\boldsymbol{y} = \sum_{\ell=1}^{k} c_\ell \boldsymbol{a}(\theta_\ell^\star)$, is the solution of BP for continuous dictionaries unique and equal to a discrete measure supported on $\{\theta_\ell^\star\}_{\ell=1}^{k}$ with weights $\{c_\ell\}_{\ell=1}^{k}$?

Footnote 2 (on the coherence $\mu$ defined in (2.5)): $\mu = 0$ if all the atoms are pairwise orthogonal and $\mu \simeq 1$ if some atoms are very correlated.

Footnote 3 (on the two assumptions made on $\Theta$ in the general setup): These two assumptions imply that $\Theta$ is uncountable [58, Ch. 1, Exercise 19d].

The case where each atom is a collection of Fourier coefficients ($\mathcal{H}$ is then a finite-dimensional complex space) has received a lot of attention due to its connection with the super-resolution problem. Indeed, the latter scenario is equivalent to recovering infinitely resolved details (the parameters) from some low-pass observation. Without further assumptions on the coefficients, the targeted measure is the unique solution of BP for continuous dictionaries provided that [3, Th. 1.2 and 1.3]

$$\min_{\substack{\ell, \ell' \in \llbracket 1, k \rrbracket \\ \ell \neq \ell'}} |\theta_{\ell'}^\star - \theta_\ell^\star| > \frac{C}{f_c}, \tag{2.6}$$

where $|\cdot|$ is the $\ell_\infty$ distance on the $D$-dimensional torus (maximum deviation in any coordinate), $C$ is a constant that depends on the parameter dimension $D$ and $f_c$ is the cut-off frequency of the observation low-pass filter. Bounds on the value of $C$ have been proposed by the same authors, and further refined in [4, Cor. 1] and [61, Th. 2.2].

We see that (2.6) implies the recovery of $\{\theta_\ell^\star\}_{\ell=1}^{k}$ and $\{c_\ell\}_{\ell=1}^{k}$ provided
that the elements of $\{\theta_\ell^\star\}_{\ell=1}^{k}$ verify some "minimum separation" condition. Interestingly, as shown in [30, Th. 2.1], this separation condition is no longer needed when dealing with positive linear combinations of atoms (that is, when all the coefficients $\{c_\ell\}_{\ell=1}^{k}$ are positive). The authors showed moreover that this separation-free result for positive linear combinations holds for any dictionary such that the atom function $\boldsymbol{a}$ forms a "Chebyshev system" [62, Ch. 2] and provided that $2k + 1$ observations are available (i.e., $\dim(\mathcal{H}) \ge 2k + 1$).

Dual certificates for the BLasso problem. In [4], the authors derived several dual certificates for the BLasso problem generalizing the work done by Fuchs for the Lasso [48] to an infinite-dimensional setup. They first show that the existence of a "vanishing derivative pre-certificate" [4, Def. 6] is necessary so that the support of the solution to the BLasso problem is exactly $\{\theta_\ell^\star\}_{\ell=1}^{k}$ [4, Prop. 8]. On the other hand, they show that a so-called "non degenerate source condition" [4, Def. 5] is sufficient to ensure the desired recovery [4, Th. 2]. We note that these two conditions apply to dictionaries made up of differentiable atom functions $\boldsymbol{a} : \Theta \to \mathcal{H}$ since they involve the first and second order derivatives of the inner product $\langle \boldsymbol{a}(\theta), \boldsymbol{y} \rangle$ evaluated at $\{\theta_\ell^\star\}_{\ell=1}^{k}$. This is in contrast with the "CMF" dictionaries considered in this paper which involve some non-differentiability in their kernel function, see Section 3.3. Moreover, we also emphasize that both conditions involve the sign of the coefficients $\{c_\ell\}_{\ell=1}^{k}$.

The comparison with the discrete case goes even deeper: it can be shown that the solution of BP for continuous dictionaries is, in some sense, the limit of the solution of BLasso [4, Prop. 1]. Although out of the scope of the present paper, we also mention the existence of a literature related to the robustness of BLasso in various noisy settings [4, 63, 64].

3. Main results

In this section, we present the main results of the paper.
In Section 3.1, we describe the constitutive ingredients of the "continuous" setup addressed in this work and provide a rigorous definition of the notions of exact recovery that will be used in our statements. Our main results are presented in Sections 3.2 and 3.4. The family of "CMF dictionaries", central to our results in Section 3.4, is introduced in Section 3.3.

3.1. Main ingredients

We first present the three main properties that a "continuous" dictionary should verify, see (3.4a), (3.4b) and (3.6) below. We then elaborate on some differences between the implementation of OMP in the discrete and continuous settings. We finally give a precise definition of the notions of recovery that will be used in the statements of our results.

Continuous dictionary. First, the space $\Theta$ is usually assumed to be a connected metric space or a torus in dimension $D$. Hereafter, for the sake of conciseness, we will restrict our attention to the case where $\Theta = \mathbb{R}^D$ (see Footnote 4) and assume that the kernel associated to the atoms obeys some vanishing property (see (3.6) below). A second common working hypothesis in the "continuous" setup is the continuity of the function $\boldsymbol{a} : \Theta \to \mathcal{H}$, that is

$$\lim_{\theta' \to \theta} \|\boldsymbol{a}(\theta') - \boldsymbol{a}(\theta)\| = 0 \quad \forall \theta \in \Theta. \tag{3.1}$$

In this paper, we will moreover suppose that the atoms of the dictionary are normalized:

$$\|\boldsymbol{a}(\theta)\| = 1 \quad \forall \theta \in \Theta. \tag{3.2}$$

In the sequel, recovery conditions will be expressed as a function of the induced kernel $\kappa(\cdot, \cdot)$:

$$\kappa(\theta, \theta') \triangleq \langle \boldsymbol{a}(\theta), \boldsymbol{a}(\theta') \rangle \quad \forall \theta, \theta' \in \Theta. \tag{3.3}$$

The "continuity" and "unit-norm" properties are equivalent to:

$$\text{"unit norm"}: \quad \kappa(\theta, \theta) = 1 \quad \forall \theta \in \Theta, \tag{3.4a}$$
$$\text{"continuity"}: \quad \lim_{\theta' \to \theta} \kappa(\theta, \theta') = 1 \quad \forall \theta \in \Theta. \tag{3.4b}$$

Moreover, we have from the Cauchy-Schwarz inequality that

$$|\kappa(\theta, \theta')| \le 1 \quad \forall \theta, \theta' \in \Theta. \tag{3.5}$$

Footnote 4: Our results can be adapted to any set $\Theta$ such that step 6 of Algorithm 1 is well-posed, that is, at least one maximizer exists.

Lastly, in this work, we will restrict our attention to kernels that vanish at infinity, i.e.,

$$\forall \varepsilon > 0,\ \forall \theta \in \Theta,\ \exists K \subseteq \Theta \text{ compact}: \quad \sup_{\theta' \in K^c} |\kappa(\theta, \theta')| < \varepsilon, \tag{3.6}$$

where $K^c$ is the complement of $K$ in $\Theta$.
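Since a residual that is a finite linear combination of atoms interacts with the dictionary only through the kernel (3.3), the recursions of Algorithm 1 can be simulated without ever forming the atoms. The sketch below is a simplified illustration and not the authors' implementation: the atom selection step is restricted to a finite grid that is assumed to contain the true parameters, the function names, the grid, the parameters $\{-0.5, 0.5\}$ and the coefficients $(1, 0.8)$ are hypothetical, and the Laplace kernel $e^{-|\theta - \theta'|}$ is used as one instance of the form (1.7).

```python
import numpy as np

def laplace_kernel(t, s):
    # Laplace kernel exp(-|t - s|), an instance of (1.7) with p = 1 (assumed unit scale).
    return np.exp(-np.abs(t - s))

def gaussian_kernel(t, s):
    # Kernel induced by the Gaussian atoms of Example 1: exp(-(t - s)^2 / 4).
    return np.exp(-((t - s) ** 2) / 4.0)

def kernel_omp(kernel, true_params, coeffs, grid, max_iter=20, tol=1e-10):
    """Grid-restricted OMP for y = sum_l coeffs[l] * a(true_params[l]), using only the kernel."""
    true_params = np.asarray(true_params, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # <a(theta), y> for every grid point theta, obtained by linearity from the kernel.
    corr_y = kernel(grid[:, None], true_params[None, :]) @ coeffs
    corr_r = corr_y.copy()            # <a(theta), r>, with r = y initially
    selected, est = [], np.zeros(0)
    while len(selected) < max_iter:
        j = int(np.argmax(np.abs(corr_r)))   # atom selection (line 6), on the grid only
        if abs(corr_r[j]) <= tol:            # residual is numerically zero
            break
        selected.append(j)                   # support update (line 7)
        chosen = grid[selected]
        gram = kernel(chosen[:, None], chosen[None, :])
        est = np.linalg.solve(gram, corr_y[selected])   # least-squares update (line 8)
        corr_r = corr_y - kernel(grid[:, None], chosen[None, :]) @ est  # line 9
    return grid[np.array(selected, dtype=int)], est

grid = 0.25 * np.arange(-12, 13)             # contains -0.5, 0 and 0.5 exactly
true_params, coeffs = [-0.5, 0.5], [1.0, 0.8]
laplace_support, laplace_coeffs = kernel_omp(laplace_kernel, true_params, coeffs, grid)
gaussian_first, _ = kernel_omp(gaussian_kernel, true_params, coeffs, grid, max_iter=1)
print("Laplace kernel, selected parameters:", laplace_support, "coefficients:", laplace_coeffs)
print("Gaussian kernel, first selected parameter:", gaussian_first)
```

With these settings the Laplace run stops after two iterations on the two true parameters, while the first parameter selected with the Gaussian kernel lies strictly between them, in line with Example 1. The grid restriction sidesteps the question of whether the maximization in the atom selection step can be solved over all of $\Theta$, so the sketch says nothing about the computational cost of the idealized algorithm.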
The vanishing property (3.6) covers the case where $\Theta$ is compact, by simply considering $K = \Theta$ with the convention that the supremum over the empty set equals 0.

OMP in continuous dictionaries. Although Algorithm 1 corresponds to the standard definition of OMP in the discrete setting, its implementation in continuous dictionaries leads to two major differences. First, the "atom selection" step in Line 6 does not necessarily admit a maximizer. In such a case, the recursions defined in Algorithm 1 are ill-posed since the procedure cannot solve the maximization problem in Line 6. Second, even if a maximizer exists, solving the "atom selection" problem may be computationally intractable. In particular, the function to be maximized in Line 6 may have many local maxima and the problem is indeed NP-hard in certain cases (e.g., when the maximization step involves a rank-1 approximation of a tensor [65, Th. 1.13] as in [66]), while in other cases it is easy (for example, the SVD can be revisited in this framework and solved up to numerical precision [67, Sec. 2.2]). The maximization problem of Line 6 also appears in Frank-Wolfe type algorithms [35], where existing theoretical guarantees also hold under the hypothesis that this step can be solved.

In this paper, for the sake of simplifying our theoretical analysis, we will nevertheless stick to the idealized version of OMP described in Algorithm 1. The results presented in this work should therefore be considered more for the theoretical insights they provide into the behavior of OMP in continuous dictionaries than for their practical implications.

In our theoretical analysis we will only have to deal with residuals $\boldsymbol{r}$ which can be written as a linear combination of a finite number of atoms of $\mathcal{A}$. In such a case, the following lemma shows that a maximizer to the "atom selection" problem always exists:

Lemma 1. Let $\mathcal{A} = \{\boldsymbol{a}(\theta) : \theta \in \Theta\}$ be a continuous dictionary with kernel $\kappa(\theta, \theta') \triangleq \langle \boldsymbol{a}(\theta), \boldsymbol{a}(\theta') \rangle$ verifying the continuity property (3.4b) and vanishing property (3.6).
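Then

$$\arg\max_{\theta \in \Theta} |\langle \boldsymbol{a}(\theta), \boldsymbol{r} \rangle| \neq \emptyset \tag{3.7}$$

whenever $\boldsymbol{r} \in \mathcal{H} \setminus \{\mathbf{0}_{\mathcal{H}}\}$ is a finite linear combination of elements of $\mathcal{A}$.

A proof of this statement is available in Appendix C.1. We finally emphasize that the solution to the OMP recursions may not be unique. Indeed, in situations where the "atom selection" problem in Line 6 admits several solutions, there may exist several output sets, $\widehat{\mathcal{S}}$ and $\{\hat{c}_\ell\}_{\ell=1}^{\hat{k}}$, verifying the OMP recursions. Hereafter, given an observation vector $\boldsymbol{y}$, we will call any set $\widehat{\mathcal{S}}$ which can be generated by OMP with $\boldsymbol{y}$ as input a "reachable support".

Notions of recovery. The recovery results stated in the next sections of the paper will involve the following notions of success: "exact $k$-step recovery of $\mathcal{S}^\star$" and "exact $\mathcal{S}$-delayed recovery of $\mathcal{S}^\star$". We devote the remainder of this section to rigorously defining these two notions.

We say that OMP achieves exact recovery of coefficients $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}^*$ and atom parameters $\mathcal{S}^\star \triangleq \{\theta_\ell^\star\}_{\ell=1}^{k}$ if $\{c_\ell\}_{\ell=1}^{k}$ and $\mathcal{S}^\star$ can be unambiguously identified from any reachable outputs of OMP ($\widehat{\mathcal{S}}$ and $\{\hat{c}_\ell\}_{\ell=1}^{\hat{k}}$) run with $\boldsymbol{y} = \sum_{\ell=1}^{k} c_\ell \boldsymbol{a}(\theta_\ell^\star)$ as input. We note that a simple necessary and sufficient condition for exact recovery of $\{c_\ell\}_{\ell=1}^{k}$ and $\mathcal{S}^\star$ reads

$$\mathcal{S}^\star \subseteq \widehat{\mathcal{S}}, \tag{3.8}$$

for each reachable support $\widehat{\mathcal{S}}$. This can be seen from the following arguments. If there is a reachable support such that $\mathcal{S}^\star \not\subseteq \widehat{\mathcal{S}}$, then exact recovery is obviously not attained since there exists some $\theta_\ell^\star \in \mathcal{S}^\star$ that is not identified in $\widehat{\mathcal{S}}$. Conversely, if $\mathcal{S}^\star \subseteq \widehat{\mathcal{S}}$ holds, one must have

$$\forall \ell \in \llbracket 1, \hat{k} \rrbracket: \quad \hat{c}_\ell = \begin{cases} c_{\ell'} & \text{if } \hat{\theta}_\ell = \theta_{\ell'}^\star \in \mathcal{S}^\star, \\ 0 & \text{otherwise,} \end{cases} \tag{3.9}$$

because the atoms $\{\boldsymbol{a}(\theta) : \theta \in \widehat{\mathcal{S}}\}$ selected by OMP are always linearly independent and $\boldsymbol{y} \in \mathrm{span}(\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k})$. Therefore, $\mathcal{S}^\star = \{\theta_\ell^\star\}_{\ell=1}^{k}$ can be unambiguously identified from the non-zero elements of $\{\hat{c}_\ell\}_{\ell=1}^{\hat{k}}$.

In the literature related to the conditions of success of OMP, a distinction is usually made between the cases "$\widehat{\mathcal{S}} = \mathcal{S}^\star$" and "$\mathcal{S}^\star \subseteq \widehat{\mathcal{S}}$": the former is referred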
to as "$k$-step recovery" because it implies that OMP identifies $S^\star$ and $\{c_\ell\}_{\ell=1}^k$ in exactly $k$ steps; the latter is known as "delayed recovery" because OMP may require (if the inclusion is strict) more than $k$ iterations to identify $S^\star$ and $\{c_\ell\}_{\ell=1}^k$.

In this paper we will focus on conditions ensuring the correct identification of a given support $S^\star$ of cardinality $k$ for any choice of the non-zero weighting coefficients $\{c_\ell\}_{\ell=1}^k$. The notions of "exact $k$-step recovery of $S^\star$" and "exact $\mathcal{S}$-delayed recovery of $S^\star$" announced at the beginning of this section then read as follows. We say that OMP achieves "exact $k$-step recovery of $S^\star$" if $\hat{S} = S^\star$ for any choice of nonzero $\{c_\ell\}_{\ell=1}^k$ and any reachable output $\hat{S}$. This implies that there is only one reachable output. Moreover, given some set $\mathcal{S} \subseteq \Theta$, we say that OMP achieves "exact $\mathcal{S}$-delayed recovery of $S^\star$" if
\[ S^\star \subseteq \hat{S} \subseteq \mathcal{S} \tag{3.10} \]
for any choice of nonzero $\{c_\ell\}_{\ell=1}^k$ and any reachable output $\hat{S}$. "$\mathcal{S}$-delayed recovery" can be regarded as a refined version of "delayed recovery" where the set of parameters that OMP may select is guaranteed to belong to some set $\mathcal{S} \supseteq S^\star$. This is trivially the case with $\mathcal{S} = \Theta$, so what will be important in our results is to establish conditions under which we can identify a finite set $\mathcal{S}$, determined by $S^\star$ alone, such that $\mathcal{S}$-delayed recovery of $S^\star$ holds. We note that $\mathcal{S}$-delayed recovery implies that OMP identifies $S^\star$ in at most $\mathrm{card}(\mathcal{S})$ iterations. Finally, we emphasize that "exact $S^\star$-delayed recovery of $S^\star$" is equivalent to "exact $k$-step recovery of $S^\star$". We will sometimes use the former in the formulation of our results to have more compact statements.

We will also always implicitly assume that OMP achieves $k$-step recovery of $S^\star$ when $S^\star = \emptyset$ since this implies that $y = 0$ and OMP returns the empty support $\hat{S} = \emptyset$ at iteration 0 in this case.

3.2.
Exact recovery of a given support: sufficient conditions

In this section, we highlight some instrumental properties of the dictionary $\mathcal{A}$ and support $S^\star$ which allow OMP to achieve exact $\mathrm{card}(S)$-step recovery of each $S \subseteq S^\star$ (see Theorem 2). These conditions are the basis of our results on "CMF dictionaries" stated in Section 3.4.

We first notice that, in the context of continuous dictionaries, the $k$-step analysis of Theorem 1 still applies: condition (2.2-ERC) along with the linear independence of the atoms $\{a(\theta^\star_\ell)\}_{\ell=1}^k$ are still necessary and sufficient for exact recovery of a support $S^\star$ (see Footnote 6). However, the standard formulation
\[ \max_{\theta \in \Theta \setminus S^\star} \| G^{-1} g_\theta \|_1 < 1, \tag{3.11} \]
equivalent to (2.2-ERC) in the discrete setting, no longer holds in the case of continuous dictionaries as the supremum
\[ \sup_{\theta \in \Theta \setminus S^\star} \| G^{-1} g_\theta \|_1 \tag{3.12} \]
is always at least 1 (see Footnote 7).

Footnote 6. We note in particular that (2.2-ERC) ensures that the "atom selection" step in Line 6 of Algorithm 1 is well-defined since the maximizers are ensured to belong to the finite set $S^\star$.

Footnote 7. Indeed, first notice that $\| G^{-1} g_\theta \|_1 = 1$ for all $\theta \in S^\star$. One then obtains that the supremum is at least 1 by continuity of $\theta \mapsto \| G^{-1} g_\theta \|_1$.

In order to circumvent this problem, we identify below two simpler conditions, respectively on the dictionary $\mathcal{A}$ (via its induced kernel $\kappa$) and the support $S^\star$, which imply that the atoms $\{a(\theta^\star_\ell)\}_{\ell=1}^k$ are linearly independent and that (2.2-ERC) is verified, see Theorem 2 below.

The following definition includes assumptions on the kernel ensuring that the dictionary atoms are normalized, and that the atom function $\theta \mapsto a(\theta)$ is injective and continuous.

Definition 1 (Admissible kernel). A kernel $\kappa$ is said to be admissible if:
i) it verifies (3.4) and (3.6).
ii) $0 \le \kappa(\theta, \theta') < 1$ for any $\theta \neq \theta'$.
By extension, a dictionary $\mathcal{A} = \{a(\theta) : \theta \in \Theta\}$ is said to be admissible if its induced kernel is admissible.

Definition 2 (Admissible support with respect to kernel $\kappa$). A support $S^\star = \{\theta^\star_\ell\}_{\ell=1}^k$ is admissible with respect to a kernel $\kappa$ if the following holds for any non-empty subset $T \subseteq \llbracket 1, k \rrbracket$ and any positive coefficients $\{c_\ell\}_{\ell \in T}$ such that $\sum_{\ell \in T} c_\ell < 1$:
i) The set of global maximizers of
\[ \psi : \Theta \to \mathbb{R}_+, \qquad \theta \mapsto \sum_{\ell \in T} c_\ell\, \kappa(\theta, \theta^\star_\ell) \tag{3.13} \]
is a subset of $\{\theta^\star_\ell\}_{\ell \in T}$.
ii) If $\ell' \in \llbracket 1, k \rrbracket \setminus T$ satisfies $\psi(\theta) - \kappa(\theta, \theta^\star_{\ell'}) \le 0$ for all $\theta \in \{\theta^\star_\ell\}_{\ell \in T}$, then
\[ \forall\, \theta \in \Theta, \qquad \psi(\theta) - \kappa(\theta, \theta^\star_{\ell'}) \le 0. \tag{3.14} \]
By extension, the support $S^\star$ is said to be admissible with respect to dictionary $\mathcal{A} = \{a(\theta) : \theta \in \Theta\}$ if $S^\star$ is admissible with respect to the kernel induced by $\mathcal{A}$.

With these definitions, our first recovery result reads:

Theorem 2. Assume $\mathcal{A}$ is admissible and $S^\star$ is admissible with respect to $\mathcal{A}$. Then, OMP achieves exact $\mathrm{card}(S)$-step recovery of each $S \subseteq S^\star$.

A proof of this result is available in Appendix A. Theorem 2 provides some sufficient conditions for exact $\mathrm{card}(S)$-step recovery of any $S \subseteq S^\star$ via the definitions of "admissible dictionary" and "admissible support". In particular, the conditions of Theorem 2 imply that (2.2-ERC) is satisfied for any $S \subseteq S^\star$. As we will see in Section 3.4, the admissibility of $\mathcal{A}$ and $S^\star$ may be much easier to prove in some cases than verifying directly that (2.2-ERC) holds.

As the admissibility conditions stated in Definitions 1 and 2 may appear somewhat technical, we discuss hereafter the different items appearing in these definitions in order to shed some light on the scope of Theorem 2.

In Definition 1, (3.4) ensures that the kernel $\kappa$ induced by $\mathcal{A}$ is continuous and that the dictionary $\mathcal{A}$ only contains unit-norm atoms. The continuity assumption is crucial in the derivation of our result since it induces a specific structure on the dictionary. The unit-norm hypothesis is only secondary but avoids some unnecessary technicalities in the proofs. Hypothesis (3.6) ensures the well-posedness of the "atom selection" step in Line 6 of Algorithm 1, see Lemma 1.
Finally, $0 \le \kappa(\theta, \theta')$ implies that the inner product between two atoms of $\mathcal{A}$ is always nonnegative, whereas $\kappa(\theta, \theta') < 1$ guarantees that $a(\theta) \neq a(\theta')$ for $\theta \neq \theta'$, i.e., that $\theta \mapsto a(\theta)$ is an injective function (remember that we assume $\kappa(\theta, \theta) = 1$ for all $\theta \in \Theta$; the fact that atoms are distinct is thus a direct consequence of the Cauchy-Schwarz inequality). Since two atoms $a(\theta)$ and $a(\theta')$ with $\theta \neq \theta'$ are normalized, distinct, and positively correlated, they are also linearly independent.

As for Definition 2, item i) ensures that a correct atom selection always occurs when the residual $r$ is a positive combination of the atoms of the support and the kernel $\kappa$ is admissible. Indeed, if $r = \sum_{\ell=1}^k c_\ell\, a(\theta^\star_\ell)$ with $c_1, \ldots, c_k > 0$ and the kernel is admissible then from Definition 1
\[ |\langle a(\theta), r \rangle| = \sum_{\ell=1}^k c_\ell\, \kappa(\theta, \theta^\star_\ell). \tag{3.15} \]
In such a case, item i) of Definition 2 then implies
\[ \arg\max_{\theta \in \Theta} |\langle a(\theta), r \rangle| \subseteq S^\star. \tag{3.16} \]
Item ii) of Definition 2 does not have such a simple interpretation but a careful inspection of our proof in Appendix A shows that this condition is instrumental for deriving the result stated in Theorem 2. Altogether, given some admissible dictionary $\mathcal{A}$, Theorem 2 allows us to establish recovery results valid without sign constraints by only proving the two assumptions gathered in Definition 2, which somehow correspond to establishing the result for the easier case of positive combinations of atoms.

Footnote 8. In particular, we note that if item i) of Definition 2 is true, then its conclusion still holds without the hypothesis "$\sum_{\ell \in T} c_\ell < 1$".

3.3. CMF dictionaries

In the next section, we will particularize Theorem 2 to a family of dictionaries whose kernel is defined via a completely monotone function (CMF). In this section, we provide a precise definition of this family of dictionaries and some of their properties that will be used throughout the paper. We first recall the definition of a CMF:

Definition 3 (CMF [68, Def. 7.1]). A function $\varphi : \mathbb{R}_+ \to \mathbb{R}$ is completely monotone on $[0, +\infty[$ if it is infinitely differentiable on $]0, +\infty[$, right continuous at 0, and if its derivatives obey
\[ (-1)^n \varphi^{(n)}(x) \ge 0 \qquad \forall (x, n) \in \mathbb{R}_+^* \times \mathbb{N}. \tag{3.17} \]

As described in the following example, many well-known functions are CMFs:

Example 2. The following functions are completely monotone [6]:
• the function $x \mapsto e^{-\lambda x}$ for $\lambda > 0$, which gives rise to the Laplace kernel,
• a rational function of $x$ with denominator $1 + x$, indexed by a positive parameter,
• ratios of modified Bessel functions of the first kind,
• a subset of the confluent hypergeometric functions (Kummer's function),
• a subset of the Gauss hypergeometric functions.

By definition, CMFs are non-negative, non-increasing and convex functions. Moreover, they admit an integral formulation in terms of the Laplace transform of a Borel measure:

Lemma 2 (Bernstein-Widder theorem, [68, Th. 7.11]). A function $\varphi$ is completely monotone on $[0, +\infty[$ if and only if there exists a non-negative finite measure $\nu$ on Borel sets of $[0, +\infty[$ such that
\[ \varphi(x) = \int_0^{+\infty} e^{-ux}\, d\nu(u), \tag{3.18} \]
where the integral converges for all $x \ge 0$ since $\nu$ is finite and $e^{-ux} \le 1$.

We note for example that the function $x \mapsto e^{-\lambda x}$ defining the Laplace kernel (see Example 2) is a CMF with representation measure equal to the Dirac measure $\nu = \delta_\lambda$ with $\lambda > 0$.

In the sequel we will consider the following class of kernels and dictionaries whose definitions rely on the concept of CMF:

Definition 4 (CMF kernel and dictionary). The class of CMF kernels in dimension $D \ge 1$, denoted $\mathcal{K}_{\mathrm{CMF}}(D)$, consists of all kernels $\kappa : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ such that
\[ \kappa(\theta, \theta') = \varphi\big( \|\theta - \theta'\|_p^p \big) \qquad \forall \theta, \theta' \in \mathbb{R}^D \tag{3.19} \]
where $\varphi$ is a CMF verifying $\varphi(0) = 1$, $\lim_{x \to +\infty} \varphi(x) = 0$ and $0 < p \le 1$.
By extension, we say that $\mathcal{A}$ is a CMF dictionary in dimension $D \ge 1$ if its induced kernel belongs to $\mathcal{K}_{\mathrm{CMF}}(D)$.

We note that the constraint $\varphi(0) = 1$ in the previous definition ensures that the "unit-norm" hypothesis (3.4a) is satisfied.
We also mention that the constraint $\lim_{x \to +\infty} \varphi(x) = 0$ is necessary so that CMF kernels satisfy the vanishing property (3.6).

At this point, a legitimate question is whether kernel (3.19) can be induced by some dictionary $\mathcal{A}$. The answer is positive and is a corollary of the following lemma:

Lemma 3. Let $\varphi : \mathbb{R}_+ \to \mathbb{R}$ be a CMF such that $\varphi(0) = 1$, $\lim_{x \to +\infty} \varphi(x) = 0$ and $0 < p \le 1$. Then, any function of the form
\[ \mathbb{R}^D \to \mathbb{R}, \qquad \omega \mapsto \varphi\big( \|\omega\|_p^p \big) \tag{3.20} \]
is positive definite.

A proof of this result is provided in Appendix B.1. We refer the reader to [68, Def. 6.1] for a precise definition of positive (semi-)definite functions and [68, Th. 6.2] for a review of some of their basic properties. In particular, the positive definite nature of $\varphi(\|\cdot\|_p^p)$ used in conjunction with standard results in the theory of "reproducing kernel Hilbert spaces" (see e.g., [69, Th. 3.11]) implies the following corollary:

Corollary 1 (Existence of CMF dictionaries). For any $\kappa \in \mathcal{K}_{\mathrm{CMF}}(D)$, there exists some Hilbert space $\mathcal{H}$ and some (continuous) function $a : \mathbb{R}^D \to \mathcal{H}$ such that (3.3) holds. Moreover, any finite collection of distinct elements from $\mathcal{A} = \{a(\theta) : \theta \in \mathbb{R}^D\}$ is linearly independent.

We see from the last part of the corollary that CMF dictionaries are necessarily defined in infinite-dimensional Hilbert spaces $\mathcal{H}$. If not, any collection of $\dim(\mathcal{H}) + 1$ elements of $\mathcal{A}$ would be linearly dependent, which is in contradiction with Corollary 1. The next example exhibits a family of atoms in $\mathcal{H} = L^2(\mathbb{R})$ which is a CMF dictionary in $\mathbb{R}$.

Example 3. Let $\Theta = \mathbb{R}$ and consider the dictionary $\mathcal{A}$ defined by
\[ a : \mathbb{R} \to L^2(\mathbb{R}), \qquad \theta \mapsto f_\theta(t) = \sqrt{2\lambda}\; e^{-\lambda (t - \theta)}\, \mathbb{1}_{\{t \ge \theta\}}(t) \tag{3.21} \]
for some $\lambda > 0$, where $\mathbb{1}_{\{t \ge \theta\}}$ is the "indicator" function which is equal to 1 if $t \ge \theta$ and 0 otherwise. Straightforward calculations show both that $\|a(\theta)\| = 1$ for any $\theta$ and that the inner product in $\mathcal{H} = L^2(\mathbb{R})$ between two atoms writes $\langle a(\theta), a(\theta') \rangle = e^{-\lambda |\theta - \theta'|}$. The latter function corresponds to the so-called "Laplace kernel". This kernel is an element of $\mathcal{K}_{\mathrm{CMF}}(1)$ according to Example 2.
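The "straightforward calculations" of Example 3 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the atoms to be one-sided decaying exponentials, $f_\theta(t) = \sqrt{2\lambda}\, e^{-\lambda(t-\theta)}$ for $t \ge \theta$ and $0$ otherwise, approximates the $L^2(\mathbb{R})$ inner product by a midpoint rule on a truncated interval, and compares it with $e^{-\lambda|\theta - \theta'|}$. The helper names (`atom`, `inner_product`), the value of `lam`, and the integration window are hypothetical choices, not taken from the paper.

```python
import math

def atom(theta, lam):
    """Assumed atom of Example 3: sqrt(2*lam) * exp(-lam*(t - theta)) for t >= theta, else 0."""
    def f(t):
        return math.sqrt(2.0 * lam) * math.exp(-lam * (t - theta)) if t >= theta else 0.0
    return f

def inner_product(f, g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the L2 inner product of f and g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) * g(lo + (i + 0.5) * h) for i in range(n))

lam = 1.5  # illustrative value of the kernel parameter
pairs = [(0.0, 0.0), (0.0, 0.4), (-1.0, 0.7)]
errors = []
for t1, t2 in pairs:
    approx = inner_product(atom(t1, lam), atom(t2, lam), lo=-2.0, hi=20.0)
    exact = math.exp(-lam * abs(t1 - t2))  # Laplace kernel value
    errors.append(abs(approx - exact))
    print(f"theta={t1}, theta'={t2}: quadrature={approx:.5f}, kernel={exact:.5f}")
```

The pair with $\theta = \theta'$ checks the unit-norm property; the other pairs check the kernel formula. The quadrature error is dominated by the jump of the integrand at $\max(\theta, \theta')$, so agreement is only expected up to roughly the step size.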
We conclude this section by introducing a particular CMF kernel which will be used in the statement of some of our results in Section 3.4:

Definition 5 (Generalized Laplace kernel and dictionary). The class of Generalized Laplace kernels in dimension $D$, denoted $\mathcal{K}_{\mathrm{Lap}}(D)$, consists of all kernels $\kappa : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ such that
\[ \kappa(\theta, \theta') = e^{-\lambda \|\theta - \theta'\|_p^p} \qquad \forall \theta, \theta' \in \mathbb{R}^D \tag{3.22} \]
where $\lambda > 0$ and $0 < p \le 1$.
By extension, a Generalized Laplace dictionary in dimension $D \ge 1$ is a collection of atoms $\mathcal{A} = \{a(\theta) : \theta \in \mathbb{R}^D\}$ whose induced kernel belongs to $\mathcal{K}_{\mathrm{Lap}}(D)$.

One immediately sees that $\mathcal{K}_{\mathrm{Lap}}(D) \subset \mathcal{K}_{\mathrm{CMF}}(D)$ since the function $t \mapsto e^{-\lambda t}$ defined on $\mathbb{R}_+$ is a CMF (see Example 2).

3.4. Recovery conditions in CMF dictionaries

In this section, we provide recovery results for OMP in CMF dictionaries. The proofs of our results are based on the sufficient conditions presented in Theorem 2 and are reported in Appendix B. A first surprising result holds when $\Theta = \mathbb{R}$:

Theorem 3. Assume $\mathcal{A}$ is a CMF dictionary in dimension 1. Then, OMP achieves exact $\mathrm{card}(S^\star)$-step recovery of each finite support $S^\star \subset \mathbb{R}$.

In essence, Theorem 3 identifies a class of dictionaries for which exact $k$-step recovery is possible for any support $S^\star$ of any finite size $k$. We note that the notions of exact recovery of a support $S^\star$ defined in Section 3.1 do not involve any sign constraint on the coefficients $\{c_\ell\}_{\ell=1}^k$ used to generate the observation vector $y$. As a comparison, the results ensuring the success of continuous BP/BLasso with no sign constraints on $\{c_\ell\}_{\ell=1}^k$ require some "minimum separation condition" between parameters $\{\theta^\star_\ell\}_{\ell=1}^k$ to hold (see (2.6) and related discussion). Conversely, the recovery results for BLasso obtained in [30] without separation condition require weighting coefficients $\{c_\ell\}_{\ell=1}^k$ to be positive. The novelty of Theorem 3 is thus a separation-free recovery result for any signed finite linear combination of atoms.
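As a hedged illustration of Theorem 3, the following sketch runs OMP using kernel evaluations only, for the Laplace kernel in dimension 1 and a signed combination of four closely spaced atoms. It is not the idealized Algorithm 1: the exact maximization over $\Theta = \mathbb{R}$ is replaced by a search over a finite candidate set that is assumed to contain the true parameters, and all names (`kernel_omp`, `solve`, the chosen support and coefficients) are hypothetical.

```python
import math

def laplace_kernel(t1, t2, lam=1.0):
    """Laplace kernel exp(-lam * |t1 - t2|), a CMF kernel in dimension 1."""
    return math.exp(-lam * abs(t1 - t2))

def solve(A, b):
    """Solve the small system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def kernel_omp(support, coeffs, candidates, kernel, n_iter):
    """OMP expressed through the kernel; the argmax runs over `candidates` only."""
    def corr_y(t):  # <a(t), y> for y = sum_l c_l a(theta_l)
        return sum(c * kernel(t, s) for c, s in zip(coeffs, support))
    selected, estimate = [], []
    for _ in range(n_iter):
        def corr_r(t):  # <a(t), r> with r = y - sum_j estimate_j a(selected_j)
            return corr_y(t) - sum(c * kernel(t, s) for c, s in zip(estimate, selected))
        selected.append(max(candidates, key=lambda t: abs(corr_r(t))))
        gram = [[kernel(a, b) for b in selected] for a in selected]
        estimate = solve(gram, [corr_y(a) for a in selected])  # least-squares update
    return selected, estimate

support = [-0.5, 0.1, 0.25, 1.2]   # true parameters (illustrative)
coeffs = [1.0, -0.8, 0.5, -2.0]    # signed coefficients, no separation imposed
grid = [i / 100 for i in range(-300, 301)]
candidates = sorted(set(grid) | set(support))
selected, estimate = kernel_omp(support, coeffs, candidates, laplace_kernel, len(support))
recovered = dict(zip(selected, estimate))
print(sorted(recovered.items()))
```

In this run the four selected parameters should coincide with the true support after exactly four iterations, in line with Theorem 3. The candidate grid is a practical stand-in for the continuum and does not by itself address the tractability issues discussed in Section 3.1.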
The strength of the result obtained in Theorem 3 comes however at a price: it applies to a specific family of dictionaries, namely CMF dictionaries. In particular, as mentioned in Section 3.3, the space $\mathcal{H}$ in which CMF dictionaries live is necessarily infinite-dimensional, and the corresponding kernels exhibit a discontinuity in all their partial derivatives at $\theta = \theta' \in \Theta$. Another price to pay is that the recovery guarantees are for OMP, an algorithm explicitly involving the search for the global maximum of an optimization problem, cf. Line 6 of Algorithm 1.

In higher dimension $D > 1$, the "universal" exact recovery result stated in Theorem 3 no longer holds, as shown in the next example. More precisely, if $D \ge 3$, we emphasize that there always exists a configuration of parameters $\{\theta^\star_\ell\}_{\ell=1}^k$ such that OMP fails at the first iteration for some nonzero coefficients $\{c_\ell\}_{\ell=1}^k$:

Example 4. Let $D \ge 3$ and $3 \le k \le D$. Consider $S^\star \triangleq \{\theta^\star_\ell\}_{\ell=1}^k \subset \mathbb{R}^D$ and $\varepsilon > 0$ such that
\[ \|\theta^\star_\ell - \theta^\star_{\ell'}\|_p^p = 2\varepsilon^p \quad \forall \ell \neq \ell', \qquad \|\theta^\star_\ell - 0_D\|_p^p = \varepsilon^p \quad \forall \ell. \]
Let $a : \mathbb{R}^D \to \mathcal{H}$ define a CMF dictionary in $\mathbb{R}^D$ with kernel $\kappa(\theta, \theta') = \varphi(\|\theta - \theta'\|_p^p)$. We next show that, if $\varepsilon$ is sufficiently small, there always exists a linear combination of $\{a(\theta^\star_\ell)\}_{\ell=1}^k$ such that OMP selects a parameter not in $S^\star$ at the first iteration. Let us consider $y = \sum_{\ell=1}^k c_\ell\, a(\theta^\star_\ell)$ and assume that all coefficients $c_\ell$ are equal. We then have
\[ \frac{\langle a(0_D), y \rangle}{\langle a(\theta^\star_\ell), y \rangle} = \frac{k\, \varphi(\varepsilon^p)}{1 + (k-1)\, \varphi(2\varepsilon^p)}. \tag{3.23} \]
Then, $\theta = 0_D$ will be preferred to all "ground-truth" parameters $\theta^\star_\ell$ at the first iteration of OMP as soon as the quantity in (3.23) is larger than 1, or, equivalently,
\[ (k-1)\, \varphi(2\varepsilon^p) - k\, \varphi(\varepsilon^p) + 1 < 0. \tag{3.24} \]
Let us show that (3.24) holds whenever $\varepsilon$ is "sufficiently small". For simplicity, consider first the case where $\varphi(t) = e^{-\lambda t}$ with $\lambda > 0$. Condition (3.24) writes
\[ (k-1)\, x^2 - k x + 1 < 0 \tag{3.25} \]
with $x = \varphi(\varepsilon^p) = e^{-\lambda \varepsilon^p}$. As $k \ge 3$, the left-hand side of (3.25) is a second-order polynomial with two distinct roots, namely $(k-1)^{-1}$ and 1. Therefore, OMP prefers $0_D$ as soon as $(k-1)^{-1} < x < 1$ or, equivalently, $\varepsilon^p < \lambda^{-1} \log(k-1)$.
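The computation in Example 4 can be reproduced numerically. The sketch below is a minimal check under explicit assumptions: the support is taken as $\theta^\star_\ell = \varepsilon\, e_\ell$ (scaled canonical vectors, one configuration satisfying the two distance constraints), $\varphi(t) = e^{-\lambda t}$, and all coefficients equal 1. The function names and numerical values are hypothetical.

```python
import math

def first_step_ratio(k, eps, p, phi):
    """Right-hand side of (3.23): k*phi(eps^p) / (1 + (k-1)*phi(2*eps^p))."""
    return k * phi(eps ** p) / (1.0 + (k - 1) * phi(2.0 * eps ** p))

def cmf_kernel(u, v, p, phi):
    """CMF kernel phi(||u - v||_p^p) between two parameters of R^D."""
    return phi(sum(abs(a - b) ** p for a, b in zip(u, v)))

def direct_ratio(k, dim, eps, p, phi):
    """Left-hand side of (3.23), computed from the kernel for theta_l = eps * e_l."""
    thetas = [tuple(eps if j == l else 0.0 for j in range(dim)) for l in range(k)]
    origin = tuple(0.0 for _ in range(dim))
    corr_origin = sum(cmf_kernel(origin, t, p, phi) for t in thetas)   # <a(0), y>
    corr_atom = sum(cmf_kernel(thetas[0], t, p, phi) for t in thetas)  # <a(theta_1), y>
    return corr_origin / corr_atom

lam, p, k, dim = 1.0, 1.0, 3, 3
phi = lambda t: math.exp(-lam * t)
threshold = math.log(k - 1) / lam  # OMP prefers the origin when eps^p is below this value

for eps in (0.5, 1.0):
    ratio = first_step_ratio(k, eps, p, phi)
    print(f"eps={eps}: ratio={ratio:.4f}, eps^p < threshold: {eps ** p < threshold}")
```

With these illustrative values the ratio exceeds 1 for the smaller $\varepsilon$ and falls below 1 for the larger one, matching the sign of $\lambda^{-1}\log(k-1) - \varepsilon^p$.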
The condition $\varepsilon^p < \lambda^{-1} \log(k-1)$ thus yields a necessary separation condition, namely $\varepsilon^p \ge \lambda^{-1} \log(k-1)$, for OMP not to fail at the first iteration. We note that it is possible to draw similar conclusions whenever $\varphi$ is a CMF right-differentiable at zero. The proof of this result requires extra work that is detailed in Appendix D.

Although a "universal" $k$-step recovery result such as Theorem 3 no longer holds in CMF dictionaries when $D > 1$, it is nevertheless possible to show that some form of exact recovery of a support $S^\star$ is possible under an additional condition on the kernel induced by the CMF dictionary (see Theorem 4). This additional condition is referred to as "axis admissibility" hereafter and is encapsulated in Definition 7 below. Before moving on to this definition, it is necessary to introduce the notions of "Cartesian grid" and "set augmenter operator":

Definition 6 (Cartesian grid). A finite set $\mathcal{G} \subset \mathbb{R}^D$ is a Cartesian grid in dimension $D \ge 1$ if there exist $D$ one-dimensional finite sets $\{S_d\}_{d=1}^D$ such that
\[ \mathcal{G} = \times_{d=1}^{D} S_d, \tag{3.26} \]
where $\times$ denotes the Cartesian product.

We moreover define the following "set augmenter" operator that, given a finite set $S \subset \mathbb{R}^D$, returns the smallest Cartesian grid containing $S$:
\[ \mathrm{Grid}(S) \triangleq \times_{d=1}^{D} \{\theta[d] : \theta \in S\}. \tag{3.27} \]
It is straightforward to see that $S \subseteq \mathrm{Grid}(S)$ for any finite set $S \subset \mathbb{R}^D$ and that the operator $\mathrm{Grid}$ is idempotent. We illustrate the definition of $\mathrm{Grid}(S)$ in Fig. 1 in dimension $D = 2$ for $S = \{\theta_1, \theta_2, \theta_3\}$.

We are now ready to introduce the notion of "axis admissibility":

Definition 7 (Axis admissibility with respect to a kernel).
A Cartesian grid $\mathcal{G} = \times_{d=1}^{D} S_d = \{\theta_\ell\}_{\ell=1}^{\mathrm{card}(\mathcal{G})}$ is said to be axis admissible with respect to a kernel $\kappa$ if and only if $\forall d \in \llbracket 1, D \rrbracket$, $\forall \theta \in \mathbb{R}^D$ with $\theta[d] = 0$ and $\forall \{c_\ell\}_{\ell=1}^{\mathrm{card}(\mathcal{G})} \subset \mathbb{R}$ such that the function
\[ f_d(t) = \sum_{\ell=1}^{\mathrm{card}(\mathcal{G})} c_\ell\, \kappa(\theta + t e_d, \theta_\ell) \tag{3.28} \]
is not identically zero, we have
\[ \emptyset \neq \arg\max_{t \in \mathbb{R}} |f_d(t)| \subseteq S_d. \tag{3.29} \]
By extension, a Cartesian grid $\mathcal{G}$ is said to be axis admissible with respect to a dictionary $\mathcal{A}$ if it is axis admissible with respect to the kernel induced by $\mathcal{A}$.

Figure 1: Illustration in dimension $D = 2$ with $k = 3$ of the definition of the set augmenter $\mathrm{Grid}$ defined in (3.27). The blue points, denoted $\theta_\ell$ for $\ell \in \{1, 2, 3\}$, form the support $S$. The red points, denoted $\theta_\ell$, $\ell \in \llbracket 4, 9 \rrbracket$, represent the elements of $\mathrm{Grid}(S) \setminus S$.

The notion of axis admissibility will be central in our next result to ensure the exact recovery of some support $S^\star = \{\theta^\star_\ell\}_{\ell=1}^k$ in a CMF dictionary. In particular, we will see that axis admissibility of $\mathrm{Grid}(S^\star)$ ensures exact $\mathrm{Grid}(S^\star)$-delayed recovery of each $S \subseteq S^\star$. Moreover, exact $S^\star$-delayed recovery of each $S \subseteq S^\star$ is achievable by combining axis admissibility of $\mathrm{Grid}(S^\star)$ with the following restricted version of the ERC:
\[ \max_{\theta \in \mathrm{Grid}(S^\star) \setminus S^\star} \| G^{-1} g_\theta \|_1 < 1 \tag{3.30-R-ERC} \]
where
\[ G[\ell, \ell'] \triangleq \langle a(\theta^\star_\ell), a(\theta^\star_{\ell'}) \rangle \quad \forall \ell, \ell' \in \llbracket 1, k \rrbracket, \qquad g_\theta[\ell] \triangleq \langle a(\theta), a(\theta^\star_\ell) \rangle \quad \forall \ell \in \llbracket 1, k \rrbracket. \tag{3.31} \]
We remind the reader that $G$ is invertible as the Gram matrix of a set of linearly independent atoms (see Corollary 1).

Formally, our next result writes as follows:

Theorem 4. Let $\mathcal{A}$ be a CMF dictionary in $\mathbb{R}^D$ with induced kernel $\kappa$ and let $S^\star = \{\theta^\star_\ell\}_{\ell=1}^k$.
• If $\mathrm{Grid}(S^\star)$ is axis admissible with respect to $\kappa$, then OMP achieves $\mathrm{Grid}(S^\star)$-delayed recovery of each $S \subseteq S^\star$. If (3.30-R-ERC) moreover holds, OMP achieves $S^\star$-delayed recovery of each $S \subseteq S^\star$.
• Conversely, if (3.30-R-ERC) does not hold, there exist not all-zero coefficients $\{c_\ell\}_{\ell=1}^k$ such that OMP with $y = \sum_{\ell=1}^k c_\ell\, a(\theta^\star_\ell)$ as input selects some $\theta \notin S^\star$ at the first iteration.
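Because (3.30-R-ERC) involves only the finitely many parameters of $\mathrm{Grid}(S^\star)$, it can be evaluated numerically. The sketch below is a minimal illustration for a Generalized Laplace kernel: it builds $\mathrm{Grid}(S^\star)$ as in (3.27), forms $G$ and $g_\theta$ as in (3.31), and returns the maximum of $\|G^{-1} g_\theta\|_1$ over $\mathrm{Grid}(S^\star) \setminus S^\star$. The function names, the two example supports and the kernel parameters are hypothetical choices.

```python
import itertools
import math

def grid(support):
    """Smallest Cartesian grid containing `support`, i.e. the operator Grid of (3.27)."""
    dim = len(support[0])
    axes = [sorted({theta[d] for theta in support}) for d in range(dim)]
    return [tuple(point) for point in itertools.product(*axes)]

def laplace_kernel(u, v, lam=1.0, p=1.0):
    """Generalized Laplace kernel exp(-lam * ||u - v||_p^p)."""
    return math.exp(-lam * sum(abs(a - b) ** p for a, b in zip(u, v)))

def solve(A, b):
    """Solve the small system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def restricted_erc(support, kernel):
    """Left-hand side of (3.30-R-ERC): max of ||G^{-1} g_theta||_1 over Grid(S) minus S."""
    G = [[kernel(a, b) for b in support] for a in support]
    worst = 0.0
    for theta in grid(support):
        if theta in support:
            continue
        g = [kernel(theta, s) for s in support]
        worst = max(worst, sum(abs(x) for x in solve(G, g)))
    return worst

# Well-separated support in dimension 2 (k = 3, so Grid has 9 points as in Fig. 1).
separated = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
value_separated = restricted_erc(separated, lambda u, v: laplace_kernel(u, v, lam=5.0))

# Configuration of Example 4 with eps = 0.5, lam = 1, p = 1 in dimension 3.
example4 = [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.5)]
value_example4 = restricted_erc(example4, laplace_kernel)
print(value_separated, value_example4)
```

For the well-separated support the returned value is below 1, so the restricted ERC holds; for the configuration of Example 4 the origin belongs to $\mathrm{Grid}(S^\star)$ and pushes the value above 1, consistent with the converse part of Theorem 4.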
A first outcome of Theorem 4 is a (pessimistic) upper bound on the number of iterations needed to identify $S^\star$ when $\mathcal{A}$ is a CMF dictionary and $\mathrm{Grid}(S^\star)$ is axis admissible with respect to the kernel induced by $\mathcal{A}$. In particular, the first part of the theorem states that OMP needs no more than $\mathrm{card}(\mathrm{Grid}(S^\star)) \le k^D$ iterations to succeed. As shown in the second part of the theorem, this (rather pessimistic) upper bound on the number of iterations can be decreased to $k$ if an additional restricted ERC (3.30-R-ERC) is verified. Interestingly, whereas the parameter space $\Theta$ is a continuum, (3.30-R-ERC) only depends on a finite subset of the elements of $\Theta$ (namely $\mathrm{Grid}(S^\star)$) and its numerical evaluation is therefore possible. We will see in Theorem 5 below that this restricted ERC allows us to derive a separation condition for exact $k$-step recovery in Generalized Laplace dictionaries. Besides, we note that additional strategies could be investigated to improve the upper bound, exploiting, e.g., coefficient decay [50].

In our next result, we show that the property of "axis admissibility" is satisfied for (at least) some CMF dictionaries. In particular, the next lemma emphasizes that any Cartesian grid is axis admissible for Generalized Laplace dictionaries (see Definition 5):

Lemma 4. Let $\mathcal{A}$ be a Generalized Laplace dictionary in $\mathbb{R}^D$. Then all Cartesian grids $\mathcal{G}$ are axis admissible with respect to $\mathcal{A}$.

A proof of this result is given in Appendix B.4. Combining this lemma with Theorem 4 immediately leads to the following corollary:

Corollary 2. Let $\mathcal{A}$ be a Generalized Laplace dictionary in $\mathbb{R}^D$. Then OMP achieves exact $\mathrm{Grid}(S^\star)$-delayed recovery of each finite support $S^\star \subset \mathbb{R}^D$.

Interestingly, although Example 4 showed that exact $\mathrm{card}(S^\star)$-step recovery does not hold for arbitrary $S^\star$ in CMF dictionaries, Corollary 2 emphasizes that exact $\mathrm{Grid}(S^\star)$-delayed recovery is achievable by OMP in Generalized Laplace dictionaries for any $S^\star$ and any $k = \mathrm{card}(S^\star) \in \mathbb{N}^*$.
Following our remark below Theorem 4, OMP is thus ensured to identify any support of size $k$ in at most $k^D$ iterations in this type of dictionary. Similar to Theorem 3, neither separation assumptions nor sign constraints are needed here to ensure our recovery result, although it applies to a very specific family of dictionaries. We will see in Theorem 5 below that adding some separation condition on the elements of $S^\star$ makes it possible to verify (3.30-R-ERC) and therefore leads to an exact-recovery result in at most $k$ steps.

Before moving on to the statement of this result, let us mention that, although Lemma 4 shows that any Cartesian grid is axis admissible with respect to Generalized Laplace dictionaries, such a result does not hold in general for CMF dictionaries without extra assumptions on the grid. Nevertheless, our empirical evidence suggests that the admissible grid assumption is only an artifact of our proof technique. We conjecture that Theorem 4 remains valid even when the Cartesian grid $\mathcal{G}$ is not axis admissible. To support our conjecture, we show in Appendix E that the second part of Theorem 4 still holds for any CMF dictionary and for any $S^\star \subset \mathbb{R}^D$ with $\mathrm{card}(S^\star) = 2$, even though $\mathrm{Grid}(S^\star)$ is generally not axis admissible. The proof of this kind of result in the general case is still under investigation.

In the last result of this section, we particularize (3.30-R-ERC) to derive a separation condition on the elements of $S^\star = \{\theta^\star_\ell\}_{\ell=1}^k$ that ensures exact $\mathrm{card}(S)$-step recovery of each $S \subseteq S^\star$ in Generalized Laplace dictionaries. We first note that, following standard results of the literature (see e.g., [45]), (3.30-R-ERC) can be relaxed to a mutual coherence condition:
\[ \mu < \frac{1}{2k - 1} \tag{3.32} \]
where
\[ \mu \triangleq \max_{\substack{\theta, \theta' \in \mathrm{Grid}(S^\star) \\ \theta \neq \theta'}} |\langle a(\theta), a(\theta') \rangle|. \tag{3.33} \]
Our separation result is then a simple consequence of this mutual coherence condition:

Theorem 5. Let $\mathcal{A}$ be a Generalized Laplace dictionary in $\mathbb{R}^D$ with parameters $\lambda > 0$ and $0 < p \le 1$. Consider $S^\star = \{\theta^\star_\ell\}_{\ell=1}^k$ and let
\[ \Delta \triangleq \min_{d \in \llbracket 1, D \rrbracket} \min \big\{ |\theta^\star_\ell[d] - \theta^\star_{\ell'}[d]| : \ell, \ell' \in \llbracket 1, k \rrbracket \ \text{s.t.}\ \theta^\star_\ell[d] \neq \theta^\star_{\ell'}[d] \big\}. \tag{3.34} \]
If
\[ \lambda \Delta^p > \log(2k - 1) \tag{3.35} \]
then OMP achieves exact $\mathrm{card}(S)$-step recovery of each $S \subseteq S^\star$.

Proof. By definition of $\Delta$ and of $\mathrm{Grid}(S^\star)$, we have $\|\theta - \theta'\|_p^p \ge \Delta^p$ for all $\theta \neq \theta' \in \mathrm{Grid}(S^\star)$, since two distinct grid points differ by at least $\Delta$ in at least one coordinate. Hence, using the definition of the mutual coherence in (3.33), we have $\mu = \exp(-\lambda \|\theta - \theta'\|_p^p)$ for some $\theta \neq \theta' \in \mathrm{Grid}(S^\star)$, so $\mu \le \exp(-\lambda \Delta^p)$ and (3.35) implies that $\mu < (2k - 1)^{-1}$ holds.

Theorem 5 states that, with Generalized Laplace dictionaries, OMP recovers any linear combination of $k$ sufficiently separated atoms in $k$ steps. Although condition (3.35) is expressed in terms of minimal distance between parameters, it can be seen as a condition on the mutual coherence between atoms. However, in contrast to the discrete case, this mutual coherence guarantee is only related to a particular finite subset of the (continuous) Generalized Laplace dictionary, namely the atoms with parameters in $\mathrm{Grid}(S^\star)$.

Furthermore, condition (3.35) is reminiscent of the separation condition for off-the-grid super-resolution proposed in [3], see (2.6). The separation condition discussed in (2.6) is expressed on a $D$-dimensional torus, which also prevents high values of $k$. For example, in a unit-length 1-dimensional torus and with the notations of (2.6), the minimum separation condition for BP bounds $k$ from above, since only finitely many spikes separated by the minimum distance fit on the torus. Note however that these results involve different dictionaries and settings, which makes a meaningful comparison difficult.

4. Conclusion - discussion

In this work, we have shown that the study of the recovery properties of greedy procedures such as Orthogonal Matching Pursuit (OMP) can be extended to the setting of continuous dictionaries where the atoms continuously depend on some parameters. Capitalizing on the formulation of OMP in terms of inner products between atoms, our results rely on the properties of the kernel implicitly defined by the inner product between atoms.
More particularly, we have identified two key notions, which we have called admissible kernel and admissible support, that are sufficient to ensure exact recovery irrespective of the value of the coefficients involved in the representation.

For the class of CMF dictionaries, we have shown that when the dimension of the parameter space is 1, all implicitly defined kernels as well as all supports are admissible. To our knowledge, this is the first class of kernels for which no separation is needed to achieve exact recovery, even for signed combinations of atoms. However, such a "universal" recovery result comes at a price since CMF dictionaries can only live in infinite-dimensional observation spaces $\mathcal{H}$ and the corresponding kernels exhibit some discontinuities in their derivatives.

Although exact recovery can also be ensured for CMF dictionaries with a parameter space of dimension greater than 1, extra conditions have to be imposed on the support to be recovered, as some supports may not be admissible anymore. The cornerstone of our analysis in the multi-dimensional case is the notion of axis admissible Cartesian grid. Indeed, axis admissibility is sufficient to allow OMP to identify supports, leading to a form of "delayed recovery" for all supports of size $k$ embedded in some admissible Cartesian grid. For such supports, exact $k$-step recovery can also be achieved whenever a condition on a finite number of (known) atoms is fulfilled. In the special case of Generalized Laplace dictionaries, any Cartesian grid turns out to be axis admissible, and a simplified coherence-based analysis can be revisited, leading to exact $k$-step recovery under a minimal separation condition.

We now review some prospects of this work:

Beyond axis-admissible grids for CMF kernels. Our analysis for multi-dimensional parameter sets relies on the notion of axis-admissible grids.
While axis admissibility holds for any grid with respect to Generalized Laplace dictionaries, this is apparently no longer the case with respect to more general CMF dictionaries. Even for grids which seem to violate the axis-admissibility condition with respect to a CMF dictionary, empirical evidence suggests that Theorem 4 remains valid. As a first step towards a better understanding of this phenomenon, we showed in Appendix E that, for supports of size 2, axis-admissibility is not necessary for the conclusion of Theorem 4 to hold.

Connection with TV-minimization. In light of the existing links between Tropp's ERC [45] and recovery guarantees for $\ell_1$ minimization [49], an interesting question is whether the guarantees developed in this paper can be extended to sparse spike recovery with total variation norm minimization (see Section 2). More particularly, one could benefit from the null-space properties for measures [30] which characterize the solution of the continuous version of Basis Pursuit. Such a connection may yield support recovery results for signed combinations of atoms with TV-norm minimization without separation conditions.

Robustness to estimation error. In the discrete setting, one advantage of greedy procedures over convex relaxations is that the associated recovery guarantees involve solutions provided by actual algorithms rather than merely expressed as the minimizer of some optimization problem. In the continuous setting, this has to be tempered by the fact that implementing OMP requires a (possibly intractable) global maximization procedure at each iteration. Our current analysis does not take into account the resulting numerical estimation error or the fact that there may be spurious local maxima. One could envision overcoming some of these limitations by analyzing the behavior of OMP when a small error is systematically made when maximizing the inner product in Line 6 of Algorithm 1.
Note that such an approximation error may also be useful to account for discretized implementations of the latter step of OMP using a fine grid over the parameter set $\Theta$.

A. Proof of Theorem 2

Let $S \subseteq S^\star = \{\theta^\star_\ell\}_{\ell=1}^k$. Without loss of generality, we assume that $S \neq \emptyset$ corresponds to the first $\mathrm{card}(S)$ elements of $S^\star$, that is $S = \{\theta^\star_\ell\}_{\ell=1}^{\mathrm{card}(S)}$.

We first notice that, as a direct consequence of Definition 2, if $S^\star$ is admissible with respect to $\kappa$ then any $S \subseteq S^\star$ is also admissible with respect to $\kappa$. The result stated in Theorem 2 is then a direct consequence of Theorem 1 and the following proposition:

Proposition 1. Assume kernel $\kappa$ is admissible and $S = \{\theta^\star_\ell\}_{\ell=1}^{\mathrm{card}(S)}$ is admissible with respect to $\kappa$. Then we have that
i) the atoms $\{a(\theta^\star_\ell)\}_{\ell=1}^{\mathrm{card}(S)}$ are linearly independent,
ii) $\forall\, \theta \in \Theta \setminus S$, $\| G^{-1} g_\theta \|_1 < 1$, where
\[ G[\ell, \ell'] \triangleq \kappa(\theta^\star_\ell, \theta^\star_{\ell'}) \quad \forall \ell, \ell' \in \llbracket 1, \mathrm{card}(S) \rrbracket \tag{A.1} \]
\[ g_\theta[\ell] \triangleq \kappa(\theta, \theta^\star_\ell) \quad \forall \ell \in \llbracket 1, \mathrm{card}(S) \rrbracket. \tag{A.2} \]

We thus spend the rest of this section proving Proposition 1.

Proof of item i) of Proposition 1. Let $\{c_\ell\}_{\ell=1}^{\mathrm{card}(S)} \subset \mathbb{R}$ be such that $y \triangleq \sum_{\ell=1}^{\mathrm{card}(S)} c_\ell\, a(\theta^\star_\ell) = 0_{\mathcal{H}}$, and let $T$ be the set of indices such that $c_\ell \neq 0$. Without loss of generality, we can assume that $\sum_{\ell=1}^{\mathrm{card}(S)} |c_\ell| < 1$. We will prove by contradiction that $T$ is empty.

Assuming that $T$ is not empty, we first prove that the signs of the coefficients $\{c_\ell\}_{\ell \in T}$ cannot be all equal. To this end, let us assume (without loss of generality) that $c_\ell > 0$ for all $\ell \in T$ and show that a contradiction occurs with the hypothesis of admissibility of $S$. Since $y = 0_{\mathcal{H}}$, the function $\psi : \theta \mapsto \langle a(\theta), y \rangle$ is identically equal to zero. Hence, on the one hand, any point of $\Theta$ is a maximizer. On the other hand, since all the elements of $\{c_\ell\}_{\ell \in T}$ are positive and $S$ is (by hypothesis) admissible with respect to $\kappa$, we have from item i) of Definition 2 that the maximizers of $\psi$ must belong to $S$. This implies that $\Theta \subseteq S$, which contradicts the definition of $S$ and $\Theta$. Therefore, if $T$ is not empty, not all the elements of $\{c_\ell\}_{\ell \in T}$ have the same sign.
We can thus partition $T$ into two non-empty disjoint subsets:
\[ T_+ = \{\ell \in T : c_\ell > 0\}, \qquad T_- = \{\ell \in T : c_\ell < 0\}. \]
Similarly, we let
\[ S_+ = \{\theta^\star_\ell \in S : \ell \in T_+\}, \qquad S_- = \{\theta^\star_\ell \in S : \ell \in T_-\}. \]
Since the elements of $S \supseteq S_+ \cup S_-$ are pairwise distinct, we have $S_+ \cap S_- = \emptyset$. Defining $y_+ = \sum_{\ell \in T_+} c_\ell\, a(\theta^\star_\ell)$ and $y_- = -\sum_{\ell \in T_-} c_\ell\, a(\theta^\star_\ell)$, we note that $y = y_+ - y_-$. Using the fact that $y = 0_{\mathcal{H}}$, one deduces that $y_+ = y_-$. Moreover, $y_+$ (resp. $y_-$) is a positive linear combination of atoms with parameters in $S_+ \subseteq S$ (resp. $S_- \subseteq S$). Therefore, since $S$ is admissible with respect to $\kappa$, item i) of Definition 2 applies and we have that any maximizer of $\psi : \theta \mapsto \langle a(\theta), y_+ \rangle = \langle a(\theta), y_- \rangle$ must belong to $S_+ \cap S_-$. Now, on the one hand, by Lemma 1, the set of maximizers of $\psi$ cannot be empty. On the other hand, $S_+ \cap S_- = \emptyset$. This leads to a contradiction. Therefore we must have $T = \emptyset$. In other words, $y = 0_{\mathcal{H}}$ implies $c_\ell = 0$ $\forall \ell \in \llbracket 1, \mathrm{card}(S) \rrbracket$, so that the atoms $\{a(\theta^\star_\ell)\}_{\ell=1}^{\mathrm{card}(S)}$ are linearly independent.

As a consequence of this first part of the proposition, the Gram matrix of any subset of $\{a(\theta^\star_\ell)\}_{\ell=1}^{\mathrm{card}(S)}$ is a positive definite matrix, and therefore invertible. In particular, the inverses of the matrices $G$ and $\bar{G}$ appearing in the second part of the proof are always well-defined.

Proof of item ii) of Proposition 1. Recall that, as a consequence of Definition 2, if $S$ is admissible with respect to $\kappa$, then any support $S' \subseteq S$ is also admissible. We thus show our result by induction on the cardinality of $S'$. For notational convenience, we let hereafter $k' \triangleq \mathrm{card}(S')$. We prove by induction on $k'$ that:
a) $G^{-1} 1_{k'}$ has nonnegative entries,
b) $\forall\, \theta \in \Theta$, $G^{-1} g_\theta$ has nonnegative entries,
c) $\forall\, \theta \in \Theta \setminus S'$, $\| G^{-1} g_\theta \|_1 < 1$.
The quantities $G$ and $g_\theta$ appearing above are defined in (A.1)-(A.2) with the substitution $S \leftrightarrow S'$. Item c) corresponds to result ii) of Proposition 1. Items a) and b) are intermediate results that allow a subdivision of the proof into steps.

Initialization: $k' = 1$. In this case, both $G$ and $g_\theta$ are scalars.
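Since $\kappa$ is admissible, we have $\mathbf{G} = 1$ and $\mathbf{g}_\theta \geq 0$ (cf. Definition 1). Therefore, items a) and b) are fulfilled and $\|\mathbf{G}^{-1}\mathbf{g}_\theta\|_1 = \mathbf{g}_\theta = \kappa(\theta,\theta_1^\star)$ and, using Definition 1-ii), we have $\kappa(\theta,\theta_1^\star) < 1$ for $\theta \neq \theta_1^\star$. Hence, item c) is also true.

Induction: $1 < k' \leq k$. We assume items a)-b)-c) hold for any $S'' \subseteq S$ of cardinality $k'-1 \geq 1$. Considering $S' \subseteq S$ an arbitrary support of size $k'$, we show that items a)-b)-c) also hold for $S'$. Without loss of generality, we will assume that $S'$ corresponds to the first $\operatorname{card}(S')$ elements of $S$, that is $S' = \{\theta_\ell^\star\}_{\ell=1}^{k'}$.

We consider $\bar{S} = \{\theta_\ell^\star\}_{\ell=1}^{k'-1} \subset S'$ and use over-lined notations for quantities related to $\bar{S}$: we denote by $\bar{\mathbf{G}} \in \mathbb{R}^{(k'-1)\times(k'-1)}$, $\bar{\mathbf{g}}_\theta \in \mathbb{R}^{k'-1}$ the quantities defined in (A.1)-(A.2) for $\bar{S}$ and by $\mathbf{G} \in \mathbb{R}^{k'\times k'}$, $\mathbf{g}_\theta \in \mathbb{R}^{k'}$ the same quantities for $S'$. Likewise, the notations $\bar{\mathbf{g}}_\ell \in \mathbb{R}^{k'-1}$, $\mathbf{g}_{\ell'} \in \mathbb{R}^{k'}$ for $\ell = 1,\dots,k'-1$, $\ell' = 1,\dots,k'$ will refer to the columns of $\bar{\mathbf{G}}$ and $\mathbf{G}$, respectively. With these notations we have:
\[ \mathbf{g}_\theta = \begin{pmatrix} \bar{\mathbf{g}}_\theta \\ \kappa(\theta,\theta_{k'}^\star) \end{pmatrix} \in \mathbb{R}^{k'} \quad \forall \theta \in \Theta \tag{A.3} \]
\[ \mathbf{G} = \begin{pmatrix} \bar{\mathbf{G}} & \bar{\mathbf{g}}_{k'} \\ \bar{\mathbf{g}}_{k'}^T & 1 \end{pmatrix} \in \mathbb{R}^{k'\times k'} \tag{A.4} \]
where we denote $\bar{\mathbf{g}}_{k'} \triangleq \bar{\mathbf{g}}_{\theta_{k'}^\star}$ for notational convenience. We note that, as mentioned above, item i) of Proposition 1 ensures that both $\mathbf{G}$ and $\bar{\mathbf{G}}$ are invertible.

Item a). We show that the last entry of $\mathbf{u} \triangleq \mathbf{G}^{-1}\mathbf{1}_{k'}$ is positive. Since the reasoning holds for any ordering of the $\theta_\ell^\star$'s, we then deduce that all the entries of $\mathbf{u}$ are positive. Block inversion results [70, Cor. 2.8.9] give
\[ \mathbf{G}^{-1} = \begin{pmatrix} \bar{\mathbf{G}}^{-1} + s\,\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_{k'}\bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1} & -s\,\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_{k'} \\ -s\,\bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1} & s \end{pmatrix}, \tag{A.5} \]
where $s \triangleq (1 - \bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_{k'})^{-1}$. Notice that
\[ \bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_{k'} \leq \|\bar{\mathbf{g}}_{k'}\|_\infty \|\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_{k'}\|_1 \leq \|\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_{k'}\|_1 < 1. \tag{A.6} \]
The first inequality is a consequence of Hölder's inequality, the second of Definition 1 and the third follows from induction hypothesis c). Hence $s > 0$.

The last entry of $\mathbf{u} = \mathbf{G}^{-1}\mathbf{1}_{k'}$ now writes $\mathbf{u}[k'] = s\,(1 - \bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1}\mathbf{1}_{k'-1})$. By induction hypothesis b), we have $\|\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_{k'}\|_1 = \bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1}\mathbf{1}_{k'-1}$. Using (A.6) and the fact that $s > 0$, we thus have $\mathbf{u}[k'] > 0$.

Item b). We first show that the last entry of $\mathbf{v} \triangleq \mathbf{G}^{-1}\mathbf{g}_\theta$ is non-negative.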
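Given the decomposition of $\mathbf{G}^{-1}$ in (A.5), the last entry of $\mathbf{G}^{-1}\mathbf{g}_\theta$ writes
\[ \mathbf{v}[k'] = s\left(\mathbf{g}_\theta[k'] - \bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_\theta\right) = s\left(\kappa(\theta,\theta_{k'}^\star) - \bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_\theta\right). \tag{A.7} \]
Since $s > 0$ (see (A.6)) it is then sufficient to show that $\kappa(\theta,\theta_{k'}^\star) - \bar{\mathbf{v}}^T\bar{\mathbf{g}}_\theta \geq 0$, where $\bar{\mathbf{v}} \triangleq \bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_{k'}$, in order to show that $\mathbf{v}[k'] \geq 0$. This will be achieved by studying this quantity seen as a function of $\theta$. Consider $T \subseteq [\![1,k'-1]\!]$ the (possibly empty) set defined by $T \triangleq \{\ell : \bar{\mathbf{v}}[\ell] \neq 0\}$ and define
\[ \psi : \Theta \to \mathbb{R}_+, \quad \theta \mapsto \bar{\mathbf{v}}^T\bar{\mathbf{g}}_\theta = \sum_{\ell=1}^{k'-1} \bar{\mathbf{v}}[\ell]\,\kappa(\theta,\theta_\ell^\star) = \sum_{\ell\in T} \bar{\mathbf{v}}[\ell]\,\kappa(\theta,\theta_\ell^\star). \tag{A.8} \]
Notice that:
• $\psi(\theta) = \bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_\theta$,
• the entries of $\bar{\mathbf{v}}$ are nonnegative by the induction hypothesis b). Moreover, from induction hypothesis c), we have $\sum_{\ell=1}^{k'-1}\bar{\mathbf{v}}[\ell] = \|\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_{k'}\|_1 < 1$,
• for $j \in [\![1,k'-1]\!]$ and $\theta = \theta_j^\star$ we have $\bar{\mathbf{g}}_\theta = \bar{\mathbf{g}}_j = \bar{\mathbf{G}}\mathbf{e}_j$, where $\mathbf{e}_j$ is the $j$-th canonical vector of $\mathbb{R}^{k'-1}$. Hence, $\psi(\theta_j^\star) = \bar{\mathbf{g}}_{k'}^T\bar{\mathbf{G}}^{-1}\bar{\mathbf{g}}_j = \bar{\mathbf{g}}_{k'}^T\mathbf{e}_j = \kappa(\theta_j^\star,\theta_{k'}^\star)$ and $\psi(\theta_j^\star) - \kappa(\theta_j^\star,\theta_{k'}^\star) = 0$ $\forall j \neq k'$.

Since $S'$ is admissible with respect to $\kappa$, we can apply item ii) of Definition 2 with $\ell' = k'$ to any $\emptyset \neq T \subseteq [\![1,k'-1]\!]$. This leads to:
\[ \kappa(\theta,\theta_{k'}^\star) - \bar{\mathbf{v}}^T\bar{\mathbf{g}}_\theta = \kappa(\theta,\theta_{k'}^\star) - \psi(\theta) \geq 0 \tag{A.9} \]
for all $\theta \in \Theta$. The same obviously holds if $T$ is empty, as $\psi$ is then identically zero and the admissibility of $\kappa$ implies that $\kappa$ is nonnegative (see Definition 1). Since this result does not depend on the ordering of the $\theta_\ell^\star$'s, we can finally conclude that all the elements of $\mathbf{G}^{-1}\mathbf{g}_\theta$ are nonnegative.

Item c). Let
\[ \psi : \Theta \to \mathbb{R}, \quad \theta \mapsto \|\mathbf{G}^{-1}\mathbf{g}_\theta\|_1. \tag{A.10} \]
We need to prove that $\psi(\theta) < 1$ for all $\theta \notin S'$.

From item b), we know that $\mathbf{G}^{-1}\mathbf{g}_\theta$ has nonnegative entries, so that $\|\mathbf{G}^{-1}\mathbf{g}_\theta\|_1 = \mathbf{1}_{k'}^T\mathbf{G}^{-1}\mathbf{g}_\theta$. Letting $\mathbf{u} \triangleq \mathbf{G}^{-1}\mathbf{1}_{k'} \in \mathbb{R}^{k'}$, the function $\psi$ can then be written as
\[ \psi(\theta) = \sum_{\ell=1}^{k'} \mathbf{u}[\ell]\,\kappa(\theta,\theta_\ell^\star). \tag{A.11} \]
Moreover we have $\mathbf{u}[\ell] \geq 0$ $\forall \ell$ since we showed in item a) that $\mathbf{G}^{-1}\mathbf{1}_{k'} = \mathbf{u}$ has nonnegative entries. We also note that $\mathbf{u} \neq \mathbf{0}_{k'}$ since $\mathbf{G}\mathbf{u} = \mathbf{1}_{k'}$. Applying item i) of Definition 2 together with the comment in Footnote 8, we have that the maximizer of $\psi$ must belong to $\{\theta_\ell^\star : \mathbf{u}[\ell] \neq 0\} \subseteq S'$. Now,
\[ \forall j \in [\![1,k']\!], \quad \psi(\theta_j^\star) = \mathbf{1}_{k'}^T\mathbf{G}^{-1}\mathbf{g}_j = \mathbf{1}_{k'}^T\mathbf{e}_j = 1. \tag{A.12} \]
Therefore, $\psi(\theta) < 1$ for all $\theta \notin S'$.

B. Proofs related to CMF dictionaries

This appendix contains the proofs of the results related to CMF dictionaries presented in Sections 3.3 and 3.4. We first state and prove a technical lemma which will be used in the proofs of Lemma 3 and Theorem 3:

Lemma 5. Let $\varphi$ be a CMF such that $\varphi(0) = 1$ and $\lim_{t\to\infty}\varphi(t) = 0$. Then the Borel measure $\nu$ appearing in the integral representation of the CMF in Lemma 2 is nonzero and satisfies $\nu(\{0\}) = 0$ and $\nu(\mathbb{R}_+^*) = 1$. Moreover, $\varphi$ is strictly positive and strictly decreasing on $\mathbb{R}_+$, with $\varphi^{(1)}(t) < 0$ on $\mathbb{R}_+^*$.

Proof. By the integral representation of Lemma 2 we have $\nu(\mathbb{R}_+) = \varphi(0) = 1$, hence $\nu$ is nonzero. Moreover $\varphi(x) = \int_0^{+\infty} e^{-ux}\,\mathrm{d}\nu(u) \geq \nu(\{0\}) \geq 0$ for each $x \geq 0$. As $\lim_{x\to+\infty}\varphi(x) = 0$, it follows that $\nu(\{0\}) = 0$ and therefore $\nu(\mathbb{R}_+^*) = 1$. This proves the first part of the statement.

The positivity of $\varphi$ follows from the fact that $\nu$ is nonzero and $e^{-ux} > 0$ for all $u, x \geq 0$. Hence, the integral representation (3.18) of $\varphi$ yields $\varphi(x) > 0$ for each $x > 0$. Finally, we prove by contradiction that $\varphi^{(1)}(t) < 0$ on $\mathbb{R}_+^*$. Assume the existence of $t_0 > 0$ such that $\varphi^{(1)}(t_0) = 0$. As $\varphi^{(1)}$ is continuous and non-decreasing on $\mathbb{R}_+^*$ with $\varphi^{(1)}(t) \leq 0$ for each $t > 0$, it follows that $\varphi^{(1)}(t) = 0$ for each $t \geq t_0$, hence $\varphi(t) = \varphi(t_0)$ for each $t \geq t_0$. As we have just seen, we have $\varphi(t_0) > 0$, hence this contradicts the assumption $\lim_{t\to+\infty}\varphi(t) = 0$.

B.1. Proof of Lemma 3

The outline of the proof is as follows. We first show that for any $0 < p \leq 1$ and $\omega \in \mathbb{R}^D$, the quantity $e^{-u\|\omega\|_p^p}$ is related to the characteristic function of some $D$-dimensional random vector $\mathbf{Z}$. We then use this formulation of $e^{-u\|\omega\|_p^p}$ as a characteristic function together with the Bernstein-Widder representation of CMFs (see Lemma 2) to show that the function $\varphi(\|\omega\|_p^p)$ is positive definite.

Proof that $e^{-u\|\omega\|_p^p}$ is the characteristic function of some random vector $\mathbf{Z}$.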
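In probability theory, the characteristic function of a real-valued random vector $\mathbf{Z} \in \mathbb{R}^D$ is the function $\omega \in \mathbb{R}^D \mapsto \mathbb{E}_{\mathbf{Z}}[e^{i\omega^T\mathbf{Z}}]$ where $\mathbb{E}_{\mathbf{Z}}$ denotes the expectation operator and $i$ is the imaginary unit. First we consider for any $u \geq 0$ the scalar-valued function
\[ \chi_u : \mathbb{R} \to \mathbb{R}, \quad \omega \mapsto e^{-u|\omega|^p} \]
and show that for $u > 0$ it is the characteristic function of some random variable $Z_u$ which admits a density with respect to the Lebesgue measure. Our proof leverages a result due to Pólya [71, Th. 1]. We reproduce this result hereafter to keep the paper self-contained:

Theorem 6. Let $\chi$ be a real-valued function defined on $\mathbb{R}$ and such that:
• $\chi$ is continuous and even,
• $\chi$ is convex on $\mathbb{R}_+^*$,
• $\chi(0) = 1$,
• $\lim_{\omega\to+\infty}\chi(\omega) = 0$.
Then, $\chi$ is the characteristic function of some random variable which admits a density with respect to the Lebesgue measure. Moreover this density is even and continuous everywhere, except possibly at zero.

Observe that $\chi_u$ is even, continuous and verifies $\chi_u(0) = 1$ and $\lim_{\omega\to+\infty}\chi_u(\omega) = 0$ since $u > 0$. Moreover, for $p \in\, ]0,1]$ and $\omega > 0$, its second derivative on $\mathbb{R}_+^*$ is
\[ \chi_u^{(2)}(\omega) = u p\,\omega^{p-2}\left((1-p) + u p\,\omega^{p}\right)e^{-u\omega^p}. \]
Hence, $\chi_u^{(2)}(\omega) > 0$ for all $\omega > 0$ and $\chi_u$ is convex on $\mathbb{R}_+^*$. As a consequence, $\chi_u$ satisfies the assumptions of Theorem 6 and it is the characteristic function of some scalar random variable $Z_u$ which admits a (continuous, except possibly at zero) density with respect to the Lebesgue measure, that is
\[ \chi_u(\omega) = \mathbb{E}_{Z_u}\!\left[e^{i\omega Z_u}\right]. \]

We are now ready to show that the function $e^{-u\|\omega\|_p^p}$ is the characteristic function of some random vector $\mathbf{Z}_u$. To this end, let us define the random vector $\mathbf{Z}_u = (Z_u^1,\dots,Z_u^D)$ as the concatenation of $D$ independent copies of $Z_u$. We thus have
\[ \mathbb{E}_{\mathbf{Z}_u}\!\left[e^{i\omega^T\mathbf{Z}_u}\right] = \prod_{d=1}^{D}\mathbb{E}_{Z_u^d}\!\left[e^{i\omega[d]Z_u^d}\right] = \prod_{d=1}^{D}\chi_u(\omega[d]) = e^{-u\|\omega\|_p^p}. \]
We note that for all $u > 0$ and $\omega \in \mathbb{R}$ we have $\chi_u(\omega) = \chi_1(u^{1/p}\omega)$. Hence, the function $\chi_u(\omega)$ can also be written as an expectation with respect to the random variable $Z_1$ for all $u > 0$ and $\omega \in \mathbb{R}$:
\[ \chi_u(\omega) = \mathbb{E}_{Z_1}\!\left[e^{iu^{1/p}\omega Z_1}\right]. \tag{B.1} \]
Equation (B.1) obviously also holds for $u = 0$ since both sides of the equality are equal to 1 in that case. Using (B.1), we obtain
\[ \mathbb{E}_{\mathbf{Z}_1}\!\left[e^{iu^{1/p}\omega^T\mathbf{Z}_1}\right] = e^{-u\|\omega\|_p^p} \quad \forall u \geq 0, \tag{B.2} \]
where $\mathbf{Z}_1 = (Z_1^1,\dots,Z_1^D)$ is the concatenation of $D$ independent copies of $Z_1$. We will use the latter representation in the second part of the proof.

Proof that $\varphi(\|\omega\|_p^p)$ is a positive definite function. We want to show that for any $k \in \mathbb{N}$, any pairwise distinct $\{\theta_\ell\}_{\ell=1}^{k} \subset \mathbb{R}^D$ and any $\mathbf{c} \in \mathbb{C}^k\setminus\{\mathbf{0}_k\}$, we have
\[ \mathbf{c}^H\mathbf{G}\mathbf{c} > 0, \]
where $(\cdot)^H$ denotes the conjugate transpose operator and
\[ \mathbf{G}[\ell,\ell'] \triangleq \varphi(\|\theta_\ell - \theta_{\ell'}\|_p^p) \quad \forall \ell,\ell' \in [\![1,k]\!]. \]
Note that in practice this will only be used for real-valued coefficients, but the result is established for complex-valued $\mathbf{c}$ to fit the standard definition of a positive definite function.

Since $\varphi$ is a CMF, Lemma 2 ensures the existence of a non-negative finite Borel measure $\nu$ such that for all $\omega \in \mathbb{R}^D$:
\[ \varphi(\|\omega\|_p^p) = \int_0^{+\infty} e^{-u\|\omega\|_p^p}\,\mathrm{d}\nu(u). \]
Hence,
\begin{align}
\mathbf{c}^H\mathbf{G}\mathbf{c} &= \sum_{\ell=1}^{k}\sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\,\mathbf{c}[\ell']\,\varphi\!\left(\|\theta_{\ell'} - \theta_\ell\|_p^p\right) \notag \\
&= \sum_{\ell=1}^{k}\sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\,\mathbf{c}[\ell'] \int_0^{+\infty} e^{-u\|\theta_{\ell'} - \theta_\ell\|_p^p}\,\mathrm{d}\nu(u) \tag{B.3} \\
&= \sum_{\ell=1}^{k}\sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\,\mathbf{c}[\ell'] \int_0^{+\infty} \mathbb{E}_{\mathbf{Z}_1}\!\left[e^{iu^{1/p}(\theta_{\ell'} - \theta_\ell)^T\mathbf{Z}_1}\right]\mathrm{d}\nu(u), \tag{B.4}
\end{align}
where the last equality follows from (B.2). By linearity of the expectation, it follows:
\begin{align}
\mathbf{c}^H\mathbf{G}\mathbf{c} &= \int_0^{+\infty} \mathbb{E}_{\mathbf{Z}_1}\!\left[\sum_{\ell=1}^{k}\sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\,\mathbf{c}[\ell']\,e^{iu^{1/p}(\theta_{\ell'} - \theta_\ell)^T\mathbf{Z}_1}\right]\mathrm{d}\nu(u) \notag \\
&= \int_0^{+\infty} \mathbb{E}_{\mathbf{Z}_1}\!\left[\left(\sum_{\ell=1}^{k} \overline{\mathbf{c}[\ell]}\,e^{-iu^{1/p}\theta_\ell^T\mathbf{Z}_1}\right)\left(\sum_{\ell'=1}^{k} \mathbf{c}[\ell']\,e^{iu^{1/p}\theta_{\ell'}^T\mathbf{Z}_1}\right)\right]\mathrm{d}\nu(u) \notag \\
&= \int_0^{+\infty} \mathbb{E}_{\mathbf{Z}_1}\!\left[\left|\sum_{\ell=1}^{k} \mathbf{c}[\ell]\,e^{iu^{1/p}\theta_\ell^T\mathbf{Z}_1}\right|^2\right]\mathrm{d}\nu(u) \geq 0. \tag{B.5}
\end{align}
Since this holds for any $\mathbf{c} \in \mathbb{C}^k$, this shows that $\varphi(\|\cdot\|_p^p)$ is positive semi-definite.

To establish that $\varphi(\|\cdot\|_p^p)$ is a positive definite function we now show that equality in (B.5) can only occur if $\mathbf{c} = \mathbf{0}_k$. To this end, denote $\xi(\mathbf{z}) \triangleq \left|\sum_{\ell=1}^{k}\mathbf{c}[\ell]\,e^{i\theta_\ell^T\mathbf{z}}\right|^2$ and $\Xi(u) \triangleq \mathbb{E}_{\mathbf{Z}_1}[\xi(u^{1/p}\mathbf{Z}_1)]$ for $u \in \mathbb{R}_+$, and assume that equality holds in (B.5), that is to say $\int_0^{+\infty}\Xi(u)\,\mathrm{d}\nu(u) = 0$. We next show that this implies that $\mathbf{c} = \mathbf{0}_k$.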
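As a numerical sanity check of the positive definiteness claimed in Lemma 3, the following minimal sketch builds the Gram matrix $\mathbf{G}[\ell,\ell'] = \varphi(\|\theta_\ell - \theta_{\ell'}\|_p^p)$ for a few random points and tests it with a hand-written Cholesky factorization. The helper names (`cmf_gram`, `is_positive_definite`), the random points, and the two CMFs used ($t \mapsto e^{-t}$ and $t \mapsto 1/(1+t)$) are illustrative assumptions, not part of the paper's method; a successful factorization only illustrates the result on one instance and proves nothing.

```python
import math
import random

def cmf_gram(points, phi, p):
    """Gram matrix G[l][m] = phi(||theta_l - theta_m||_p^p), with 0 < p <= 1."""
    def dist_pp(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b))
    return [[phi(dist_pp(a, b)) for b in points] for a in points]

def is_positive_definite(G, tol=1e-12):
    """Try a Cholesky factorization; it succeeds iff G is numerically PD."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:          # non-positive pivot: not PD
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# Hypothetical test data: six random (almost surely distinct) points in R^2.
random.seed(0)
points = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(6)]

def laplace(t):
    return math.exp(-t)          # CMF with phi(0) = 1

def inverse_affine(t):
    return 1.0 / (1.0 + t)       # CMF x -> 1/(1+x)

results = {
    (phi.__name__, p): is_positive_definite(cmf_gram(points, phi, p))
    for phi in (laplace, inverse_affine)
    for p in (0.5, 1.0)
}
```

Each entry of `results` records whether the factorization succeeded for one pair (CMF, $p$); the symmetric matrix with rows (1, 2) and (2, 1), which is not positive definite, is rejected by the same routine.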
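First, we note that $\Xi$ is continuous since
\begin{align*}
\Xi(u) &= \mathbb{E}_{\mathbf{Z}_1}\!\left[\xi(u^{1/p}\mathbf{Z}_1)\right] \\
&= \sum_{\ell=1}^{k}\sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\,\mathbf{c}[\ell']\,\mathbb{E}_{\mathbf{Z}_1}\!\left[e^{iu^{1/p}(\theta_{\ell'} - \theta_\ell)^T\mathbf{Z}_1}\right] \\
&= \sum_{\ell=1}^{k}\sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\,\mathbf{c}[\ell']\,e^{-u\|\theta_{\ell'} - \theta_\ell\|_p^p}
\end{align*}
where the last equality follows from (B.2).

Second, since $\varphi$ is a CMF satisfying $\varphi(0) = 1$ and $\lim_{x\to\infty}\varphi(x) = 0$, by Lemma 5 the non-negative finite Borel measure $\nu$ satisfies $\nu(\mathbb{R}_+^*) = 1$. Since $\mathbb{R}_+^* = \bigcup_{n\geq 1}[n,n+1] \,\cup\, \bigcup_{n\geq 1}[1/(n+1),1/n]$, there must exist $n \geq 1$ such that either $\nu([n,n+1]) > 0$ or $\nu([1/(n+1),1/n]) > 0$. Without loss of generality consider the case $\nu([n,n+1]) > 0$ (the other one can be treated similarly). Since $\Xi$ is continuous over the compact set $[n,n+1]$, it attains its infimum over $[n,n+1]$ at some $u_0 \in [n,n+1]$. Now, because $\Xi(u) \geq 0$ for every $u \geq 0$ we have
\[ 0 = \int_0^{+\infty}\Xi(u)\,\mathrm{d}\nu(u) \geq \int_n^{n+1}\Xi(u)\,\mathrm{d}\nu(u) \geq \Xi(u_0)\,\nu([n,n+1]) \]
and we obtain $\Xi(u_0) = 0$ since $\nu([n,n+1]) > 0$. By construction $u_0 > 0$.

Finally, we have that:
• the distribution of $\mathbf{Z}_1$ has a density with respect to the Lebesgue measure and its density is continuous, except possibly at points where one coordinate vanishes,
• $\xi$ is nonnegative and continuous (by construction as the squared modulus of a finite linear combination of exponentials).
Hence, using the definition $\Xi(u_0) = \mathbb{E}_{\mathbf{Z}_1}[\xi(u_0^{1/p}\mathbf{Z}_1)]$, we deduce that there exist $\mathbf{z}_0 \in \mathbb{R}^D$ (with non-vanishing coordinates) and $r > 0$ such that $\xi(\mathbf{z}) = 0$ $\forall \mathbf{z} \in B(\mathbf{z}_0,r)$, where $B(\mathbf{z}_0,r)$ is the open ball of radius $r$ centered at $\mathbf{z}_0$. For any $\mathbf{y} \neq \mathbf{0}_D$ and $0 \leq t \leq r/\|\mathbf{y}\|_2$, we have $\mathbf{z}_0 + t\mathbf{y} \in B(\mathbf{z}_0,r)$ and therefore
\[ \sum_{\ell=1}^{k}\mathbf{c}[\ell]\,e^{i\theta_\ell^T(\mathbf{z}_0 + t\mathbf{y})} = 0 \quad \forall t \in [0, r/\|\mathbf{y}\|_2]. \tag{B.6} \]
If $\mathbf{y}$ is such that $\theta_\ell^T\mathbf{y} \neq \theta_{\ell'}^T\mathbf{y}$ for all $\ell \neq \ell'$, then the functions $\{t \mapsto e^{it\theta_\ell^T\mathbf{y}}\}_{\ell=1}^{k}$ are linearly independent on $[0, r/\|\mathbf{y}\|_2]$ and (B.6) holds if and only if $\mathbf{c} = \mathbf{0}_k$. It thus remains to show that there exists some $\mathbf{y} \in \mathbb{R}^D$ such that $\theta_\ell^T\mathbf{y} \neq \theta_{\ell'}^T\mathbf{y}$ for all $\ell \neq \ell'$.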
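To this end, let us consider the following finite set of vectors:
\[ \mathcal{N} \triangleq \{\theta_\ell - \theta_{\ell'} : \ell,\ell' \in [\![1,k]\!],\ \ell \neq \ell'\}. \tag{B.7} \]
As the parameters $\theta_\ell$'s are pairwise distinct, each $\mathbf{n} \in \mathcal{N}$ is nonzero. Denote by $H_{\mathbf{n}}$ the linear hyperplane whose normal vector is $\mathbf{n}$, and consider $H \triangleq \bigcup_{\mathbf{n}\in\mathcal{N}} H_{\mathbf{n}}$. Since $H$ is the union of a finite number of hyperplanes of $\mathbb{R}^D$, $\mathbb{R}^D\setminus H$ is not empty. Consider $\mathbf{y} \in \mathbb{R}^D\setminus H$. Then, by construction, $\mathbf{n}^T\mathbf{y} \neq 0$ for all $\mathbf{n} \in \mathcal{N}$ and therefore $\theta_\ell^T\mathbf{y} \neq \theta_{\ell'}^T\mathbf{y}$ for all $\ell \neq \ell'$. This concludes the proof.

B.2. Proof of Theorem 3

Our proof leverages Theorem 2 by showing that if $\mathcal{A}$ is a CMF dictionary in dimension 1 with induced kernel $\kappa$, then:
a) $\kappa$ is admissible in the sense of Definition 1,
b) any finite support $S^\star = \{\theta_\ell^\star\}_{\ell=1}^{k}$ is admissible with respect to kernel $\kappa$ in the sense of Definition 2.
The result stated in Theorem 3 is then a direct consequence of Theorem 2. We begin with a more general lemma establishing the claim a):

Lemma 6. Any CMF kernel $\kappa \in \mathcal{K}_{\mathrm{CMF}}(D)$, $D \geq 1$, is admissible.

Proof. First, since $\kappa \in \mathcal{K}_{\mathrm{CMF}}(D)$, there exists a CMF $\varphi$ and a scalar $p \in\, ]0,1]$ such that $\varphi(0) = 1$ and $\kappa(\theta,\theta') = \varphi(\|\theta - \theta'\|_p^p)$ for all $\theta,\theta' \in \mathbb{R}^D$. Hence $\kappa(\theta,\theta) = \varphi(0) = 1$ for all $\theta$. Moreover, the function $\theta \mapsto \kappa(\theta,\theta')$ is continuous since both CMFs and $\ell_p$-norms are continuous. Hence $\kappa$ satisfies (3.4). The fact that $\kappa$ satisfies the vanishing property (3.6) is a straightforward consequence of the fact that $\kappa(\theta,\theta') = \varphi(\|\theta - \theta'\|_p^p)$ and that $\lim_{t\to+\infty}\varphi(t) = 0$. Finally, we prove item ii) of Definition 1. As $\varphi$ satisfies the assumptions of Lemma 5, it is strictly positive and strictly decreasing. This implies $\kappa(\theta,\theta') > 0$ for any $\theta,\theta'$. Moreover, if $\theta \neq \theta'$ then $t = \|\theta - \theta'\|_p^p > 0$ and $\varphi(t) < \varphi(0) = 1$.

The rest of this section is dedicated to the proof of claim b). Let us consider a non-empty subset of indices $T \subseteq [\![1,k]\!]$ with $t = \operatorname{card}(T)$. Without loss of generality (up to some reordering), we assume that $T = [\![1,t]\!]$. Let $\{c_\ell\}_{\ell=1}^{t} \subset \mathbb{R}_+^*$ be such that
\[ s \triangleq \sum_{\ell=1}^{t} c_\ell < 1 \tag{B.8} \]
and consider the function
\[ \psi : \mathbb{R} \to \mathbb{R}, \quad \theta \mapsto \sum_{\ell=1}^{t} c_\ell\,\kappa(\theta,\theta_\ell^\star). \tag{B.9} \]
Using the integral formulation of CMF (see Lemma 2), we have that $\psi$ is twice differentiable at any $\theta \in \mathbb{R}\setminus\{\theta_\ell^\star\}_{\ell=1}^{t}$ [72, proof of Th. 12a] and its second derivative writes:
\begin{align}
\psi^{(2)}(\theta) ={}& p(1-p)\sum_{\ell=1}^{t} c_\ell \int_0^{+\infty} u\,|\theta - \theta_\ell^\star|^{p-2}\,e^{-u|\theta - \theta_\ell^\star|^p}\,\mathrm{d}\nu(u) \notag \\
&+ p^2\sum_{\ell=1}^{t} c_\ell \int_0^{+\infty} u^2\,|\theta - \theta_\ell^\star|^{2(p-1)}\,e^{-u|\theta - \theta_\ell^\star|^p}\,\mathrm{d}\nu(u). \tag{B.10}
\end{align}
We next show that items i) and ii) of Definition 2 hold.

Item i) of Definition 2. First, the vanishing property (3.6) of admissible kernels ensures that $\psi$ admits at least one maximizer (see Lemma 1). We then show that any maximizer of $\psi$ must necessarily belong to $\{\theta_\ell^\star\}_{\ell=1}^{t}$.

Since $\psi$ is twice continuously differentiable on $\mathbb{R}\setminus\{\theta_\ell^\star\}_{\ell=1}^{t}$, any maximizer $\theta_m \in \mathbb{R}\setminus\{\theta_\ell^\star\}_{\ell=1}^{t}$ must verify the second-order optimality condition "$\psi^{(2)}(\theta_m) \leq 0$". Now this condition can never be fulfilled for $\theta_m \in \mathbb{R}\setminus\{\theta_\ell^\star\}_{\ell=1}^{t}$. Indeed, we see that each integral term in (B.10) is positive since $\theta_m \notin \{\theta_\ell^\star\}_{\ell=1}^{t}$ and, by Lemma 5, $\nu(\mathbb{R}_+^*) > 0$. Since $p \in\, ]0,1]$ and $c_\ell > 0$ $\forall \ell$, it follows that $\psi^{(2)}(\theta_m) > 0$.

Item ii) of Definition 2. Assume $T = [\![1,t]\!] \neq [\![1,k]\!]$ and consider $t' \in [\![1,k]\!]\setminus T$. Without loss of generality, suppose that $t' = t+1$ and consider
\[ \zeta : \mathbb{R} \to \mathbb{R}, \quad \theta \mapsto \psi(\theta) - \kappa(\theta,\theta_{t+1}^\star). \tag{B.11} \]
From the definition of $\psi$ in (B.9), $\zeta$ writes
\[ \zeta(\theta) = \sum_{\ell=1}^{t} c_\ell\,\varphi\!\left(|\theta - \theta_\ell^\star|^p\right) - \varphi\!\left(|\theta - \theta_{t+1}^\star|^p\right). \tag{B.12} \]
Assume that $\zeta(\theta_\ell^\star) \leq 0$ for every $\ell \in [\![1,t]\!]$. We will then show that $\zeta(\theta) \leq 0$ for every $\theta \in \mathbb{R}\setminus\{\theta_\ell^\star\}_{\ell=1}^{t}$ so that item ii) of Definition 2 is satisfied.

Suppose there exists $\theta_0 \in \mathbb{R}$ such that $\zeta(\theta_0) > 0$ and let us show that this leads to a contradiction.

Let us first emphasize that the existence of some $\theta_0 \in \mathbb{R}$ such that $\zeta(\theta_0) > 0$ implies that a maximizer of $\zeta$ exists. Indeed, we have that $\lim_{|\theta|\to\infty}\zeta(\theta) = 0$ since
\[ |\zeta(\theta)| \leq \sum_{\ell=1}^{t} c_\ell\,\kappa(\theta,\theta_\ell^\star) + \kappa(\theta,\theta_{t+1}^\star) \tag{B.13} \]
and $\kappa$ obeys the vanishing property (3.6) by hypothesis. Hence, for any $0 < \varepsilon < \zeta(\theta_0)$ there exists a compact set $K_\varepsilon$ such that
\[ \forall \theta \in K_\varepsilon^c, \quad \zeta(\theta) \leq \varepsilon < \sup_{\theta'\in K_\varepsilon}\zeta(\theta'), \tag{B.14} \]
because $\zeta$ is continuous. The extreme value theorem [73, Prop. A.8] then states that (at least) one maximizer of $\zeta$, say $\theta_m$, exists and $\theta_m \in K_\varepsilon$.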
We note that by definition $\zeta(\theta_m) \geq \zeta(\theta_0) > 0$. We show below that we also must necessarily have $\zeta(\theta_m) \leq 0$. This leads to the desired contradiction and proves the result.

Let $\tau_\ell \triangleq |\theta_m - \theta_\ell^\star|$ and assume without loss of generality that
\[ \tau_1 \leq \dots \leq \tau_t. \tag{B.15} \]
We have that $\tau_1 > 0$ because $\theta_m \notin \{\theta_\ell^\star\}_{\ell=1}^{t}$ since $\zeta(\theta_\ell^\star) \leq 0$ $\forall \ell \in [\![1,t]\!]$ (by assumption) and $\zeta(\theta_m) \geq \zeta(\theta_0) > 0$. We next show that the working assumptions also imply $\zeta(\theta_m) \leq 0$ by distinguishing between three cases:

Case 1: $\tau_{t+1} \leq \tau_1$. From (B.15), we have:
\[ \forall u \geq 0, \quad \max_{1\leq\ell\leq t} e^{-u\tau_\ell^p} \leq e^{-u\tau_{t+1}^p}. \tag{B.16} \]
Hence
\[ \zeta(\theta_m) \leq \underbrace{\left(\sum_{\ell=1}^{t} c_\ell - 1\right)}_{<0 \text{ by hyp.}}\ \underbrace{\int_0^{+\infty} e^{-u\tau_{t+1}^p}\,\mathrm{d}\nu(u)}_{>0} < 0. \]

Case 2: $\tau_{t+1} > \tau_t$. We rely on the following technical lemma that exploits the notion of "sign changes of a finite sequence". This notion is defined as the number of times two consecutive elements of the finite sequence have opposite signs. For instance, the sequence $(1, 1, -1, 1)$ has two sign changes (respectively at the third and fourth positions).

Lemma 7. Let $P(u) \triangleq \sum_{\ell=1}^{k} c_\ell\,e^{-\tau_\ell u}$ be an exponential polynomial on $\mathbb{R}_+$ with $0 < \tau_1 < \dots < \tau_k$ and $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}$. Assume that:
• the sequence $c_1,\dots,c_k$ has at most two sign changes;
• $P(0) < 0$ and $\lim_{u\to+\infty}P(u) = 0^+$.
Then there exists $u_0 > 0$ for which the following inequality holds
\[ \int_0^{+\infty} f(u)P(u)\,\mathrm{d}\nu(u) \geq f(u_0)\int_0^{+\infty} P(u)\,\mathrm{d}\nu(u), \tag{B.17} \]
for any non-decreasing function $f$ on $\mathbb{R}_+$ and any (unsigned) finite Borel measure $\nu$ on $\mathbb{R}_+$ such that the integrals converge.

The proof of the lemma is postponed to Appendix C.2.

As mentioned previously, we have on the one hand that $\theta_m \notin \{\theta_\ell^\star\}_{\ell=1}^{t}$. On the other hand, $\theta_m \neq \theta_{t+1}^\star$ because $|\theta_m - \theta_{t+1}^\star| = \tau_{t+1} > \tau_t > 0$. Therefore, $\theta_m \in \mathbb{R}\setminus\{\theta_\ell^\star\}_{\ell=1}^{t+1}$. Since $\zeta$ is twice continuously differentiable on $\mathbb{R}\setminus\{\theta_\ell^\star\}_{\ell=1}^{t+1}$, $\theta_m$ must necessarily verify the following second-order optimality condition:
\[ \zeta^{(2)}(\theta_m) \leq 0. \tag{B.18} \]
We next show that $C\zeta(\theta_m) < \zeta^{(2)}(\theta_m)$ for some positive constant $C > 0$. Hence, in view of (B.18), this leads to the desired contradiction: $\zeta(\theta_m) < 0$.
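Assume first that $\tau_1 < \dots < \tau_t$ (the equality cases will be addressed later). From (B.10) we have that $\zeta^{(2)} = \zeta_1^{(2)} + \zeta_2^{(2)}$ with
\begin{align*}
\zeta_1^{(2)}(\theta_m) &= p(1-p)\int_0^{+\infty} u\left(\sum_{\ell=1}^{t} c_\ell\,\tau_\ell^{p-2}\,e^{-u\tau_\ell^p} - \tau_{t+1}^{p-2}\,e^{-u\tau_{t+1}^p}\right)\mathrm{d}\nu(u) \\
\zeta_2^{(2)}(\theta_m) &= p^2\int_0^{+\infty} u^2\left(\sum_{\ell=1}^{t} c_\ell\,\tau_\ell^{2(p-1)}\,e^{-u\tau_\ell^p} - \tau_{t+1}^{2(p-1)}\,e^{-u\tau_{t+1}^p}\right)\mathrm{d}\nu(u).
\end{align*}
Using $\tau_{t+1} > \tau_\ell > 0$ $\forall \ell \in [\![1,t]\!]$, we obtain
\begin{align}
\zeta_2^{(2)}(\theta_m) &= \frac{p^2}{\tau_{t+1}^{2(1-p)}}\int_0^{+\infty} u^2\left(\sum_{\ell=1}^{t} c_\ell\left(\frac{\tau_{t+1}}{\tau_\ell}\right)^{2(1-p)} e^{-u\tau_\ell^p} - e^{-u\tau_{t+1}^p}\right)\mathrm{d}\nu(u) \notag \\
&> \frac{p^2}{\tau_{t+1}^{2(1-p)}}\int_0^{+\infty} u^2\left(\sum_{\ell=1}^{t} c_\ell\,e^{-u\tau_\ell^p} - e^{-u\tau_{t+1}^p}\right)\mathrm{d}\nu(u). \tag{B.19}
\end{align}
Note now that:
• the function $u \mapsto u^2$ is increasing,
• $P(u) \triangleq \sum_{\ell=1}^{t} c_\ell\,e^{-u\tau_\ell^p} - e^{-u\tau_{t+1}^p}$ is an exponential polynomial with $0 < \tau_1^p < \dots < \tau_t^p < \tau_{t+1}^p$ and whose sequence of coefficients is $(c_1,\dots,c_t,-1)$ and has exactly one sign change,
• as $\max_{1\leq\ell\leq t}\tau_\ell < \tau_{t+1}$ by hypothesis, we have $P(u) > 0$ for sufficiently large $u$ so $\lim_{u\to+\infty}P(u) = 0^+$,
• since $\sum_{\ell=1}^{t} c_\ell < 1$ we have $P(0) < 0$.
Therefore, Lemma 7 applies and there exists $u_0 > 0$ such that
\[ \zeta_2^{(2)}(\theta_m) > \frac{p^2}{\tau_{t+1}^{2(1-p)}}\,u_0^2\int_0^{+\infty} P(u)\,\mathrm{d}\nu(u) = \frac{p^2}{\tau_{t+1}^{2(1-p)}}\,u_0^2\,\zeta(\theta_m). \tag{B.20} \]
This establishes that $\zeta_2^{(2)}(\theta_m) > C_2\,\zeta(\theta_m)$ where $C_2 > 0$ is a positive constant. The same rationale leads to $\zeta_1^{(2)}(\theta_m) \geq C_1\,\zeta(\theta_m)$ with $C_1 \geq 0$ ($C_1 = 0$ for $p = 1$ since $\zeta_1^{(2)}$ is then identically zero). Since $\zeta^{(2)} = \zeta_1^{(2)} + \zeta_2^{(2)}$, one obtains that $\zeta^{(2)}(\theta_m) > (C_1 + C_2)\,\zeta(\theta_m)$, which concludes the proof for $\tau_1 < \dots < \tau_t$.

Let us now come back to the general case where $\tau_1 \leq \dots \leq \tau_t$. Denote by $\tilde{\tau}_1 < \dots < \tilde{\tau}_{t'}$, with $t' \leq t$, the ordered distinct values in $\{\tau_\ell\}_{\ell=1}^{t}$, and let $\tilde{\tau}_{t'+1} = \tau_{t+1}$. Moreover, for any $\ell' \in [\![1,t']\!]$, let $\tilde{c}_{\ell'}$ be equal to the sum of the coefficients $c_\ell$ such that $\tau_\ell = \tilde{\tau}_{\ell'}$, and let $\tilde{c}_{t'+1} = -1$. We note that, by definition, $\sum_{\ell'=1}^{t'}\tilde{c}_{\ell'} = \sum_{\ell=1}^{t} c_\ell < 1$. We can then show that $\zeta^{(2)}(\theta_m) > C\,\zeta(\theta_m)$ with $C > 0$ by applying the same reasoning as above to $\tilde{\tau}_1,\dots,\tilde{\tau}_{t'+1}$ and $\tilde{c}_1,\dots,\tilde{c}_{t'+1}$.

Case 3: $\tau_1 < \tau_{t+1} \leq \tau_t$. There exists $\ell_0 \in [\![1,t-1]\!]$ such that:
\begin{align}
\tau_\ell &< \tau_{t+1} \quad \text{for } \ell \leq \ell_0 \tag{B.21a} \\
\tau_\ell &\geq \tau_{t+1} \quad \text{for } \ell > \ell_0. \tag{B.21b}
\end{align}
Denote $\varepsilon \triangleq 1 - \sum_{\ell=1}^{t} c_\ell > 0$ and let $s_1 \triangleq \sum_{\ell=1}^{\ell_0} c_\ell + \frac{\varepsilon}{2}$ and $s_2 \triangleq \sum_{\ell=\ell_0+1}^{t} c_\ell + \frac{\varepsilon}{2}$ such that $s_1 + s_2 = 1$. One can write
\begin{align}
\zeta(\theta_m) ={}& s_1\underbrace{\int_0^{+\infty}\left(\sum_{\ell=1}^{\ell_0}\frac{c_\ell}{s_1}\,e^{-u\tau_\ell^p} - e^{-u\tau_{t+1}^p}\right)\mathrm{d}\nu(u)}_{\triangleq\ \zeta_1(\theta_m)} \notag \\
&+ s_2\underbrace{\int_0^{+\infty}\left(\sum_{\ell=\ell_0+1}^{t}\frac{c_\ell}{s_2}\,e^{-u\tau_\ell^p} - e^{-u\tau_{t+1}^p}\right)\mathrm{d}\nu(u)}_{\triangleq\ \zeta_2(\theta_m)}. \tag{B.22}
\end{align}
Using (B.21a) and the fact that $\sum_{\ell=1}^{\ell_0}\frac{c_\ell}{s_1} < 1$, we have that $\zeta_1(\theta_m) < 0$ by resorting to the same reasoning as in Case 2. Similarly, using (B.21b) and the fact that $\sum_{\ell=\ell_0+1}^{t}\frac{c_\ell}{s_2} < 1$, we obtain that $\zeta_2(\theta_m) < 0$ by the same arguments as in Case 1. Hence, we finally have $\zeta(\theta_m) \leq s_1\zeta_1(\theta_m) + s_2\zeta_2(\theta_m) < 0$, which leads to the desired contradiction.

B.3. Proof of Theorem 4 - Recovery in dimension D

Let $S^\star = \{\theta_\ell^\star\}_{\ell=1}^{k}$ and assume $\mathrm{Grid}(S^\star)$ is axis admissible. Let us first observe that, by virtue of Corollary 1, the elements of $\{a(\theta) : \theta \in \mathrm{Grid}(S^\star)\}$ are linearly independent. Let
\[ \mathcal{R} \triangleq \operatorname{span}(\{a(\theta) : \theta \in \mathrm{Grid}(S^\star)\}). \tag{B.23} \]
If $r \in \mathcal{R}\setminus\{0_{\mathcal{H}}\}$, we note from Lemma 1 that the function $f : \theta \mapsto |\langle a(\theta), r\rangle|$ admits at least one maximizer since $r$ results from a (nontrivial) finite linear combination of atoms. Moreover, the maximum of $f$ must be strictly greater than zero. If not, $\langle a(\theta), r\rangle = 0$ $\forall \theta \in \mathrm{Grid}(S^\star)$ and one deduces that $r \in \mathcal{R}^\perp$, the orthogonal complement of $\mathcal{R}$. Since $r \in \mathcal{R}$, this leads to $r = 0_{\mathcal{H}}$, which contradicts our initial assumption "$r \neq 0_{\mathcal{H}}$".

We also have that the maximizers of $f$ must belong to $\mathrm{Grid}(S^\star)$, as shown by the following arguments. If $\theta_m$ is a maximizer of $f$, then $t = \theta_m[d]$ is a maximizer of
\[ f_d : t \mapsto |\langle a(\theta_m + (t - \theta_m[d])\mathbf{e}_d), r\rangle|. \tag{B.24} \]
Denoting $\theta_0 \triangleq \theta_m - \theta_m[d]\mathbf{e}_d$, we have $\theta_0 \perp \mathbf{e}_d$ by construction and $f_d$ writes
\[ f_d(t) = \Big|\sum_{\ell} c_\ell\,\kappa(\theta_0 + t\,\mathbf{e}_d, \theta_\ell)\Big|, \tag{B.25} \]
where the $\theta_\ell$'s enumerate $\mathrm{Grid}(S^\star)$ and the $c_\ell$'s are the coefficients of $r$ in the corresponding atoms. Because $\mathrm{Grid}(S^\star)$ is axis admissible with respect to $\kappa$ and $f_d$ is not identically zero (since $f_d(\theta_m[d]) = f(\theta_m) > 0$), the maximizers of $f_d$ must belong to $S_d \triangleq \{\theta_\ell^\star[d]\}_{\ell=1}^{k}$. As a consequence, since this conclusion holds for any $d \in [\![1,D]\!]$, we have that $\theta_m \in \prod_{d=1}^{D} S_d = \mathrm{Grid}(S^\star)$. Formulated in a slightly different way, we thus just proved that: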
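\[ \forall r \in \mathcal{R}\setminus\{0_{\mathcal{H}}\},\ \forall \theta \in \Theta\setminus\mathrm{Grid}(S^\star), \quad \max_{\theta'\in\mathrm{Grid}(S^\star)} |\langle a(\theta'), r\rangle| > |\langle a(\theta), r\rangle|. \tag{B.26} \]
We are now ready to prove the statements of the theorem:

$\mathrm{Grid}(S^\star)$-delayed recovery of any $S \subseteq S^\star$. First, since $S \subseteq \mathrm{Grid}(S^\star)$, we have that $y \in \mathcal{R}$. Moreover, $y \neq 0_{\mathcal{H}}$ since it results from a nontrivial linear combination of linearly independent atoms. Hence (B.26) holds and OMP selects a parameter in $\mathrm{Grid}(S^\star)$ at the first iteration. Repeating the same argument at the next iterations, OMP selects parameters in $\mathrm{Grid}(S^\star)$ until the residual $r$ vanishes. Now, because the atoms in $\{a(\theta) : \theta \in \mathrm{Grid}(S^\star)\}$ are linearly independent, $r = 0_{\mathcal{H}}$ if and only if the set of parameters selected by OMP, say $\hat{S}$, verifies $S \subseteq \hat{S}$. Since OMP never selects the same parameter twice and $\hat{S} \subseteq \mathrm{Grid}(S^\star)$, we thus achieve $\mathrm{Grid}(S^\star)$-delayed recovery of $S$.

$S^\star$-delayed recovery of any $S \subseteq S^\star$. Since the elements of $\{a(\theta) : \theta \in \mathrm{Grid}(S^\star)\}$ are linearly independent, (3.30-R-ERC) can equivalently be rewritten as (see e.g., [1, Prop. 3.15]):
\[ \forall r \in \mathcal{R}_{S^\star}\setminus\{0_{\mathcal{H}}\},\ \forall \theta \in \mathrm{Grid}(S^\star)\setminus S^\star, \quad \max_{\theta'\in S^\star} |\langle a(\theta'), r\rangle| > |\langle a(\theta), r\rangle|, \]
where $\mathcal{R}_{S^\star} \triangleq \operatorname{span}(\{a(\theta_\ell^\star)\}_{\ell=1}^{k})$. Combining this result with (B.26) and using the fact that $\mathcal{R}_{S^\star} \subseteq \mathcal{R}$ leads to
\[ \forall r \in \mathcal{R}_{S^\star}\setminus\{0_{\mathcal{H}}\},\ \forall \theta \in \Theta\setminus S^\star, \quad \max_{\theta'\in S^\star} |\langle a(\theta'), r\rangle| > |\langle a(\theta), r\rangle|. \]
Following the same arguments as above, we then have that: 1) OMP selects parameters in $S^\star$ until the residual $r$ vanishes; 2) $r = 0_{\mathcal{H}}$ if and only if the set of parameters $\hat{S}$ selected by OMP verifies $S \subseteq \hat{S}$. Since OMP never selects the same parameter twice and $\hat{S} \subseteq S^\star$, OMP thus achieves $S^\star$-delayed recovery of $S$.

Sharpness of the result. If (3.30-R-ERC) is not verified, we have from [1, Prop. 3.15] that there exists some $y_{\mathrm{bad}} \in \mathcal{R}_{S^\star}\setminus\{0_{\mathcal{H}}\}$ such that
\[ \max_{\theta\in\mathrm{Grid}(S^\star)\setminus S^\star} |\langle a(\theta), y_{\mathrm{bad}}\rangle| > \max_{\theta\in S^\star} |\langle a(\theta), y_{\mathrm{bad}}\rangle|. \tag{B.27} \]
In other words, OMP with $y_{\mathrm{bad}}$ as input selects some $\theta \notin S^\star$ at the first iteration.

B.4.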
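Proof of Lemma 4

Let $G = \prod_{d=1}^{D} S_d = \{\theta_\ell\}_{\ell=1}^{\operatorname{card}(G)}$ be an arbitrary Cartesian grid in $\mathbb{R}^D$ and $\{c_\ell\}_{\ell=1}^{\operatorname{card}(G)} \subset \mathbb{R}$ be a set of $\operatorname{card}(G)$ coefficients not all equal to 0. Consider $d \in [\![1,D]\!]$ and $\theta_0 \in \mathbb{R}^D$ such that $\theta_0[d] = 0$ and define
\[ f_d : \mathbb{R} \to \mathbb{R}_+, \quad t \mapsto \Big|\sum_{\ell=1}^{\operatorname{card}(G)} c_\ell\,\kappa(\theta_0 + t\,\mathbf{e}_d, \theta_\ell)\Big| = \Big|\sum_{\ell=1}^{\operatorname{card}(G)} c_\ell\,e^{-\lambda\|\theta_0 + t\mathbf{e}_d - \theta_\ell\|_p^p}\Big|. \tag{B.28} \]
We assume that $f_d$ is not identically zero, as in the statement of Definition 7. $f_d$ can be rewritten as
\[ f_d(t) = \Big|\sum_{\ell=1}^{\operatorname{card}(G)} c_\ell\,e^{-\lambda|t - \theta_\ell[d]|^p}\,e^{-\lambda\sum_{j=1, j\neq d}^{D}|\theta_0[j] - \theta_\ell[j]|^p}\Big|. \]
Let $q$ denote the number of distinct elements of $\{\theta_\ell[d]\}_{\ell=1}^{\operatorname{card}(G)}$ and suppose (up to some renumbering) that $\{\theta_\ell[d]\}_{\ell=1}^{q}$ are pairwise distinct. We note that $q \geq 1$ because otherwise there is a contradiction with our hypothesis "$f_d$ not identically zero". We can then rewrite $f_d$ as
\[ f_d(t) = \Big|\sum_{\ell=1}^{q}\tilde{c}_\ell\,e^{-\lambda|t - \theta_\ell[d]|^p}\Big| \tag{B.29} \]
where the terms proportional to $e^{-\lambda|t - \theta_\ell[d]|^p}$ for (possibly) identical values of $\theta_\ell[d]$ have been merged together, and the scalars $\tilde{c}_\ell$ take into account the constant terms in the exponentials that do not depend on $t$.

Let $\mathcal{A}_1 = \{a_1(t') : t' \in \mathbb{R}\}$ be a Generalized Laplace dictionary in dimension 1 (see Definition 5). Then, $f_d(t)$ can also be interpreted as the modulus of the inner product between the atom $a_1(t) \in \mathcal{A}_1$ and $y_1 \triangleq \sum_{\ell=1}^{q}\tilde{c}_\ell\,a_1(\theta_\ell[d])$. Let $\tilde{S} \triangleq \{\theta_\ell[d] : \tilde{c}_\ell \neq 0,\ \ell = 1,\dots,q\}$. Applying Theorem 3, we have that OMP with $y_1$ as input achieves exact $\operatorname{card}(\tilde{S})$-step recovery of $\tilde{S}$. In particular, this implies that OMP selects a parameter in $\tilde{S}$ at the first iteration, that is:
\[ \forall t' \in \mathbb{R}\setminus\tilde{S}, \quad f_d(t') < \max_{t\in\tilde{S}} f_d(t). \tag{B.30} \]
Hence, the maximizers of $f_d$ belong to $\tilde{S} \subseteq \{\theta_\ell[d]\}_{\ell=1}^{\operatorname{card}(G)} \subseteq S_d$.

C. Miscellaneous

C.1. Proof of Lemma 1

Let $r \in \mathcal{H}$, $k \in \mathbb{N}$ and assume that there exist $k$ parameters $\{\theta_\ell^\star\}_{\ell=1}^{k} \subset \Theta$ and $k$ nonzero coefficients $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}$ such that $r = \sum_{\ell=1}^{k} c_\ell\,a(\theta_\ell^\star)$. Define the function $\varphi$ as
\[ \varphi : \Theta \to \mathbb{R}_+, \quad \theta \mapsto |\langle a(\theta), r\rangle| = \Big|\sum_{\ell=1}^{k} c_\ell\,\kappa(\theta,\theta_\ell^\star)\Big|, \tag{C.1} \]
that is, the function involved in step 6 in Algorithm 1. We now prove that a maximizer of (C.1) exists. To that aim, we distinguish two cases.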
Case 1: $\forall \theta \in \Theta$, $\varphi(\theta) = 0$. In that case, any parameter $\theta \in \Theta$ is a maximizer of $\varphi$.

Case 2: $\exists \theta_0 \in \Theta$, $\varphi(\theta_0) > 0$. Denote $\varepsilon \triangleq \varphi(\theta_0)$. We then have
\[ \sup_{\theta\in\Theta}\varphi(\theta) \geq \varphi(\theta_0) = \varepsilon > 0. \tag{C.2} \]
Hence, by condition (3.6), there exist $k$ compact sets $\{K_\ell\}_{\ell=1}^{k}$ such that for all $\ell \in [\![1,k]\!]$, $\theta_0 \in K_\ell$ and
\[ \forall \theta \in K_\ell^c, \quad \kappa(\theta,\theta_\ell^\star) < \frac{\varepsilon}{\sum_{\ell'=1}^{k}|c_{\ell'}|}. \tag{C.3} \]
Note that the right-hand side of (C.3) is well defined: by positive-definiteness of $\langle\cdot,\cdot\rangle$, $r \neq 0_{\mathcal{H}}$ so $k > 0$ and $\sum_{\ell=1}^{k}|c_\ell| > 0$ necessarily. Define $K = \bigcup_{\ell=1}^{k} K_\ell$. Since $K^c = \bigcap_{\ell=1}^{k} K_\ell^c$, we have using the triangle inequality
\[ \forall \theta \in K^c, \quad \varphi(\theta) \leq \sum_{\ell=1}^{k}|c_\ell|\,\kappa(\theta,\theta_\ell^\star) < \varepsilon. \tag{C.4} \]
Now, $\varphi$ is continuous by continuity of $\kappa$, and $K$ is compact as a finite union of compact sets. The extreme value theorem then ensures that there exists $\theta_m \in K$ such that $\varphi(\theta_m) \geq \varphi(\theta)$ for all $\theta \in K$. Since $\theta_0 \in K$, we have $\varphi(\theta_m) \geq \varepsilon$, and Lemma 1 follows by noting that $\varphi(\theta_m) \geq \varphi(\theta)$ for all $\theta \in \Theta$ by (C.4).

[Figure C.2 (three panels: Case 1, contradiction; Case 2; Case 3, contradiction; horizontal axis $u$, vertical axis $P(u)$): Shape of $P$ (see proof of Lemma 7) with constraints i) $P$ is continuous, ii) $P(0) < 0$ and iii) $\exists u_0 > 0$ such that $P(u) > 0$ for all $u > u_0$. One sees that the constraints cannot be satisfied in cases 1 and 3.]

C.2. Proof of Lemma 7

The proof of Lemma 7 is based on the following result:

Lemma 8 (Laguerre's generalization of Descartes's rule of signs [74, p. 319]). Let $a_1,\dots,a_k$ be nonzero real coefficients and $0 < x_1 < \dots < x_k$ be real numbers. Let $z$ be the number of real roots of the function $P(u) = \sum_{\ell=1}^{k} a_\ell\,x_\ell^u$, and $n$ be the number of changes in sign in the sequence of numbers $a_1,\dots,a_k$. Then $z \leq n$.

The sequence of coefficients $a_\ell = c_{k+1-\ell}$ with $\ell \in [\![1,k]\!]$ has at most two sign changes by hypothesis. By applying Lemma 8 with $x_\ell = e^{-\tau_{k+1-\ell}}$, one sees that $P$ has at most two real roots, so at most two sign changes on $\mathbb{R}_+$. However, $P$ must satisfy the following constraints:
i) $P$ is continuous on $[0,+\infty[$,
ii) $P(0) < 0$,
iii) there exists $u_0 > 0$ such that $P(u) > 0$ for all $u > u_0$.
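The constraints i)-iii) and the inequality (B.17) of Lemma 7 can be probed numerically on a small instance. The sketch below is only an illustration under stated assumptions: the coefficients, the rates, the uniform discrete measure standing in for $\nu$, the choice $f(u) = u^2$, and all function names are hypothetical choices made here, not taken from the paper. It counts the sign changes of the coefficient sequence, locates the unique sign change $u_0$ of $P$ by bisection, and compares both sides of (B.17).

```python
import math

def sign_changes(seq):
    """Number of sign changes between consecutive nonzero entries."""
    signs = [x > 0 for x in seq if x != 0]
    return sum(a != b for a, b in zip(signs, signs[1:]))

def exp_poly(u, coeffs, rates):
    """Exponential polynomial P(u) = sum_l c_l * exp(-tau_l * u)."""
    return sum(c * math.exp(-r * u) for c, r in zip(coeffs, rates))

def sign_change_point(coeffs, rates, hi=50.0, iters=200):
    """Bisection for the root u0, assuming P(0) < 0 < P(hi) and a single root."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if exp_poly(mid, coeffs, rates) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical instance: P(0) = -0.3 < 0, slowest-decaying term is positive.
coeffs = [0.3, 0.4, -1.0]
rates = [0.5, 1.0, 2.0]           # 0 < tau_1 < tau_2 < tau_3
u0 = sign_change_point(coeffs, rates)

# Discrete stand-in for the measure nu: uniform weights on a grid of atoms.
atoms = [0.05 * j for j in range(1, 201)]
weights = [1.0 / len(atoms)] * len(atoms)

def f(u):
    return u * u                  # a non-decreasing function on R_+

lhs = sum(w * f(u) * exp_poly(u, coeffs, rates) for u, w in zip(atoms, weights))
rhs = f(u0) * sum(w * exp_poly(u, coeffs, rates) for u, w in zip(atoms, weights))
```

On this instance `lhs >= rhs`, as (B.17) predicts: below `u0` both $P$ and $f(u) - f(u_0)$ are non-positive, above `u0` both are non-negative, so every term of the discrete sum satisfies the inequality individually.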
As illustrated in Figure C.2, these three constraints cannot be verified simultaneously if $P$ has 0 or 2 roots. Thus $P$ has exactly one sign change on $\mathbb{R}_+$ and there exists $u_0 > 0$ such that $u < u_0 \implies P(u) < 0$ and $u > u_0 \implies P(u) > 0$. One then has, for any non-decreasing function $f$ and any (non-negative) measure $\nu$ on $\mathbb{R}_+$:
\begin{align*}
\int_0^{+\infty} f(u)P(u)\,\mathrm{d}\nu(u) &= \int_0^{u_0}\underbrace{f(u)}_{\text{non-decreasing}}\underbrace{P(u)}_{\leq 0}\,\mathrm{d}\nu(u) + \int_{u_0}^{+\infty}\underbrace{f(u)}_{\text{non-decreasing}}\underbrace{P(u)}_{\geq 0}\,\mathrm{d}\nu(u) \\
&\geq \int_0^{u_0} f(u_0)P(u)\,\mathrm{d}\nu(u) + \int_{u_0}^{+\infty} f(u_0)P(u)\,\mathrm{d}\nu(u) \\
&= f(u_0)\int_0^{+\infty} P(u)\,\mathrm{d}\nu(u).
\end{align*}

D. Details related to Example 4

Assume that $\varphi$ is right differentiable at 0. We first prove by contradiction that $\varphi^{(1)}(0) < 0$. Assume that $\varphi^{(1)}(0) = 0$. As $\varphi$ is a CMF, $\varphi^{(1)}(t) \leq 0$ for all $t \in \mathbb{R}_+$ and $\varphi^{(1)}$ is non-decreasing on $\mathbb{R}_+$. It follows that $\varphi^{(1)}$ is identically zero. Hence $\varphi$ is constant and equal to $\varphi(0) = 1$, which contradicts the assumption $\lim_{t\to+\infty}\varphi(t) = 0$.

Consider now the function defined for all $x \geq 0$ by $f : x \mapsto (k-1)\varphi(2x) - k\varphi(x) + 1$. We note that $f$ corresponds to the quantity involved in (3.24) with the substitution $x \leftrightarrow \Delta$. Since $\varphi(0) = 1$, we have $f(0) = 0$. Moreover $f$ is differentiable for any $x > 0$ and
\begin{align}
f^{(1)}(x) &= 2(k-1)\varphi^{(1)}(2x) - k\varphi^{(1)}(x) \notag \\
&= k\varphi^{(1)}(x)\left[2\left(1 - \tfrac{1}{k}\right)\frac{\varphi^{(1)}(2x)}{\varphi^{(1)}(x)} - 1\right]. \tag{D.1}
\end{align}
Since $\varphi$ is right differentiable at 0, the ratio $\frac{\varphi^{(1)}(2x)}{\varphi^{(1)}(x)}$ tends to 1 as $x$ tends to 0 (remember that we proved that $\varphi^{(1)}(0) < 0$). Since $k \geq 3$ we have $2\left(1 - \tfrac{1}{k}\right) > 1$, hence there exists $x_0 > 0$ such that
\[ x < x_0 \implies 2\left(1 - \tfrac{1}{k}\right)\frac{\varphi^{(1)}(2x)}{\varphi^{(1)}(x)} - 1 > 0. \tag{D.2} \]
By Lemma 5, we have that $\varphi^{(1)}(x) < 0$ for all $x > 0$. Hence $f^{(1)}(x) < 0$ for $x < x_0$, that is $f$ is decreasing on $[0,x_0]$. Combining this result with $f(0) = 0$, we deduce that (3.24) holds whenever $\Delta < x_0$, that is the wrong parameter will be preferred to any of the $\{\theta_\ell^\star\}_{\ell=1}^{k}$.

E. Exact recovery in higher dimensions - CMF kernel and k = 2

In this section, we elaborate on the notion of "axis admissibility" (see Definition 7) for general CMF kernels. We first show (see Example 5) that there exist some Cartesian grids which are not axis admissible with respect to some CMF kernels. We then emphasize in the case $k = 2$ that the notion of "axis admissibility" is not necessary to achieve $k$-step recovery in CMF dictionaries.

Example 5. Let $\mathcal{A}$ be a CMF dictionary with induced kernel $\kappa = \varphi(\|\cdot\|_p^p)$ and consider $\Delta > 0$, $c_1 = c_2 = 1$, $c_3 = c_4 = -\frac{1+\varphi(\Delta^p)}{\varphi(\Delta^p)+\varphi(2\Delta^p)}$ and
\[ \forall t \in \mathbb{R}, \quad f_1(t) = \Big|\sum_{\ell=1}^{4} c_\ell\,\kappa(t\,\mathbf{e}_1, \theta_\ell)\Big| \tag{E.1} \]
where $\mathbf{e}_d$ is the $d$-th canonical basis vector of $\mathbb{R}^2$.

Let $G \triangleq \{\theta_\ell\}_{\ell=1}^{4} \subset \mathbb{R}^2$ be a Cartesian grid with $\theta_1 = (0,0) = \mathbf{0}_2$, $\theta_2 = (\Delta,0) = \Delta\mathbf{e}_1$, $\theta_3 = (0,\Delta) = \Delta\mathbf{e}_2$, $\theta_4 = (\Delta,\Delta) = \Delta\mathbf{1}_2$. Simple algebraic manipulations then show that $f_1(0) = f_1(\Delta) = 0$ and
\[ f_1\!\left(\tfrac{\Delta}{2}\right) = 2\,\varphi\!\left(\tfrac{\Delta^p}{2^p}\right)\left|1 - \frac{1+\varphi(\Delta^p)}{\varphi(\Delta^p)+\varphi(2\Delta^p)}\cdot\frac{\varphi\!\left(\Delta^p\left(1+\tfrac{1}{2^p}\right)\right)}{\varphi\!\left(\tfrac{\Delta^p}{2^p}\right)}\right|. \tag{E.2} \]
If $\varphi$ and $\Delta$ are such that $f_1(\Delta/2) \neq 0$, one can conclude that the maximizers of $f_1$ are distinct from 0 and $\Delta$. In view of Definition 7, this shows that $G$ is not axis admissible with respect to $\mathcal{A}$.

For instance, this is the case for the CMF $\varphi : x \mapsto \frac{1}{1+x}$ (cf. Example 2). Indeed, in this case (E.2) particularizes to
\[ f_1\!\left(\tfrac{\Delta}{2}\right) = \frac{2}{1+\tfrac{\Delta^p}{2^p}}\left|1 - \frac{(1+2\Delta^p)(2+\Delta^p)}{2+3\Delta^p}\cdot\frac{1+\tfrac{\Delta^p}{2^p}}{1+\Delta^p+\tfrac{\Delta^p}{2^p}}\right|. \tag{E.3} \]
As the factor inside the absolute value in the right-hand side is a non-zero rational function of $x = \Delta^p$, we have $f_1(\Delta/2) \neq 0$ except possibly on a set of values of $\Delta$ which has Lebesgue measure equal to zero. Hence there exists $\Delta > 0$ such that $f_1(\Delta/2) > 0$.

We finally note that the construction presented here for the case $D = 2$ easily extends to $D > 2$ by zero-padding of the $\theta_\ell$'s.

We next show that $k$-step recovery of $S^\star$ with $k = \operatorname{card}(S^\star) = 2$ may be possible in CMF dictionaries even when the axis admissibility assumption fails to hold. First, we state and prove a useful technical lemma:

Lemma 9. Let $\kappa$ be a CMF kernel in dimension $D$ in the sense of Definition 4. For any $\theta_1,\theta_2,\theta_3 \in \mathbb{R}^D$, the following result holds:
\[ \kappa(\theta_1,\theta_2)\,\kappa(\theta_2,\theta_3) \leq \kappa(\theta_1,\theta_3). \tag{E.4} \]
Proof. By definition, there exists a CMF $\varphi$ such that $\kappa(\theta,\theta') = \varphi(\|\theta-\theta'\|_p^p)$ and $\varphi(0) = 1$.
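Since $\varphi$ is completely monotone, it is nonnegative, decreasing and log-convex. Together with $\varphi(0) = 1$, log-convexity gives, for all $x, y \geq 0$,

$$\varphi(x)\,\varphi(y) \leq \varphi(0)\,\varphi(x + y) = \varphi(x + y). \qquad \text{(E.5)}$$

Using this result with $x = \|\theta_1 - \theta_2\|_p^p$ and $y = \|\theta_2 - \theta_3\|_p^p$, we have

$$\kappa(\theta_1, \theta_2)\,\kappa(\theta_2, \theta_3) \leq \varphi\big(\|\theta_1 - \theta_2\|_p^p + \|\theta_2 - \theta_3\|_p^p\big). \qquad \text{(E.6)}$$

Since the quasi-norm $\|\cdot\|_p^p$ satisfies a triangle inequality, we have $\|\theta_1 - \theta_3\|_p^p \leq \|\theta_1 - \theta_2\|_p^p + \|\theta_2 - \theta_3\|_p^p$. As any CMF is decreasing, (E.4) follows.

We are now ready to state our recovery result:

Lemma 10 (Exact recovery for CMF dictionaries when $k = 2$). Let $\mathcal{A}$ be a CMF dictionary in dimension $D \geq 1$ with induced kernel $\kappa$. Consider a support $\mathcal{S}^\star = \{\theta_1^\star, \theta_2^\star\}$ where $\theta_1^\star \neq \theta_2^\star$, and let $\mathbf{G} \in \mathbb{R}^{2 \times 2}$ be the matrix defined by $\mathbf{G}[\ell, \ell'] = \kappa(\theta_\ell^\star, \theta_{\ell'}^\star)$. Assume that

$$\forall \theta \in \mathrm{Grid}(\mathcal{S}^\star) \setminus \mathcal{S}^\star: \quad \|\mathbf{G}^{-1} \mathbf{g}_\theta\|_1 < 1 \qquad \text{(E.7)}$$

where $\mathbf{g}_\theta \in \mathbb{R}^2$ is defined by $\mathbf{g}_\theta[\ell] = \kappa(\theta, \theta_\ell^\star)$ for $\ell = 1, 2$. Then OMP achieves exact 2-step recovery of $\mathcal{S}^\star$.

Proof. By Lemma 6, $\kappa$ is admissible in the sense of Definition 1. We show below that since (E.7) holds, $\mathcal{S}^\star$ is admissible with respect to $\kappa$ in the sense of Definition 2. Lemma 10 then follows from Theorem 2.

Consider a non-empty subset of indices $\mathcal{T} \subseteq \{1, 2\}$ and $t \triangleq \mathrm{card}(\mathcal{T})$. Let also $\{c_\ell\}_{\ell \in \mathcal{T}}$ be such that $c_\ell > 0$ and $\sum_{\ell \in \mathcal{T}} c_\ell < 1$. Define

$$\psi : \mathbb{R}^D \to \mathbb{R}_+, \quad \theta \mapsto \sum_{\ell \in \mathcal{T}} c_\ell \, \kappa(\theta, \theta_\ell^\star). \qquad \text{(E.8)}$$

We next show that items i) and ii) of Definition 2 are satisfied.

Item i) of Definition 2. We distinguish two cases:

• If $t = 1$, we can assume without loss of generality that $\mathcal{T} = \{1\}$. Since $\kappa(\theta, \theta_1^\star) < 1$ for all $\theta \neq \theta_1^\star$, one immediately sees that $\psi(\theta) = c_1 \kappa(\theta, \theta_1^\star) < c_1 = \psi(\theta_1^\star)$ for all $\theta \neq \theta_1^\star$. Hence, $\theta_1^\star$ is the unique global maximizer of $\psi$.

• If $t = 2$, let $\theta_{\mathrm{m}}$ be a maximizer of $\psi$. We note that $\psi$ can also be written as $\psi(\theta) = |\langle \boldsymbol{a}(\theta), \boldsymbol{y} \rangle|$ where $\boldsymbol{y} \triangleq \sum_{\ell=1}^{2} c_\ell \boldsymbol{a}(\theta_\ell^\star)$, hence a maximizer always exists by virtue of Lemma 1. Since $\theta_{\mathrm{m}}$ maximizes the $D$-dimensional function $\psi$, its $d$-th entry $\theta_{\mathrm{m}}[d]$ is a maximizer of the one-dimensional section of $\psi$ along the $d$-th canonical direction, denoted $\psi_d$:

$$\psi_d : \mathbb{R} \to \mathbb{R}_+, \quad x \mapsto \sum_{\ell \in \mathcal{T}} c_\ell \, \varphi\Big( |x - \theta_\ell^\star[d]|^p + \sum_{j \neq d} |\theta_{\mathrm{m}}[j] - \theta_\ell^\star[j]|^p \Big). \qquad \text{(E.9)}$$

Applying the same reasoning as in the proof of Theorem 3 (see the part of the proof dedicated to establishing "item i) of Definition 2"), we have: $\forall x \notin \{\theta_\ell^\star[d]\}_{\ell=1}^{2}$, $\psi_d$ is twice differentiable and $\psi_d^{(2)}(x) > 0$. Hence, no $x \notin \{\theta_\ell^\star[d]\}_{\ell=1}^{2}$ can be a maximizer and necessarily $\theta_{\mathrm{m}}[d] \in \{\theta_\ell^\star[d]\}_{\ell=1}^{2}$. Since this result is valid for all $d \in [\![1, D]\!]$, we finally have $\theta_{\mathrm{m}} \in \mathrm{Grid}(\mathcal{S}^\star)$. Therefore, since (E.7) holds, we have

$$\max_{\theta \in \mathrm{Grid}(\mathcal{S}^\star) \setminus \mathcal{S}^\star} \psi(\theta) = \max_{\theta \in \mathrm{Grid}(\mathcal{S}^\star) \setminus \mathcal{S}^\star} |\langle \boldsymbol{a}(\theta), \boldsymbol{y} \rangle| < \max_{\theta_\ell^\star \in \mathcal{S}^\star} |\langle \boldsymbol{a}(\theta_\ell^\star), \boldsymbol{y} \rangle| = \max_{\theta_\ell^\star \in \mathcal{S}^\star} \psi(\theta_\ell^\star). \qquad \text{(E.10)}$$

Hence all maximizers of $\psi$ belong to $\mathcal{S}^\star$.

Item ii) of Definition 2. From the working assumptions of item ii), the set $\mathcal{T}$ satisfies $\mathcal{T} \neq \emptyset$ and there exists $\ell \in \{1, 2\} \setminus \mathcal{T}$. Hence, we have $\mathcal{T} \neq \{1, 2\}$, that is, $\mathcal{T}$ is a singleton. We assume without loss of generality that $\mathcal{T} = \{1\}$. Hence $\psi(\theta) = c_1 \kappa(\theta, \theta_1^\star)$ for some $0 < c_1 < 1$. If $\psi(\theta_1^\star) - \kappa(\theta_1^\star, \theta_2^\star) \leq 0$, then $c_1 - \kappa(\theta_1^\star, \theta_2^\star) = \psi(\theta_1^\star) - \kappa(\theta_1^\star, \theta_2^\star) \leq 0$. Hence $c_1 \leq \kappa(\theta_1^\star, \theta_2^\star)$ and $\psi(\theta) \leq \kappa(\theta_1^\star, \theta_2^\star)\,\kappa(\theta, \theta_1^\star)$ for each $\theta$. Using Lemma 9 with $\theta_1 = \theta$, $\theta_2 = \theta_1^\star$, $\theta_3 = \theta_2^\star$, we obtain for each $\theta \in \Theta$:

$$\psi(\theta) - \kappa(\theta, \theta_2^\star) \leq \kappa(\theta, \theta_1^\star)\,\kappa(\theta_1^\star, \theta_2^\star) - \kappa(\theta, \theta_2^\star) \leq 0.$$

F. Table of notations

Notation : Comment

General notations
  $\mathcal{H}$, $\boldsymbol{y}$ : (Hilbert) observation space and observation
  $\mathcal{A}$, $\boldsymbol{a}(\theta)$ : Dictionary $\mathcal{A}$ made of parametric atoms $\boldsymbol{a}(\theta)$
  $\mu$ : Coherence between atoms of a support
  $c \in \mathbb{R}$ : Weighting coefficients
  $\Theta$, $\theta$ : Parameter set and element
  $\mathcal{S}$, $\mathcal{S}^\star$ : Set of parameters
  $\mathcal{G}$ : Cartesian grid
  $k$, $\ell$ : Number of atoms, most frequent index
  $\mathrm{Grid}$ : Set augmenter, see (3.27)
  $\varphi$ : CMF (see Definition 3)
  $\kappa$ : Kernel function $\Theta \times \Theta \to \mathbb{R}$
  $\mathcal{K}_{\mathrm{CMF}}(D)$ : Set of CMF kernels in dimension $D$
  $\mathcal{K}_{\mathrm{Lap}}(D)$ : Set of Laplace kernels in dimension $D$
  $f^{(n)}$ : $n$-th derivative of function $f$
  $e_\ell$ : $\ell$-th element of the canonical basis
  $\mathbb{E}$ : Expectation operator
  $\mathrm{i}$ : Imaginary number

Technical notations
  $\mathbf{G}$, $\mathbf{g}_\ell$ : Gram matrix related to a support $\mathcal{S}$, columns of $\mathbf{G}$
  $\mathbf{g}_\theta$ : Parametric vector related to a support $\mathcal{S}$
  $\mathbf{u}$, $\mathbf{v}$ : Vector of $\mathbb{R}^k$ for some $k$, often defined as $\mathbf{u}, \mathbf{v} = \mathbf{G}^{-1} \mathbf{g}_\theta$ for some $\theta \in \Theta$

Table F.1: Table of notations.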
Acknowledgments

Part of this work has been funded thanks to the Becose ANR project no. ANR-15-CE23-0021.

References

[1] S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing, Birkhäuser Basel, 2013. doi:10.1007/978-0-8176-4948-7.
[2] C. Ekanadham, D. Tranchina, E. P. Simoncelli, Recovery of sparse translation-invariant signals with continuous basis pursuit, IEEE Transactions on Signal Processing 59 (10) (2011) 4735–4744. doi:10.1109/TSP.2011.2160058.
[3] E. J. Candès, C. Fernandez-Granda, Towards a Mathematical Theory of Super-resolution, Communications on Pure and Applied Mathematics 67 (6) (2014) 906–956. doi:10.1002/cpa.21455.
[4] V. Duval, G. Peyré, Exact support recovery for sparse spikes deconvolution, Foundations of Computational Mathematics 15 (5) (2014) 1315–1355. doi:10.1007/s10208-014-9228-6.
[5] Y. C. Pati, R. Rezaiifar, P. S. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, in: Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, 1993, pp. 40–44 vol.1. doi:10.1109/ACSSC.1993.342465.
[6] K. S. Miller, S. G. Samko, Completely monotonic functions, Integral Transforms and Special Functions 12 (4) (2001) 389–402.
[7] B. Natarajan, Sparse approximate solutions to linear systems, SIAM Journal on Computing 24 (2) (1995) 227–234. doi:10.1137/S0097539792240406.
[8] A. J. Miller, Subset selection in regression, Chapman and Hall, London,
[9] S. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing 41 (12) (1993) 3397–3415. doi:10.1109/78.258082.
[10] R. A. DeVore, V. N. Temlyakov, Some remarks on greedy algorithms, Advances in Computational Mathematics 5 (1) (1996) 173–187. doi:10.1007/bf02124742.
[11] E. Liu, V. N. Temlyakov, The Orthogonal Super Greedy Algorithm and Applications in Compressed Sensing, IEEE Transactions on Information Theory 58 (4) (2012) 2040–2047. doi:10.1109/TIT.2011.2177632.
[12] S. Chen, S. A. Billings, W. Luo, Orthogonal least squares methods and their application to non-linear system identification, International Journal of Control 50 (5) (1989) 1873–1896. doi:10.1080/00207178908953472.
[13] J. H. Friedman, W. Stuetzle, Projection pursuit regression, Journal of the American Statistical Association 76 (376) (1981) 817–823. doi:10.1080/01621459.1981.10477729.
[14] P. J. Huber, Projection pursuit, The Annals of Statistics 13 (2) (1985) 435–475. doi:10.1214/aos/1176349519.
[15] L. Rebollo-Neira, D. Lowe, Optimized orthogonal matching pursuit approach, IEEE Signal Processing Letters 9 (4) (2002) 137–140. doi:10.1109/LSP.2002.1001652.
[16] V. N. Temlyakov, Greedy approximation, Acta Numerica 17 (2008) 235–409. doi:10.1017/s0962492906380014.
[17] P. Vincent, Y. Bengio, Kernel Matching Pursuit, Machine Learning 48 (1/3) (2002) 165–187. doi:10.1023/a:1013955821559.
[18] J. F. Claerbout, F. Muir, Robust modeling with erratic data, Geophysics 38 (5) (1973) 826–844. doi:10.1190/1.1440378.
[19] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing 20 (1) (1998) 33–61. doi:10.1137/S1064827596304010.
[20] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological) 58 (1) (1996) 267–. URL http://www.jstor.org/stable/2346178
[21] R. Tibshirani, I. Johnstone, T. Hastie, B. Efron, Least angle regression, The Annals of Statistics 32 (2) (2004) 407–499. doi:10.1214/
[22] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences 2 (1) (2009) 183–202. doi:10.1137/080716542.
[23] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends® in Machine Learning 3 (1) (2011) 1–122. doi:10.1561/2200000016.
[24] R. Gribonval, M. Nielsen, Beyond sparsity: Recovering structured representations by $\ell^1$ minimization and greedy algorithms, Advances in Computational Mathematics 28 (1) (2008) 23–41. doi:10.1007/s10444-005-9009-5.
[25] L. Borup, R. Gribonval, M. Nielsen, Beyond coherence: Recovering structured time–frequency representations, Applied and Computational Harmonic Analysis 24 (1) (2008) 120–128. doi:10.1016/j.acha.2007.09.
[26] K. C. Knudson, J. Yates, A. Huk, J. W. Pillow, Inferring sparse representations of continuous signals with continuous orthogonal matching pursuit, Advances in Neural Information Processing Systems 27 (2014) 1215–1223.
[27] A. Eftekhari, M. B. Wakin, Greed is super: A fast algorithm for super-resolution (2015). arXiv:1511.03385.
[28] C. Dorffer, A. Drémeau, C. Herzet, Efficient atom selection strategy for iterative sparse approximations, in: iTWIST 2018 - International Traveling Workshop on Interactions between low-complexity data models and Sensing Techniques, Marseille, France, 2018, pp. 1–3. URL https://hal.inria.fr/hal-01937501
[29] K. Bredies, H. K. Pikkarainen, Inverse problems in spaces of measures, ESAIM: Control, Optimisation and Calculus of Variations 19 (1) (2012) 190–218. doi:10.1051/cocv/2011205.
[30] Y. de Castro, F. Gamboa, Exact reconstruction using Beurling minimal extrapolation, Journal of Mathematical Analysis and Applications 395 (1) (2012) 336–354. doi:10.1016/j.jmaa.2012.05.011.
[31] G. Tang, B. N. Bhaskar, P. Shah, B. Recht, Compressed sensing off the grid, IEEE Transactions on Information Theory 59 (11) (2013) 7465–7490. doi:10.1109/tit.2013.2277451.
[32] Y. D. Castro, F. Gamboa, D. Henrion, J.-B. Lasserre, Exact solutions to super resolution on semi-algebraic domains in higher dimensions, IEEE Transactions on Information Theory 63 (1) (2017) 621–630. doi:10.1109/tit.2016.2619368.
[33] N. Boyd, G. Schiebinger, B. Recht, The alternating descent conditional gradient method for sparse inverse problems, in: 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2015, pp. 57–60. doi:10.1109/CAMSAP.2015.7383735.
[34] P. Catala, V. Duval, G. Peyré, A low-rank approach to off-the-grid sparse deconvolution, SIAM Journal on Imaging Sciences 12 (3) (2019) 1464–1500. doi:10.1137/19M124071X.
[35] Q. Denoyelle, V. Duval, G. Peyré, E. Soubies, The sliding Frank–Wolfe algorithm and its application to super-resolution microscopy, Inverse Problems 36 (1) (2019) 014001. doi:10.1088/1361-6420/ab2a29.
[36] A. Eftekhari, A. Thompson, Sparse inverse problems over measures: Equivalence of the conditional gradient and exchange methods, SIAM Journal on Optimization 29 (2) (2019) 1329–1349. doi:10.1137/18m1183388.
[37] A. Flinth, F. de Gournay, P. Weiss, On the linear convergence rates of exchange and continuous methods for total variation minimization (Jun 2019). arXiv:1906.09919.
[38] L. Chizat, F. Bach, On the global convergence of gradient descent for over-parameterized models using optimal transport, in: Advances in Neural Information Processing Systems 31, 2018, pp. 3036–3046.
[39] L. Chizat, Sparse Optimization on Measures with Over-parameterized Gradient Descent, working paper or preprint (July 2019). URL https://hal.archives-ouvertes.fr/hal-02190822
[40] G. de Prony, Essai expérimental et analytique : sur les lois de la dilatabilité des fluides élastique et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures, 1795, Journal de l'École Polytechnique.
[41] S. Kunis, T. Peter, T. Römer, U. von der Ohe, A multivariate generalization of Prony's method, Linear Algebra and its Applications 490 (2016) 31–47. doi:10.1016/j.laa.2015.10.023.
[42] W. Liao, A. Fannjiang, MUSIC for single-snapshot spectral estimation: Stability and super-resolution, Applied and Computational Harmonic Analysis 40 (1) (2016) 33–67. doi:10.1016/j.acha.2014.12.003.
[43] R. Roy, T. Kailath, ESPRIT-estimation of signal parameters via rotational invariance techniques, IEEE Transactions on Acoustics, Speech, and Signal Processing 37 (7) (1989) 984–995. doi:10.1109/29.32276.
[44] X. Wei, P. L. Dragotti, FRESH – FRI-based single-image super-resolution algorithm, IEEE Transactions on Image Processing 25 (8) (2016) 3723–3735. doi:10.1109/tip.2016.2563178.
[45] J. A. Tropp, Greed is good: Algorithmic results for sparse approximation, IEEE Transactions on Information Theory 50 (10) (2004) 2231–2242. doi:10.1109/TIT.2004.834793.
[46] C. Soussen, R. Gribonval, J. Idier, C. Herzet, Joint k-Step Analysis of Orthogonal Matching Pursuit and Orthogonal Least Squares, IEEE Transactions on Information Theory 59 (5) (2013) 3158–3174. doi:10.1109/tit.2013.2238606.
[47] R. Gribonval, P. Vandergheynst, On the exponential convergence of matching pursuits in quasi-incoherent dictionaries, IEEE Transactions on Information Theory 52 (1) (2006) 255–261. doi:10.1109/tit.2005.860474.
[48] J.-J. Fuchs, On sparse representations in arbitrary redundant bases, IEEE Transactions on Information Theory 50 (6) (2004) 1341–1344. doi:10.1109/TIT.2004.828141.
[49] J. A. Tropp, Just relax: convex programming methods for identifying sparse signals in noise, IEEE Transactions on Information Theory 52 (3) (2006) 1030–1051. doi:10.1109/TIT.2005.864420.
[50] C. Herzet, A. Drémeau, C. Soussen, Relaxed Recovery Conditions for OMP/OLS by Exploiting Both Coherence and Decay, IEEE Transactions on Information Theory 62 (1) (2016) 459–470. doi:10.1109/TIT.2015.
[51] S. Huang, J. Zhu, Recovery of sparse signals using OMP and its variants: convergence analysis based on RIP, Inverse Problems 27 (3) (2011) 035003. doi:10.1088/0266-5611/27/3/035003.
[52] R. Maleh, Improved RIP Analysis of Orthogonal Matching Pursuit, Tech. rep. (2011). arXiv:1102.4311.
[53] Q. Mo, Y. Shen, A Remark on the Restricted Isometry Property in Orthogonal Matching Pursuit, IEEE Transactions on Information Theory 58 (6) (2012) 3654–3656. doi:10.1109/TIT.2012.2185923.
[54] J. Wang, B. Shim, On the Recovery Limit of Sparse Signals Using Orthogonal Matching Pursuit, IEEE Transactions on Signal Processing 60 (9) (2012) 4973–4976. doi:10.1109/TSP.2012.2203124.
[55] L. Chang, J. Wu, An Improved RIP-Based Performance Guarantee for Sparse Signal Recovery via Orthogonal Matching Pursuit, IEEE Transactions on Information Theory 60 (9) (2014) 5702–5715. doi:10.1109/TIT.2014.2338314.
[56] J. Wen, X. Zhu, D. Li, Improved bounds on restricted isometry constant for orthogonal matching pursuit, Electronics Letters 49 (23) (2013) 1487–1489. doi:10.1049/el.2013.2222.
[57] Q. Mo, A Sharp Restricted Isometry Constant Bound of Orthogonal Matching Pursuit, Tech. rep. (2015). arXiv:1501.01708.
[58] W. Rudin, Principles of Mathematical Analysis, 3rd Edition, International Series in Pure and Applied Mathematics, McGraw-Hill Science/Engineering/Math, 1976.
[59] D. D. Carlo, C. Elvira, A. Deleforge, N. Bertin, R. Gribonval, Blaster: An off-grid method for blind and regularized acoustic echoes retrieval, in: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 156–160.
[60] J.-M. Azaïs, Y. de Castro, F. Gamboa, Spike detection from inaccurate samplings, Applied and Computational Harmonic Analysis 38 (2) (2015) 177–195. doi:10.1016/j.acha.2014.03.004.
[61] C. Fernandez-Granda, Super-resolution of point sources via convex programming, Information and Inference: A Journal of the IMA 5 (3) (2016) 251–303. doi:10.1093/imaiai/iaw005.
[62] M. Kreĭn, A. Nudel'man, The Markov Moment Problem and Extremal Problems, American Mathematical Society, 1977. doi:10.1090/mmono/
[63] Q. Denoyelle, V. Duval, G. Peyré, Support Recovery for Sparse Super-Resolution of Positive Measures, Journal of Fourier Analysis and Applications 23 (5) (2017) 1153–1194.
[64] C. Poon, G. Peyré, MultiDimensional sparse super-resolution, SIAM Journal on Mathematical Analysis 51 (1) (2019) 1–44. doi:10.1137/17m1147822.
[65] C. J. Hillar, L.-H. Lim, Most tensor problems are NP-hard, J. ACM 60 (6). doi:10.1145/2512329. URL https://doi.org/10.1145/2512329
[66] C. Elvira, J. E. Cohen, C. Herzet, R. Gribonval, Continuous dictionaries meet low-rank tensor approximations, in: iTwist 2020 - International Traveling Workshop on Interactions between low-complexity data models and Sensing Techniques, Nantes, France, 2020, pp. 1–3. URL https://hal.archives-ouvertes.fr/hal-02567115
[67] V. Chandrasekaran, B. Recht, P. A. Parrilo, A. S. Willsky, The convex geometry of linear inverse problems, Foundations of Computational Mathematics 12 (6) (2012) 805–849. doi:10.1007/s10208-012-9135-7. URL https://doi.org/10.1007/s10208-012-9135-7
[68] H. Wendland, Scattered data approximation, 2005. doi:10.2277/
[69] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, USA, 2004.
[70] D. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory, Princeton University Press, 2005.
[71] G. Pólya, Remarks on Characteristic Functions, in: Proceedings of the [First] Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, Calif., 1949, pp. 115–123. URL https://projecteuclid.org/euclid.bsmsp/1166219202
[72] D. V. Widder, Laplace Transform, hardcover Edition, Princeton Mathematical Press, 1941.
[73] D. Bertsekas, Nonlinear Programming, 2nd Edition, Athena Scientific,
[74] H. Fejzić, C. Freiling, D. Rinne, Descartes' rule of signs, alternations of data sets, and balanced differences, The American Mathematical Monthly 116 (4) (2009) 316–327.
URL http://www.jstor.org/stable/40391091

Mathematics arXiv (Cornell University)

When does OMP achieve exact recovery with continuous dictionaries?

ISSN
1063-5203
eISSN
ARCH-3343
DOI
10.1016/j.acha.2020.12.002

Abstract

This paper presents new theoretical results on sparse recovery guarantees for a greedy algorithm, Orthogonal Matching Pursuit (OMP), in the context of continuous parametric dictionaries. Here, the continuous setting means that the dictionary is made up of an infinite uncountable number of atoms. In this work, we rely on the Hilbert structure of the observation space to express our recovery results as a property of the kernel defined by the inner product between two atoms. Using a continuous extension of Tropp's Exact Recovery Condition, we identify key assumptions allowing to analyze OMP in the continuous setting. Under these assumptions, OMP unambiguously identifies in exactly $k$ steps the atom parameters from any observed linear combination of $k$ atoms. These parameters play the role of the so-called support of a sparse representation in traditional sparse recovery. In our paper, any kernel and set of parameters that satisfy these conditions are said to be admissible. In the one-dimensional setting, we exhibit a family of kernels relying on completely monotone functions for which admissibility holds for any set of atom parameters. For higher dimensional parameter spaces, the analysis turns out to be more subtle. An additional assumption, so-called axis admissibility, is imposed to ensure a form of delayed recovery (in at most $k^D$ steps, where $D$ is the dimension of the parameter space). Furthermore, guarantees for recovery in exactly $k$ steps are derived under an additional algebraic condition involving a finite subset of atoms (built as an extension of the set of atoms to be recovered). We show that the latter technical conditions simplify in the case of Laplacian kernels, allowing us to derive simple conditions for $k$-step exact recovery, and to carry out a coherence-based analysis in terms of a minimum separation assumption between the atoms to be recovered.
Keywords: sparse representation, continuous dictionaries, Orthogonal Matching Pursuit, exact recovery

Preprint submitted to Elsevier, June 23, 2020. arXiv:1904.06311v3 [cs.IT] 22 Jun 2020

1. Introduction

Finding a sparse signal representation is a fundamental problem in signal processing. It consists in decomposing a signal $\boldsymbol{y}$ belonging to some vector space $\mathcal{H}$ as the linear combination of a few elements of some set $\mathcal{A} \subseteq \mathcal{H}$, that is

$$\boldsymbol{y} = \sum_{\ell=1}^{k} c_\ell \boldsymbol{a}_\ell \quad \text{where } c_\ell \in \mathbb{R}^*,\ \boldsymbol{a}_\ell \in \mathcal{A}. \qquad (1.1)$$

Sparsity refers to the fact that the number of elements involved in the decomposition (1.1) should be much smaller than the ambient dimension, i.e., the dimension of $\mathcal{H}$. The set $\mathcal{A}$ is commonly referred to as a dictionary and its elements as atoms. In the sequel, we will assume that $\mathcal{A}$ is a parametric dictionary defined as:

$$\mathcal{A} = \{\boldsymbol{a}(\theta) : \theta \in \Theta\} \qquad (1.2)$$

where $\Theta = \mathbb{R}^D$ and $\boldsymbol{a} : \Theta \to \mathcal{H}$ is some continuous and injective function. In this setup, (1.1) implies that there exist $k$ parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$ such that $\boldsymbol{y}$ can be expressed as a linear combination of the atoms $\{\boldsymbol{a}(\theta_\ell^\star)\}_{\ell=1}^{k}$.

Over the past decade, sparse representations have proven to be of great interest in many application domains. As a consequence, numerous practical procedures, along with their theoretical analyses, have been proposed in the literature. Most contributions addressed the sparse-representation problem in the "discrete" setting, where the dictionary contains a finite number of elements, see [1]. Recently, several works tackled the problem of sparse representations in "continuous" dictionaries, where $\mathcal{A}$ is made up of an infinite uncountable number of atoms but $\boldsymbol{a} : \Theta \to \mathcal{H}$ enjoys some continuity property, see e.g., [2–4]. We review the contributions most related to the present work in Section 2. Before dwelling on the state of the art, we briefly describe the scope of our paper.

In this work, we focus on the continuous setting and assume that $\mathcal{H}$ is a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$.
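We derive exact recovery conditions for "Orthogonal Matching Pursuit" (OMP) [5], a natural adaptation to the continuous setting of a popular greedy procedure of the literature (see Algorithm 1). The main question addressed in this paper is as follows. Let $\{\theta_\ell^\star\}_{\ell=1}^{k}$ be $k$ pairwise distinct elements of $\Theta$ and assume that $\boldsymbol{y}$ obeys (1.1) with $\boldsymbol{a}_\ell = \boldsymbol{a}(\theta_\ell^\star)$ for some $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}^*$. Under which conditions does OMP achieve exact recovery (that is, correct unambiguous identification) of the parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$ and the coefficients $\{c_\ell\}_{\ell=1}^{k}$? In particular, is exact recovery possible in $k$ steps? This is of course only possible if the preimage of an atom $\boldsymbol{a}_\ell = \boldsymbol{a}(\theta_\ell^\star)$ is unique, hence the assumption that $\boldsymbol{a}(\cdot)$ is injective.

We note that, in the context of continuous dictionaries, the fact that OMP could correctly identify a set of $k$ atoms in exactly $k$ iterations may seem surprising in itself. Indeed, inspecting Algorithm 1, we see that this implies that OMP must identify one correct atom at each iteration $t$ of the algorithm, that is $\hat{\theta}_t \in \{\theta_\ell^\star\}_{\ell=1}^{k}$ $\forall t \in [\![1, k]\!]$. The following simple example suggests that such a requirement may never be met for continuous dictionaries:

Algorithm 1: Orthogonal Matching Pursuit (OMP)
Input: observation $\boldsymbol{y} \in \mathcal{H}$, normalized dictionary $\mathcal{A} = \{\boldsymbol{a}(\theta) : \theta \in \Theta\}$.
 1  $\boldsymbol{r} \leftarrow \boldsymbol{y}$   // residual vector
 2  $\hat{\mathcal{S}} \leftarrow \emptyset$   // estimated support
 3  $t \leftarrow 0$ ;
 4  while $\boldsymbol{r} \neq 0_{\mathcal{H}}$ do
 5      $t \leftarrow t + 1$ ;
 6      $\hat{\theta}_t \in \arg\max_{\theta \in \Theta} |\langle \boldsymbol{a}(\theta), \boldsymbol{r} \rangle|$   // atom selection
 7      $\hat{\mathcal{S}} \leftarrow \hat{\mathcal{S}} \cup \{\hat{\theta}_t\}$   // support update
 8      $(\hat{c}_1, \ldots, \hat{c}_t) \leftarrow \arg\min_{(c_1, \ldots, c_t) \in \mathbb{R}^t} \big\| \boldsymbol{y} - \sum_{\ell=1}^{t} c_\ell \boldsymbol{a}(\hat{\theta}_\ell) \big\|$   // least-squares update
 9      $\boldsymbol{r} \leftarrow \boldsymbol{y} - \sum_{\ell=1}^{t} \hat{c}_\ell \boldsymbol{a}(\hat{\theta}_\ell)$   // residual vector
10  end
11  $\hat{k} = t$ ;
Output: estimated support $\hat{\mathcal{S}} = \{\hat{\theta}_\ell\}_{\ell=1}^{\hat{k}}$ and coefficients $\{\hat{c}_\ell\}_{\ell=1}^{\hat{k}}$.

Example 1 (The Gaussian deconvolution problem). Consider $\Theta = \mathbb{R}$ and let $\mathcal{H} = L_2(\mathbb{R})$ be the space of square integrable functions on $\mathbb{R}$. Assume $\boldsymbol{a}(\cdot)$ is defined as

$$\boldsymbol{a} : \mathbb{R} \to L_2(\mathbb{R}), \quad \theta \mapsto \pi^{-\frac{1}{4}} e^{-\frac{1}{2}(\cdot - \theta)^2}. \qquad (1.3)$$

Suppose $\boldsymbol{y}$ results from the positive linear combination of $k = 2$ distinct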
atoms, that is $\boldsymbol{y} = c_1 \boldsymbol{a}(\theta_1^\star) + c_2 \boldsymbol{a}(\theta_2^\star)$, $\theta_1^\star \neq \theta_2^\star$, $c_1 > 0$, $c_2 > 0$. Then, even in this very simple case, OMP never selects an atom in $\{\theta_1^\star, \theta_2^\star\}$ at the first iteration. Indeed, particularizing step 6 of Algorithm 1 to the present setup, we have that, at the first iteration, OMP will select the parameter $\theta$ maximizing

$$|\langle \boldsymbol{a}(\theta), \boldsymbol{y} \rangle| = c_1 e^{-\frac{1}{4}(\theta - \theta_1^\star)^2} + c_2 e^{-\frac{1}{4}(\theta - \theta_2^\star)^2}. \qquad (1.4)$$

Now, since the right-hand side of (1.4) is continuously differentiable, first-order optimality conditions tell us that any maximizer of $\theta \mapsto |\langle \boldsymbol{a}(\theta), \boldsymbol{y} \rangle|$ must satisfy

$$(\theta - \theta_1^\star)\, c_1 e^{-\frac{1}{4}(\theta - \theta_1^\star)^2} + (\theta - \theta_2^\star)\, c_2 e^{-\frac{1}{4}(\theta - \theta_2^\star)^2} = 0. \qquad (1.5)$$

Since $\theta_1^\star \neq \theta_2^\star$, $c_1 \neq 0$, $c_2 \neq 0$, this equality cannot be satisfied by either $\theta = \theta_1^\star$ or $\theta = \theta_2^\star$. As a consequence, OMP necessarily selects some $\hat{\theta}_1 \notin \{\theta_1^\star, \theta_2^\star\}$.

Nevertheless, we show in this paper that exact recovery in $k$ steps is possible with OMP for some particular families of dictionaries $\mathcal{A}$. Our recovery conditions are expressed in terms of the kernel function $\kappa(\theta, \theta')$ associated to the inner product between two atoms, i.e.,

$$\kappa(\theta, \theta') \triangleq \langle \boldsymbol{a}(\theta), \boldsymbol{a}(\theta') \rangle. \qquad (1.6)$$

We show that if the kernel $\kappa(\theta, \theta')$ and the atom parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$ verify some particular conditions (see Section 3.2), then exact recovery in $k$ steps is possible with OMP. We emphasize moreover that these conditions are satisfied for a family of kernels of the form:

$$\kappa(\theta, \theta') = \varphi\big(\|\theta - \theta'\|_p^p\big), \quad 0 < p \leq 1, \qquad (1.7)$$

where $\|\cdot\|_p$ is the $\ell_p$ quasi-norm (norm for $p = 1$) and $\varphi$ is a completely monotone function (see Definition 3). This family encompasses the well-known Laplace kernel [6]. Hereafter, we will refer to kernels taking the form (1.7) as "CMF kernels".

A first (perhaps surprising) outcome of our analysis is as follows. If $\Theta = \mathbb{R}$ and the dictionary is defined by a CMF kernel (1.7), OMP correctly identifies any pairwise distinct atom parameters $\{\theta_\ell^\star\}_{\ell=1}^{k} \subset \Theta$ and coefficients $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}^*$ in exactly $k$ iterations for any $k \in \mathbb{N}$ (see Theorem 3). We emphasize that no
separation (i.e., minimal distance between parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$) is needed. To our knowledge, this is the first recovery of this kind in continuous dictionaries when no sign constraint is imposed on the coefficients. It turns out that this "universal" exact recovery result is valid for very particular families of dictionaries: CMF kernels exhibit a discontinuity in their derivatives (e.g., the partial derivative of $\kappa$ with respect to $\theta$ when $\theta = \theta'$) and the space $\mathcal{H}$ in which the corresponding dictionary lives is necessarily infinite-dimensional (see Section 3.3).

When $\Theta = \mathbb{R}^D$ with $D > 1$ and the dictionary is defined by a CMF kernel (1.7), we show that such an exact recovery result no longer holds (see Example 4). Nevertheless, for dictionaries based on CMF kernels, under an additional hypothesis (referred to as "axis admissibility", see Definition 7), we demonstrate that a form of delayed exact recovery (that is, in more than $k$ iterations) holds. The number of iterations sufficient to identify a set of $k$ parameters is then upper-bounded by $k^D$ (see Theorem 4). Moreover, under the above-mentioned hypothesis of axis admissibility, sufficient and necessary conditions for exact recovery of a given subset $\{\theta_\ell^\star\}_{\ell=1}^{k}$¹ in $k$ steps (irrespective of the choice of the coefficients $\{c_\ell\}_{\ell=1}^{k}$) can be written in terms of a finite number of atoms of the dictionary (smaller than $k^D$) including $\{\theta_\ell^\star\}_{\ell=1}^{k}$ (see Theorem 4). We leverage this result to prove that exact recovery in $k$ steps is possible as soon as the elements of the subset $\{\theta_\ell^\star\}_{\ell=1}^{k}$ obey some "minimum separation" condition (see Theorem 5).

¹ Here and in the sequel, when referring to a subset $\{\theta_\ell^\star\}_{\ell=1}^{k}$, we implicitly assume that the elements $\theta_\ell^\star$ are pairwise distinct.

The rest of this paper is organized as follows. Section 2 draws connections with the sparse recovery literature. Section 3.1 elaborates on the main ingredients of the "continuous" setup and defines the notions of recovery that are
used in the statements of our results. In Section 3.2, we exhibit a sufficient condition on atom parameters and kernel such that exact recovery of a given set of atom parameters holds. We then present the family of CMF dictionaries in Section 3.3 and show in Section 3.4 that different forms of recovery can be achieved in these dictionaries. Concluding remarks are given in Section 4. The technical details of our results are contained in the appendices of the paper. The proofs of our main recovery results are exposed in Appendices A and B. Appendix C contains some auxiliary technical details. Finally, Appendices D and E are dedicated to some mathematical developments related to two examples discussed in the paper.

Notations

The following notations will be used in this paper. The symbols $\mathbb{R}$, $\mathbb{R}^*$, $\mathbb{R}_+$, $\mathbb{R}_+^*$ refer to the set of real, non-zero, non-negative and positive numbers, respectively. Boldface lower and upper cases (e.g., $\mathbf{g}$, $\mathbf{G}$) are used to denote (finite-dimensional) vectors and matrices, respectively. The notation $[i]$ refers to the $i$-th element of a vector, and $[i, j]$ to the element at the $i$-th row and $j$-th column of a matrix. Italic boldface letters (e.g., $\boldsymbol{y}$ or $\boldsymbol{a}$) denote elements of a Hilbert space $\mathcal{H}$. All-one and all-zero column vectors in $\mathbb{R}^k$ are denoted $1_k$ and $0_k$, respectively. The $\ell$-th vector of the canonical basis in $\mathbb{R}^k$ will be denoted $e_\ell$. The notations $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ refer to the inner product and its induced norm on $\mathcal{H}$, while $\|\cdot\|_p$ with $p > 0$ refers to the classical $\ell_p$ (pseudo-) norm on $\mathbb{R}^D$. Finally, calligraphic letters (e.g., $\mathcal{S}$, $\mathcal{G}$) are used to describe finite subsets of the parameter space $\Theta$, while $[\![m, n]\!]$ denotes the set of integers $i$ such that $m \leq i \leq n$. Given $\mathcal{S} \subseteq \Theta$, we let $\bar{\mathcal{S}} \triangleq \Theta \setminus \mathcal{S}$ be the complementary set of $\mathcal{S}$ in $\Theta$. The cardinality of a set is denoted $\mathrm{card}(\cdot)$. Finally, if $\varphi : \mathbb{R} \to \mathbb{R}$ is a function, the notation $\varphi^{(n)}$ refers to its $n$-th derivative. The main notations used in this paper are summarized in Appendix F.

2. Related works and state of the art

Over the last decade, sparse representations have sparked a surge of interest in the signal processing, statistics, and machine learning communities. A question of broad interest which has been addressed by many scientists is the identification of the "sparsest" representation of an input signal $\boldsymbol{y}$ (that is, the representation involving the smallest number of elements of $\mathcal{A}$). Since this problem has been shown to be NP-hard [7], many sub-optimal procedures have been proposed to approximate its solution. Among the most popular, one can mention methodologies based on convex relaxation and greedy algorithms. The term "sub-optimal" has to be understood in the following sense: these procedures are heuristics that only find the sparsest solution of the input vector $\boldsymbol{y}$ under some restricted conditions. They can fail when these conditions do not hold.

Greedy procedures have a long history in the signal processing and statistical literature, which can be traced back to (at least) the 60's [8]. In the signal processing community, the most popular instances of greedy algorithms are known under the names of Matching Pursuit (MP) [9], Orthogonal Matching Pursuit (OMP) [5] (also known as Orthogonal Greedy Algorithm (OGA) [10, 11]) and Orthogonal Least Squares (OLS) [12]. Although these algorithms were already known under different names in other communities [13], they have been "rediscovered" many times, see e.g., [14–16]. Extensions to more general cost functions and kernel dictionaries are discussed in [17].

Sparse representations based on the resolution of convex optimization problems were initially proposed in geophysics [18] for seismic exploration. These methods have been popularized in the signal processing community by the seminal work by Chen et al. [19] and by Tibshirani in Statistics [20].
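As a concrete reference point for the greedy procedures discussed above, the following Python sketch runs the OMP iteration of Algorithm 1 using kernel evaluations only, since every inner product involved can be written with $\kappa$ when $\boldsymbol{y}$ is a combination of atoms. It is a minimal illustration under stated assumptions, not the authors' implementation: the continuous argmax over $\Theta$ is replaced by a search over a finite grid that contains the true parameters, the stopping test uses a numerical tolerance instead of $\boldsymbol{r} \neq 0_{\mathcal{H}}$, and the names `omp`, `solve`, `laplace_kernel` and `gaussian_kernel` are hypothetical helpers introduced here. The Laplace kernel $e^{-|\theta - \theta'|}$ is taken as an instance of (1.7) and the Gaussian kernel is the one appearing in (1.4).

```python
import math

def laplace_kernel(t, s):
    # CMF kernel of the form (1.7) with phi(x) = exp(-x), p = 1, D = 1
    return math.exp(-abs(t - s))

def gaussian_kernel(t, s):
    # kernel induced by the atoms of Example 1, see (1.4)
    return math.exp(-0.25 * (t - s) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def omp(kernel, grid, support, coeffs, max_iter, tol=1e-10):
    """Grid-restricted OMP for y = sum_l coeffs[l] * a(support[l])."""
    def corr_y(theta):
        # <a(theta), y> expressed through the kernel
        return sum(c * kernel(theta, s) for c, s in zip(coeffs, support))

    selected, c_hat = [], []
    for _ in range(max_iter):
        def corr_r(theta):
            # <a(theta), r> with r = y - sum_l c_hat[l] * a(selected[l])
            fit = sum(c * kernel(theta, s) for c, s in zip(c_hat, selected))
            return corr_y(theta) - fit

        best = max(grid, key=lambda th: abs(corr_r(th)))  # atom selection
        if abs(corr_r(best)) < tol:  # numerical stand-in for r = 0
            break
        selected.append(best)  # support update
        gram = [[kernel(a, b) for b in selected] for a in selected]
        c_hat = solve(gram, [corr_y(s) for s in selected])  # least squares
    return selected, c_hat

grid = [i / 100 for i in range(-200, 301)]  # contains 0.0 and 1.0

sel, c = omp(laplace_kernel, grid, [0.0, 1.0], [1.0, -0.8], max_iter=5)
print("Laplace kernel:", sel, c)

first, _ = omp(gaussian_kernel, grid, [0.0, 1.0], [1.0, 0.5], max_iter=1)
print("Gaussian kernel, first selected parameter:", first[0])
```

On this toy instance the Laplace run selects the two true parameters in two iterations, whereas the first parameter selected with the Gaussian kernel lies strictly between them, in line with Example 1. The grid search is only a convenience for the illustration; it does not address how the maximization over a continuous $\Theta$ should be carried out in practice.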
Well-known instances of convex-relaxation approaches for sparse representations are Basis Pursuit (BP) [19] and Lasso [20], also known as Basis Pursuit Denoising, which correspond to different convex optimization formulations. Many algorithmic solutions to efficiently address these problems have been proposed, see e.g., [21–23].

All the early contributions mentioned above have been made in the discrete setting, where the dictionary contains a finite number of atoms. Although Mallat and Zhang [9] already defined MP for continuous dictionaries, the wide practice of MP is in the discrete setting. Greedy sparse approximation in the context of dictionaries made up of an infinite (possibly uncountable) number of atoms has only been studied more recently [16, 24, 25]. Practical procedures to implement greedy procedures in continuous dictionaries can be found in [26–28].

On the side of convex relaxation approaches, it was shown that a continuous version of Lasso can be expressed as a convex optimization problem over the space of Radon measures [29] and later referred to as the Beurling Lasso (BLasso) [30]. A continuous version of BP was also proposed [3] for specific continuous dictionaries by exploiting similar ingredients. Motivated by an increasing demand in efficient solvers, different strategies to find the solution of this problem (to some accuracy) were proposed over the past few years. When dealing with dictionaries made up of complex exponentials that depend on a one-dimensional parameter (that is $D = 1$), the BLasso problem can be reformulated as a semidefinite program (SDP) [3, 31]. These methods have been further extended to the multidimensional case by considering SDP approximations of the problem [32]. The conditional gradient method (CGM) has also proven to be applicable to address the BLasso problem [29] and further enhanced with non-convex local optimization extra steps [33–35].
Interestingly, the CGM has been shown to be equivalent to the so-called exchange method in [36, 37]. More re- cently, gradient- ow methods on spaces of measures have also been investigated to address the BLasso problem [38, 39]. Finally, we also mention the existence of a vast literature on non-convex and non-variational procedures leveraging the celebrated Prony's method [40]. Among others, one may cite its extension to the multivariate case [41], the MUSIC [42] and ESPRIT [43] frameworks, as well as nite rate of innovation methods [44]. 6 Because (most of ) the approaches mentioned above (both in the discrete and continuous settings) are heuristics looking for the sparsest representation of some y, many theoretical works have been carried out to analyze their per- formance. Hereafter, we review the contributions of the literature most related to the present work. In particular, we focus on the contributions dealing with ? k k exact recovery of some subset f g for any choice of the coecients fc g ` `=1 `=1 (sometimes assuming some speci c sign patterns). In our discussion, we will use ? ? k the short-hand notation S = f g and refer to the latter as \support". Since ` `=1 ? k we always implicitly assume that the parameters f g are pairwise distinct, ` `=1 we have card(S ) = k. The presentation is organized in two parts, dealing re- spectively with the discrete and the continuous cases. In the discrete setting, we restrict our attention to contributions addressing the performance of MP, OMP and OLS, i.e., the greedy procedures the most connected to the framework of this paper. In the continuous setting, recovery analysis, including stability and robustness to noise, have only been addressed for convex-relaxation approaches. We review these conditions below and draw some similarities and di erences with the guarantees derived for OMP. 2.1. 
Discrete setting

The discrete setting refers to the case where the dictionary contains a finite number of elements, that is $\mathrm{card}(\mathcal{A}) < \infty$. Hereafter, we will restrict our discussion to parametric dictionaries of the form (1.2) since they are the main focus of this paper. In this context, the discrete setting refers to $\mathrm{card}(\Theta) < +\infty$.

Exact Recovery Condition. The first thorough analysis of OMP exact "$k$-step" recovery of some $S^\star \triangleq \{\theta_\ell^\star\}_{\ell=1}^k$ is due to Tropp in [45]. Introducing the notations

$$G[\ell,\ell'] \triangleq \kappa(\theta_\ell^\star, \theta_{\ell'}^\star), \qquad g_\theta[\ell] \triangleq \kappa(\theta, \theta_\ell^\star), \tag{2.1}$$

Tropp's result can be rephrased as follows:

Theorem 1 (Tropp's ERC). Consider $S^\star = \{\theta_\ell^\star\}_{\ell=1}^k$ and assume that the atoms $\{a(\theta_\ell^\star)\}_{\ell=1}^k$ are linearly independent. If

$$\forall \theta \in \Theta \setminus S^\star, \quad \|G^{-1} g_\theta\|_1 < 1, \tag{2.2-ERC}$$

then OMP with $y = \sum_{\ell=1}^k c_\ell\, a(\theta_\ell^\star)$ as input unambiguously identifies $S^\star$ and $\{c_\ell\}_{\ell=1}^k$ in $k$ iterations for any choice of the coefficients $\{c_\ell\}_{\ell=1}^k \subset \mathbb{R}^*$. Conversely, if (2.2-ERC) is not satisfied, there exist not all-zero coefficients $\{c_\ell\}_{\ell=1}^k$ such that OMP with $y = \sum_{\ell=1}^k c_\ell\, a(\theta_\ell^\star)$ as input selects some $\theta \notin S^\star$ at the first iteration.

A proof of the direct part of this result can be found in [45, Th. 3.1]. The converse part is a slight variation of Tropp's original statement [45, Th. 3.10] and a proof can be found in e.g., [1, Prop. 3.15].

Condition (2.2-ERC) is usually referred to as the "Exact Recovery Condition" in the literature, and simply denoted ERC. Assuming linear independence of the atoms $\{a(\theta_\ell^\star)\}_{\ell=1}^k$, it can be reformulated in the following (and perhaps more interpretable) way:

$$\forall r \in R_{S^\star} \setminus \{0_{\mathcal{H}}\},\ \forall \theta \in \Theta \setminus S^\star, \quad |\langle a(\theta), r\rangle| < \max_{\theta' \in S^\star} |\langle a(\theta'), r\rangle| \tag{2.3}$$

where $R_{S^\star} \triangleq \mathrm{span}(\{a(\theta_\ell^\star)\}_{\ell=1}^k)$. In other words, it implies that OMP always selects a parameter in $S^\star$ during the first $k$ iterations for any input vector $y$ resulting from the linear combination of the atoms $\{a(\theta_\ell^\star)\}_{\ell=1}^k$.
The converse part shows that (2.2-ERC) is worst-case necessary in the following sense: if (2.2-ERC) is not satisfied, there exists (at least) one non-trivial linear combination of the atoms $\{a(\theta_\ell^\star)\}_{\ell=1}^k$ such that OMP selects an element $\theta \notin S^\star$ at the first iteration; in this case the correct identification of $S^\star$ in $k$ iterations is obviously not possible.

Interestingly, condition (2.2-ERC) is also related (along with the linear independence of the atoms $\{a(\theta_\ell^\star)\}_{\ell=1}^k$) to the success of MP, OLS and some convex relaxation procedures. In [46, Th. 2], the authors showed that (2.2-ERC) is also necessary and sufficient for exact $k$-step recovery of $S^\star$ by OLS. Regarding MP, (2.2-ERC) ensures that the procedure only selects atoms in $S^\star$ but does not imply exact recovery after $k$ iterations of the algorithm, since the same atom can be selected many times (the least-squares update of the coefficients in Algorithm 1 is not carried out), see e.g., [47, Th. 1]. Finally, in [48, Th. 3] and [49, Th. 8], the authors show that (2.2-ERC) also ensures correct identification of $S^\star$ by some convex relaxation procedures such as BP or Lasso.

Coherence. Tropp's condition is of limited practical interest to characterize the recovery of all supports of size $k$ since it requires verifying that (2.2-ERC) holds for any $S^\star$ with $\mathrm{card}(S^\star) = k$. In order to circumvent this issue, other sufficient conditions of success, less sharp but easier to evaluate in practice, have been proposed in the literature. One of the most popular conditions is based on the coherence $\mu$ of a normalized dictionary. Assuming the atoms of the dictionary are of unit norm, this condition reads (with the convention that $\tfrac{1}{0} = +\infty$):

$$k < \frac{1}{2}\left(1 + \frac{1}{\mu}\right) \tag{2.4}$$

where

$$\mu \triangleq \sup_{\substack{\theta,\theta' \in \Theta \\ \theta \neq \theta'}} |\kappa(\theta, \theta')|. \tag{2.5}$$

Condition (2.4), together with the normalization of the dictionary, implies that (2.2-ERC) is verified for any $S^\star$ with $\mathrm{card}(S^\star) \le k$, and also implies the linear independence of any group of $k$ atoms of the dictionary. It therefore implies that
OMP and OLS correctly identify any $S^\star$ with $\mathrm{card}(S^\star) \le k$ in exactly $\mathrm{card}(S^\star)$ iterations. It also ensures the correct identification of any $S^\star$ with $\mathrm{card}(S^\star) \le k$ by BP and Lasso. In [50], the authors emphasize that condition (2.4) can be slightly relaxed if the coefficients $\{c_\ell\}_{\ell=1}^k$ exhibit some decay.

The coherence of the dictionary can be seen as a particular measure of "proximity" between the atoms of the dictionary. Other exact recovery conditions, based on different proximity measures, have been proposed in the literature. In [45, Th. 3.5], the author derived recovery conditions based on "cumulative coherence", whereas in [11, 51–57], guarantees based on "restricted isometry constants" were proposed. Given that Tropp's condition is both necessary and sufficient, all such recovery conditions imply that the ERC holds for any support of size $k$.

2.2. Continuous setting

General setup. Sparse representations in continuous dictionaries are basically characterized by two main ingredients:
i) a parameter set $\Theta$, usually assumed to be a connected subset of $\mathbb{R}^D$ with non-empty interior or a torus in dimension $D$. We note that, in this paper, we restrict our attention to the case where $\Theta = \mathbb{R}^D$ for $D \ge 1$;
ii) an "atom" function $a : \Theta \to \mathcal{H}$, assumed to be continuous and injective.
This type of dictionary appears in numerous signal processing tasks such as sparse spike deconvolution or super-resolution, where one aims to recover fine-scale details from an under-resolved input signal [3, 18, 35, 59].

Irrelevance of existing analyses. The continuity of $a(\cdot)$ does not allow most of the analyses performed in the context of discrete dictionaries to be extended to the continuous framework. In particular, all exact recovery conditions based on coherence or restricted isometry constants turn out to be violated whenever dealing with continuous dictionaries. As for the coherence condition (2.4), it is easy to see that the continuity of $a(\cdot)$ implies the continuity of $\kappa(\cdot,\cdot)$ with respect to both its arguments.
This, in turn, implies that $\mu = 1$ (for normalized atoms) and the coherence-based condition (2.4) is never met, even for $k = 1$! In order to circumvent this issue, some specific exact recovery conditions for continuous dictionaries have been proposed in the literature, see e.g., [3, 4, 30, 60]. We review below the main ingredients grounding these conditions of recovery. In the context of convex-relaxation approaches, these conditions originate from the analysis of the associated optimality conditions.

A separation condition for BP for continuous dictionaries. In the context of BP for continuous dictionaries, the question of exact recovery can be rephrased as follows: if the atoms $\{a(\theta_\ell^\star)\}_{\ell=1}^k$ are linearly independent and $y = \sum_{\ell=1}^k c_\ell\, a(\theta_\ell^\star)$, is the solution of BP for continuous dictionaries unique and equal to a discrete measure supported on $\{\theta_\ell^\star\}_{\ell=1}^k$ with weights $\{c_\ell\}_{\ell=1}^k$?

[Footnote to (2.5): $\mu = 0$ if all the atoms are pairwise orthogonal and $\mu \simeq 1$ if some atoms are very correlated.]
[Footnote to the general setup of Section 2.2: the two assumptions on $\Theta$ (connectedness and non-empty interior) imply that $\Theta$ is uncountable [58, Ch. 1, Exercise 19d].]

The case where each atom is a collection of Fourier coefficients (so that $\mathcal{H}$ is a complex Euclidean space) has received a lot of attention due to its connection with the super-resolution problem. Indeed, the latter scenario is equivalent to recovering infinitely resolved details (the parameters) from some low-pass observation. Without further assumptions on the coefficients, the targeted measure is the unique solution of BP for continuous dictionaries provided that [3, Th. 1.2 and 1.3]

$$\min_{\substack{\ell,\ell' \in \llbracket 1,k \rrbracket \\ \ell \neq \ell'}} |\theta_{\ell'}^\star - \theta_\ell^\star|_\infty > \frac{C}{f_c}, \tag{2.6}$$

where $|\cdot|_\infty$ is the $\ell_\infty$ distance on the $D$-dimensional torus (maximum deviation in any coordinate), $C$ is a constant that depends on the parameter dimension $D$ and $f_c$ is the cut-off frequency of the observation low-pass filter. Bounds on the value of $C$ were proposed by the same authors and further refined in [4, Cor. 1] and [61, Th. 2.2].

We see that (2.6) implies the recovery of $\{\theta_\ell^\star\}_{\ell=1}^k$ and $\{c_\ell\}_{\ell=1}^k$ provided
that the elements of $\{\theta_\ell^\star\}_{\ell=1}^k$ verify some "minimum separation" condition. Interestingly, as shown in [30, Th. 2.1], this separation condition is no longer needed when dealing with positive linear combinations of atoms (that is, when all the coefficients $\{c_\ell\}_{\ell=1}^k$ are positive). The authors showed moreover that this separation-free result for positive linear combinations holds for any dictionary such that the atom function $a$ forms a "Chebyshev system" [62, Ch. 2] and provided that $2k + 1$ observations are available (i.e., $\dim(\mathcal{H}) \ge 2k + 1$).

Dual certificates for the BLasso problem. In [4], the authors derived several dual certificates for the BLasso problem, generalizing the work done by Fuchs for the Lasso [48] to an infinite-dimensional setup. They first show that the existence of a "vanishing derivative pre-certificate" [4, Def. 6] is necessary so that the support of the solution to the BLasso problem is exactly $\{\theta_\ell^\star\}_{\ell=1}^k$ [4, Prop. 8]. On the other hand, they show that a so-called "non degenerate source condition" [4, Def. 5] is sufficient to ensure the desired recovery [4, Th. 2]. We note that these two conditions apply to dictionaries made up of differentiable atom functions $a : \Theta \to \mathcal{H}$ since they involve the first and second order derivatives of the inner product $\langle a(\theta), y\rangle$ evaluated at $\{\theta_\ell^\star\}_{\ell=1}^k$. This is in contrast with the "CMF" dictionaries considered in this paper, which involve some non-differentiability in their kernel function, see Section 3.3. Moreover, we also emphasize that both conditions involve the sign of the coefficients $\{c_\ell\}_{\ell=1}^k$.

The comparison with the discrete case goes even deeper: it can be shown that the solution of BP for continuous dictionaries is, in some sense, the limit of the solution of BLasso [4, Prop. 1]. Although out of the scope of the present paper, we also mention the existence of a literature related to the robustness of BLasso in various noisy settings [4, 63, 64].

3. Main results

In this section, we present the main results of the paper.
In Section 3.1, we describe the constitutive ingredients of the "continuous" setup addressed in this work and provide a rigorous definition of the notions of exact recovery that will be used in our statements. Our main results are presented in Sections 3.2 and 3.4. The family of "CMF dictionaries", central to our results in Section 3.4, is introduced in Section 3.3.

3.1. Main ingredients

We first present the three main properties that a "continuous" dictionary should verify, see (3.4a), (3.4b) and (3.6) below. We then elaborate on some differences between the implementation of OMP in the discrete and continuous settings. We finally give a precise definition of the notions of recovery that will be used in the statements of our results.

Continuous dictionary. First, the space $\Theta$ is usually assumed to be a connected metric space or a torus in dimension $D$. Hereafter, for the sake of conciseness, we will restrict our attention to the case where $\Theta = \mathbb{R}^D$ and assume that the kernel associated to the atoms obeys some vanishing property (see (3.6) below). [Footnote: our results can be adapted to any set $\Theta$ such that step 6 of Algorithm 1 is well-posed, that is, at least one maximizer exists.] A second common working hypothesis in the "continuous" setup is the continuity of the function $a : \Theta \to \mathcal{H}$, that is

$$\lim_{\theta' \to \theta} \|a(\theta') - a(\theta)\| = 0 \quad \forall \theta \in \Theta. \tag{3.1}$$

In this paper, we will moreover suppose that the atoms of the dictionary are normalized:

$$\|a(\theta)\| = 1 \quad \forall \theta \in \Theta. \tag{3.2}$$

In the sequel, recovery conditions will be expressed as a function of the induced kernel $\kappa(\cdot,\cdot)$:

$$\kappa(\theta, \theta') \triangleq \langle a(\theta), a(\theta')\rangle \quad \forall \theta, \theta' \in \Theta. \tag{3.3}$$

The "continuity" and "unit-norm" properties are equivalent to:

$$\text{"unit norm":} \quad \kappa(\theta, \theta) = 1 \quad \forall \theta \in \Theta, \tag{3.4a}$$
$$\text{"continuity":} \quad \lim_{\theta' \to \theta} \kappa(\theta', \theta) = 1 \quad \forall \theta \in \Theta. \tag{3.4b}$$

Moreover, we have from the Cauchy-Schwarz inequality that

$$|\kappa(\theta, \theta')| \le 1 \quad \forall \theta, \theta' \in \Theta. \tag{3.5}$$

Lastly, in this work, we will restrict our attention to kernels that vanish at infinity, i.e.,

$$\forall \varepsilon > 0,\ \forall \theta \in \Theta,\ \exists K \subset \Theta \text{ compact}: \quad \sup_{\theta' \in K^c} |\kappa(\theta', \theta)| < \varepsilon, \tag{3.6}$$

where $K^c$ is the complement of $K$ in $\Theta$.
This covers the case where $\Theta$ is compact, by simply considering $K = \Theta$ with the convention $\sup_{\theta' \in \emptyset} |\kappa(\theta', \theta)| = 0$.

OMP in continuous dictionaries. Although Algorithm 1 corresponds to the standard definition of OMP in the discrete setting, its implementation in continuous dictionaries leads to two major differences. First, the "atom selection" step in Line 6 does not necessarily admit a maximizer. In such a case, the recursions defined in Algorithm 1 are ill-posed since the maximization problem in Line 6 has no solution. Second, even if a maximizer exists, solving the "atom selection" problem may be computationally intractable. In particular, the function to be maximized in Line 6 may have many local maxima and the problem is indeed NP-hard in certain cases (e.g., when the maximization step involves a rank-1 approximation of a tensor [65, Th. 1.13] as in [66]), while in other cases it is easy (for example, the SVD can be revisited in this framework and solved up to numerical precision [67, Sec. 2.2]). The maximization problem of Line 6 also appears in Frank-Wolfe type algorithms [35], where existing theoretical guarantees also hold under the hypothesis that this step can be solved.

In this paper, for the sake of simplifying our theoretical analysis, we will nevertheless stick to the idealized version of OMP described in Algorithm 1. The results presented in this work should therefore be considered more for the theoretical insights they provide into the behavior of OMP in continuous dictionaries than for their practical implications.

In our theoretical analysis we will only have to deal with residuals $r$ which can be written as a linear combination of a finite number of atoms of $\mathcal{A}$. In such a case, the following lemma shows that a maximizer to the "atom selection" problem always exists:

Lemma 1. Let $\mathcal{A} = \{a(\theta) : \theta \in \Theta\}$ be a continuous dictionary with kernel $\kappa(\theta, \theta') \triangleq \langle a(\theta), a(\theta')\rangle$ verifying the continuity property (3.4b) and vanishing property (3.6).
Then

$$\arg\max_{\theta \in \Theta} |\langle a(\theta), r\rangle| \neq \emptyset \tag{3.7}$$

whenever $r \in \mathcal{H} \setminus \{0_{\mathcal{H}}\}$ is a finite linear combination of elements of $\mathcal{A}$.

A proof of this statement is available in Appendix C.1.

We finally emphasize that the solution to the OMP recursions may not be unique. Indeed, in situations where the "atom selection" problem in Line 6 admits several solutions, there may exist several output sets, $\hat S$ and $\{\hat c_\theta\}_{\theta \in \hat S}$, verifying the OMP recursions. Hereafter, given an observation vector $y$, we will call any set $\hat S$ which can be generated by OMP with $y$ as input a "reachable support".

Notions of recovery. The recovery results stated in the next sections of the paper will involve the following notions of success: "exact $k$-step recovery of $S^\star$" and "exact $S$-delayed recovery of $S^\star$". We devote the remainder of this section to rigorously defining these two notions.

We say that OMP achieves exact recovery of coefficients $\{c_\ell\}_{\ell=1}^k \subset \mathbb{R}^*$ and atom parameters $S^\star \triangleq \{\theta_\ell^\star\}_{\ell=1}^k$ if $\{c_\ell\}_{\ell=1}^k$ and $S^\star$ can be unambiguously identified from any reachable outputs of OMP ($\hat S$ and $\{\hat c_\theta\}_{\theta \in \hat S}$) run with $y = \sum_{\ell=1}^k c_\ell\, a(\theta_\ell^\star)$ as input. We note that a simple necessary and sufficient condition for exact recovery of $\{c_\ell\}_{\ell=1}^k$ and $S^\star$ reads

$$S^\star \subseteq \hat S, \tag{3.8}$$

for each reachable support $\hat S$. This can be seen from the following arguments. If there is a reachable support such that $S^\star \not\subseteq \hat S$, then exact recovery is obviously not attained since there exists some $\theta \in S^\star$ that is not identified in $\hat S$. Conversely, if $S^\star \subseteq \hat S$ holds, one must have

$$\hat c_\theta = \begin{cases} c_\ell & \text{if } \theta = \theta_\ell^\star \text{ for some } \ell \in \llbracket 1,k \rrbracket, \\ 0 & \text{otherwise,} \end{cases} \tag{3.9}$$

because the atoms $\{a(\theta) : \theta \in \hat S\}$ selected by OMP are always linearly independent and $y \in \mathrm{span}(\{a(\theta_\ell^\star)\}_{\ell=1}^k)$. Therefore, $S^\star = \{\theta_\ell^\star\}_{\ell=1}^k$ can be unambiguously identified from the non-zero elements of $\{\hat c_\theta\}_{\theta \in \hat S}$.

In the literature related to the conditions of success of OMP, a distinction is usually made between the cases "$\hat S = S^\star$" and "$S^\star \subseteq \hat S$": the former is referred
to as "$k$-step recovery" because it implies that OMP identifies $S^\star$ and $\{c_\ell\}_{\ell=1}^k$ in exactly $k$ steps; the latter is known as "delayed recovery" because OMP may require (if the inclusion is strict) more than $k$ iterations to identify $S^\star$ and $\{c_\ell\}_{\ell=1}^k$.

In this paper we will focus on conditions ensuring the correct identification of a given support $S^\star$ of cardinality $k$ for any choice of the non-zero weighting coefficients $\{c_\ell\}_{\ell=1}^k$. The notions of "exact $k$-step recovery of $S^\star$" and "exact $S$-delayed recovery of $S^\star$" announced at the beginning of this section then read as follows. We say that OMP achieves "exact $k$-step recovery of $S^\star$" if $\hat S = S^\star$ for any choice of $\{c_\ell\}_{\ell=1}^k \subset \mathbb{R}^*$ and any reachable output $\hat S$. This implies that there is only one reachable output. Moreover, given some set $S \subseteq \Theta$, we say that OMP achieves "exact $S$-delayed recovery of $S^\star$" if

$$S^\star \subseteq \hat S \subseteq S \tag{3.10}$$

for any choice of $\{c_\ell\}_{\ell=1}^k \subset \mathbb{R}^*$ and any reachable output $\hat S$. "$S$-delayed recovery" can be regarded as a refined version of "delayed recovery" where the set of parameters that OMP may select is guaranteed to belong to some set $S \supseteq S^\star$. Trivially, this is always the case with $S = \Theta$, so what matters in our results is to establish conditions under which we can identify a finite set $S$, determined solely by $S^\star$, such that $S$-delayed recovery of $S^\star$ holds. We note that $S$-delayed recovery implies that OMP identifies $S^\star$ in at most $\mathrm{card}(S)$ iterations. Finally, we emphasize that "exact $S^\star$-delayed recovery of $S^\star$" is equivalent to "exact $k$-step recovery of $S^\star$". We will sometimes use the former in the formulation of our results to have more compact statements.

We will also always implicitly assume that OMP achieves $k$-step recovery of $S^\star$ when $S^\star = \emptyset$, since this implies that $y = 0_{\mathcal{H}}$ and OMP returns the empty support $\hat S = \emptyset$ at iteration 0 in this case.

3.2.
Exact recovery of a given support: sufficient conditions

In this section, we highlight some instrumental properties of the dictionary $\mathcal{A}$ and support $S^\star$ which allow OMP to achieve exact $\mathrm{card}(S)$-step recovery of each $S \subseteq S^\star$ (see Theorem 2). These conditions are the basis of our results on "CMF dictionaries" stated in Section 3.4.

We first notice that, in the context of continuous dictionaries, the $k$-step analysis of Theorem 1 still applies: condition (2.2-ERC) along with the linear independence of the atoms $\{a(\theta_\ell^\star)\}_{\ell=1}^k$ are still necessary and sufficient for exact recovery of a support $S^\star$. [Footnote: we note in particular that (2.2-ERC) ensures that the "atom selection" step in Line 6 of Algorithm 1 is well-defined, since the maximizers are ensured to belong to the finite set $S^\star$.] However, the standard formulation

$$\max_{\theta \in \Theta \setminus S^\star} \|G^{-1} g_\theta\|_1 < 1, \tag{3.11}$$

equivalent to (2.2-ERC) in the discrete setting, no longer holds in the case of continuous dictionaries as the supremum

$$\sup_{\theta \in \Theta \setminus S^\star} \|G^{-1} g_\theta\|_1 \tag{3.12}$$

is always at least 1. [Footnote: indeed, first notice that $\|G^{-1} g_\theta\|_1 = 1$ for all $\theta \in S^\star$. One then obtains that the supremum is at least 1 by continuity of $\theta \mapsto \|G^{-1} g_\theta\|_1$.]

In order to circumvent this problem, we identify below two simpler conditions, respectively on the dictionary $\mathcal{A}$ (via its induced kernel $\kappa$) and the support $S^\star$, which imply that the atoms $\{a(\theta_\ell^\star)\}_{\ell=1}^k$ are linearly independent and that (2.2-ERC) is verified, see Theorem 2 below. The following definition includes assumptions on the kernel ensuring that the dictionary atoms are normalized, and that the atom function $\theta \mapsto a(\theta)$ is injective and continuous.

Definition 1 (Admissible kernel). A kernel $\kappa$ is said to be admissible if:
i) it verifies (3.4) and (3.6);
ii) $0 \le \kappa(\theta, \theta') < 1$ for any $\theta \neq \theta'$.
By extension, a dictionary $\mathcal{A} = \{a(\theta) : \theta \in \Theta\}$ is said to be admissible if its induced kernel is admissible.

Definition 2 (Admissible support with respect to kernel $\kappa$). A support $S^\star =$
$\{\theta_\ell^\star\}_{\ell=1}^k$ is admissible with respect to a kernel $\kappa$ if the following holds for any non-empty subset $T \subseteq \llbracket 1,k \rrbracket$ and any positive coefficients $\{c_\ell\}_{\ell \in T} \subset \mathbb{R}_+^*$ such that $\sum_{\ell \in T} c_\ell < 1$:
i) The set of global maximizers of

$$\psi : \Theta \to \mathbb{R}_+, \quad \theta \mapsto \sum_{\ell \in T} c_\ell\, \kappa(\theta, \theta_\ell^\star) \tag{3.13}$$

is a subset of $\{\theta_\ell^\star\}_{\ell \in T}$.
ii) If $\ell' \in \llbracket 1,k \rrbracket \setminus T$ satisfies $\psi(\theta) - \kappa(\theta, \theta_{\ell'}^\star) \le 0$ for all $\theta \in \{\theta_\ell^\star\}_{\ell \in T}$, then

$$\forall \theta \in \Theta, \quad \psi(\theta) - \kappa(\theta, \theta_{\ell'}^\star) \le 0. \tag{3.14}$$

By extension, the support $S^\star$ is said to be admissible with respect to dictionary $\mathcal{A} = \{a(\theta) : \theta \in \Theta\}$ if $S^\star$ is admissible with respect to the kernel induced by $\mathcal{A}$.

With these definitions, our first recovery result reads:

Theorem 2. Assume $\mathcal{A}$ is admissible and $S^\star$ is admissible with respect to $\mathcal{A}$. Then, OMP achieves exact $\mathrm{card}(S)$-step recovery of each $S \subseteq S^\star$.

A proof of this result is available in Appendix A. Theorem 2 provides some sufficient conditions for exact $\mathrm{card}(S)$-step recovery of any $S \subseteq S^\star$ via the definitions of "admissible dictionary" and "admissible support". In particular, the conditions of Theorem 2 imply that (2.2-ERC) is satisfied for any $S \subseteq S^\star$. As we will see in Section 3.4, the admissibility of $\mathcal{A}$ and $S^\star$ may be much easier to prove in some cases than verifying directly that (2.2-ERC) holds.

As the admissibility conditions stated in Definitions 1 and 2 may appear somewhat technical, we discuss hereafter the different items appearing in these definitions in order to shed some light on the scope of Theorem 2.

In Definition 1, (3.4) ensures that the kernel $\kappa$ induced by $\mathcal{A}$ is continuous and that the dictionary $\mathcal{A}$ only contains unit-norm atoms. The continuity assumption is crucial in the derivation of our result since it induces a specific structure on the dictionary. The unit-norm hypothesis is only secondary but avoids some unnecessary technicalities in the proofs. Hypothesis (3.6) ensures the well-posedness of the "atom selection" step in Line 6 of Algorithm 1, see Lemma 1.
Finally, $0 \le \kappa(\theta, \theta')$ implies that the inner product between two atoms of $\mathcal{A}$ is always nonnegative, whereas $\kappa(\theta, \theta') < 1$ guarantees that $a(\theta) \neq a(\theta')$ for $\theta \neq \theta'$, i.e., that $\theta \mapsto a(\theta)$ is an injective function (remember that we assume $\kappa(\theta, \theta) = 1$ for all $\theta \in \Theta$; the fact that atoms are distinct is thus a direct consequence of the Cauchy-Schwarz inequality). Two atoms $a(\theta)$ and $a(\theta')$ with $\theta \neq \theta'$, being normalized, distinct, and positively correlated, are also linearly independent.

As for Definition 2, item i) ensures that a correct atom selection always occurs when the residual $r$ is a positive combination of the atoms of the support and the kernel $\kappa$ is admissible. Indeed, if $r = \sum_{\ell=1}^k c_\ell\, a(\theta_\ell^\star)$ with $c_1, \ldots, c_k > 0$ and the kernel is admissible, then from Definition 1

$$|\langle a(\theta), r\rangle| = \sum_{\ell=1}^k c_\ell\, \kappa(\theta, \theta_\ell^\star). \tag{3.15}$$

In such a case, item i) of Definition 2 then implies

$$\arg\max_{\theta \in \Theta} |\langle a(\theta), r\rangle| \subseteq S^\star. \tag{3.16}$$

Item ii) of Definition 2 does not have such a simple interpretation, but a careful inspection of our proof in Appendix A shows that this condition is instrumental for deriving the result stated in Theorem 2. Altogether, given some admissible dictionary $\mathcal{A}$, Theorem 2 allows us to establish recovery results valid without sign constraints by only proving the two assumptions gathered in Definition 2, which somehow correspond to establishing the result for the easier case of positive combinations of atoms.

3.3. CMF dictionaries

In the next section, we will particularize Theorem 2 to a family of dictionaries whose kernel is defined via a completely monotone function (CMF). In this section, we provide a precise definition of this family of dictionaries and some of their properties that will be used throughout the paper. We first recall the definition of a CMF:

Definition 3 (CMF [68, Def. 7.1]). A function $\varphi : \mathbb{R}_+ \to$
$\mathbb{R}$ is completely monotone on $[0, +\infty[$ if it is infinitely differentiable on $]0, +\infty[$, right-continuous at 0, and if its derivatives obey

$$(-1)^n \varphi^{(n)}(x) \ge 0 \quad \forall (x, n) \in \mathbb{R}_+^* \times \mathbb{N}. \tag{3.17}$$

As described in the following example, many well-known functions are CMFs:

Example 2. The following functions are completely monotone [6]:
• the function $x \mapsto e^{-\lambda x}$ for $\lambda > 0$, which gives rise to the Laplace kernel,
• a function of the form $x \mapsto (\,\cdot\,)/(1+x)$ depending on a positive parameter,
• ratios of modified Bessel functions of the first kind,
• a subset of the confluent hypergeometric functions (Kummer's function),
• a subset of the Gauss hypergeometric functions.

[Footnote to Definition 2: in particular, we note that if item i) of Definition 2 is true, then its conclusion still holds without the hypothesis "$\sum_{\ell \in T} c_\ell < 1$".]

By definition, CMFs are non-negative, non-increasing and convex functions. Moreover, they admit an integral formulation in terms of the Laplace transform of a Borel measure:

Lemma 2 (Bernstein-Widder theorem, [68, Th. 7.11]). A function $\varphi$ is completely monotone on $[0, +\infty[$ if and only if there exists a non-negative finite measure $\nu$ on Borel sets of $[0, +\infty[$ such that

$$\varphi(x) = \int_0^{+\infty} e^{-ux}\, \mathrm{d}\nu(u), \tag{3.18}$$

where the integral converges for all $x \ge 0$ since $\nu$ is finite and $e^{-ux} \le 1$.

We note for example that the Laplace kernel (see Example 2) is a CMF with representation measure equal to $\nu = \delta_\lambda$ with $\lambda > 0$. In the sequel we will consider the following class of kernels and dictionaries whose definitions rely on the concept of CMF:

Definition 4 (CMF kernel and dictionary). The class of CMF kernels in dimension $D \ge 1$, denoted $K_{\mathrm{CMF}}(D)$, consists of all kernels $\kappa : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ such that

$$\kappa(\theta, \theta') = \varphi\big(\|\theta - \theta'\|_p^p\big) \quad \forall \theta, \theta' \in \mathbb{R}^D \tag{3.19}$$

where $\varphi$ is a CMF verifying $\varphi(0) = 1$, $\lim_{x \to +\infty} \varphi(x) = 0$ and $0 < p \le 1$. By extension, we say that $\mathcal{A}$ is a CMF dictionary in dimension $D \ge 1$ if its induced kernel belongs to $K_{\mathrm{CMF}}(D)$.

We note that the constraint $\varphi(0) = 1$ in the previous definition ensures that the "unit-norm" hypothesis (3.4a) is satisfied.
We also mention that the constraint $\lim_{x \to +\infty} \varphi(x) = 0$ is necessary so that CMF kernels satisfy the vanishing property (3.6).

At this point, a legitimate question is whether the kernel (3.19) can be induced by some dictionary $\mathcal{A}$. The answer is positive and is a corollary of the following lemma:

Lemma 3. Let $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ be a CMF such that $\varphi(0) = 1$, $\lim_{x \to +\infty} \varphi(x) = 0$, and let $0 < p \le 1$. Then, any function of the form

$$\mathbb{R}^D \to \mathbb{R}, \quad \omega \mapsto \varphi\big(\|\omega\|_p^p\big) \tag{3.20}$$

is positive definite.

A proof of this result is provided in Appendix B.1. We refer the reader to [68, Def. 6.1] for a precise definition of positive (semi-)definite functions and [68, Th. 6.2] for a review of some of their basic properties. In particular, the positive definite nature of $\varphi(\|\cdot\|_p^p)$ used in conjunction with standard results in the theory of "reproducing kernel Hilbert spaces" (see e.g., [69, Th. 3.11]) implies the following corollary:

Corollary 1 (Existence of CMF dictionaries). For any $\kappa \in K_{\mathrm{CMF}}(D)$, there exists some Hilbert space $\mathcal{H}$ and some (continuous) function $a : \mathbb{R}^D \to \mathcal{H}$ such that (3.3) holds. Moreover, any finite collection of distinct elements from $\mathcal{A} = \{a(\theta) : \theta \in \mathbb{R}^D\}$ is linearly independent.

We see from the last part of the corollary that CMF dictionaries are necessarily defined in infinite-dimensional Hilbert spaces $\mathcal{H}$. If not, any collection of $\dim(\mathcal{H}) + 1$ elements of $\mathcal{A}$ would be linearly dependent, which is in contradiction with Corollary 1. The next example exhibits a family of atoms in $\mathcal{H} = L^2(\mathbb{R})$ which is a CMF dictionary in $\mathbb{R}$.

Example 3. Let $\Theta = \mathbb{R}$ and consider the dictionary $\mathcal{A}$ defined by

$$a : \mathbb{R} \to L^2(\mathbb{R}), \quad \theta \mapsto f_\theta(t) = \sqrt{2\lambda}\, e^{-\lambda (t - \theta)}\, \mathbf{1}_{\{t \ge \theta\}}(t) \tag{3.21}$$

for some $\lambda > 0$, where $\mathbf{1}_{\{t \ge \theta\}}$ is the "indicator" function which is equal to 1 if $t \ge \theta$ and 0 otherwise. Straightforward calculations show both that $\|a(\theta)\| = 1$ for any $\theta$ and that the inner product in $\mathcal{H} = L^2(\mathbb{R})$ between two atoms reads $\langle a(\theta), a(\theta')\rangle = e^{-\lambda |\theta - \theta'|}$. The latter function corresponds to the so-called "Laplace kernel". This kernel is an element of $K_{\mathrm{CMF}}(1)$ according to Example 2.
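The "straightforward calculations" of Example 3 can be spelled out as follows. This is only a sketch, assuming the atom reads $a(\theta)(t) = \sqrt{2\lambda}\, e^{-\lambda(t-\theta)}\, \mathbf{1}_{\{t \ge \theta\}}(t)$ and taking $\theta \le \theta'$ without loss of generality (the other case follows by symmetry of the inner product):

```latex
\begin{align*}
\|a(\theta)\|^2
  &= \int_{\theta}^{+\infty} 2\lambda\, e^{-2\lambda (t-\theta)}\,\mathrm{d}t = 1, \\
\langle a(\theta), a(\theta') \rangle
  &= \int_{\theta'}^{+\infty} 2\lambda\, e^{-\lambda (t-\theta)}\, e^{-\lambda (t-\theta')}\,\mathrm{d}t \\
  &= 2\lambda\, e^{\lambda(\theta+\theta')} \int_{\theta'}^{+\infty} e^{-2\lambda t}\,\mathrm{d}t
   = e^{\lambda(\theta+\theta')}\, e^{-2\lambda \theta'}
   = e^{-\lambda(\theta'-\theta)} .
\end{align*}
```

The integration domain in the second line starts at $\theta'$ because the product of the two indicator functions vanishes for $t < \max(\theta, \theta')$.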
We conclude this section by introducing a particular CMF kernel which will be used in the statement of some of our results in Section 3.4:

Definition 5 (Generalized Laplace kernel and dictionary). The class of Generalized Laplace kernels in dimension $D$, denoted $K_{\mathrm{Lap}}(D)$, consists of all kernels $\kappa : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ such that

$$\kappa(\theta, \theta') = e^{-\lambda \|\theta - \theta'\|_p^p} \quad \forall \theta, \theta' \in \mathbb{R}^D \tag{3.22}$$

where $\lambda > 0$ and $0 < p \le 1$. By extension, a Generalized Laplace dictionary in dimension $D \ge 1$ is a collection of atoms $\mathcal{A} = \{a(\theta) : \theta \in \mathbb{R}^D\}$ whose induced kernel belongs to $K_{\mathrm{Lap}}(D)$.

One immediately sees that $K_{\mathrm{Lap}}(D) \subset K_{\mathrm{CMF}}(D)$ since the function $t \mapsto e^{-\lambda t}$ defined on $\mathbb{R}_+$ is a CMF (see Example 2).

3.4. Recovery conditions in CMF dictionaries

In this section, we provide recovery results for OMP in CMF dictionaries. The proofs of our results are based on the sufficient conditions presented in Theorem 2 and are reported in Appendix B. A first surprising result holds when $\Theta = \mathbb{R}$:

Theorem 3. Assume $\mathcal{A}$ is a CMF dictionary in dimension 1. Then, OMP achieves exact $\mathrm{card}(S^\star)$-step recovery of each finite support $S^\star \subset \mathbb{R}$.

In essence, Theorem 3 identifies a class of dictionaries for which exact $k$-step recovery is possible for any support $S^\star$ of any finite size $k$. We note that the notions of exact recovery of a support $S^\star$ defined in Section 3.1 do not involve any sign constraint on the coefficients $\{c_\ell\}_{\ell=1}^k$ used to generate the observation vector $y$. As a comparison, the results ensuring the success of continuous BP/BLasso with no sign constraints on $\{c_\ell\}_{\ell=1}^k$ require some "minimum separation condition" between parameters $\{\theta_\ell^\star\}_{\ell=1}^k$ to hold (see (2.6) and related discussion). Conversely, the recovery results for BLasso obtained in [30] without separation condition require the weighting coefficients $\{c_\ell\}_{\ell=1}^k$ to be positive. The novelty of Theorem 3 is thus a separation-free recovery result for any signed finite linear combination of atoms.
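As an illustration of Theorem 3, the following sketch runs OMP using kernel evaluations only, for the Laplace kernel $e^{-\lambda|\theta - \theta'|}$ in dimension 1, with two closely spaced atoms and signed coefficients. Several things here are assumptions made for the example and are not taken from the paper: the helper names (`laplace_kernel`, `solve`, `omp_on_candidates`), the numerical values, and above all the restriction of the "atom selection" step to a finite candidate list containing the support, which replaces the exact maximization over $\mathbb{R}$ assumed in the idealized Algorithm 1.

```python
import math

def laplace_kernel(t1, t2, lam=1.0):
    """Laplace kernel exp(-lam * |t1 - t2|), a CMF kernel in dimension 1."""
    return math.exp(-lam * abs(t1 - t2))

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def omp_on_candidates(support, coeffs, candidates, n_iter, lam=1.0):
    """OMP driven by kernel evaluations; the argmax is restricted to `candidates`."""
    def corr_with_y(theta):
        # <a(theta), y> for y = sum_l c_l a(theta_l), expressed through the kernel
        return sum(c * laplace_kernel(theta, t, lam) for c, t in zip(coeffs, support))

    selected, estimates = [], []
    for _ in range(n_iter):
        def corr_with_residual(theta):
            fitted = sum(c * laplace_kernel(theta, t, lam)
                         for c, t in zip(estimates, selected))
            return corr_with_y(theta) - fitted
        # Atom selection: maximize |<a(theta), r>| over the candidate list
        best = max(candidates, key=lambda th: abs(corr_with_residual(th)))
        selected.append(best)
        # Least-squares update via the normal equations G c = [<a(theta_j), y>]_j
        gram = [[laplace_kernel(a, b, lam) for b in selected] for a in selected]
        estimates = solve(gram, [corr_with_y(a) for a in selected])
    return selected, estimates

# Hypothetical test case: two atoms only 0.05 apart, coefficients of mixed signs
support = [0.0, 0.3, 0.35, 1.2]
coeffs = [1.0, -2.0, 1.5, 0.7]
candidates = sorted({i / 100 for i in range(-200, 301)} | set(support))
selected, estimates = omp_on_candidates(support, coeffs, candidates, len(support))
print(sorted(selected))
print(dict(zip(selected, estimates)))
```

Working with the kernel avoids any discretization of $L^2(\mathbb{R})$: every inner product needed by OMP is a finite sum of kernel values, so the only approximation is the finite candidate list.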
The strength of the result obtained in Theorem 3 comes however at a price: it applies to a specific family of dictionaries, namely CMF dictionaries. In particular, as mentioned in Section 3.3, the space $\mathcal{H}$ in which CMF dictionaries live is necessarily infinite-dimensional, and the corresponding kernels exhibit a discontinuity in all their partial derivatives at $\theta = \theta' \in \Theta$. Another price to pay is that the recovery guarantees are for OMP, an algorithm explicitly involving the search for the global maximum of an optimization problem, cf. Line 6 of Algorithm 1.

In higher dimension $D > 1$, the "universal" exact recovery result stated in Theorem 3 no longer holds, as shown in the next example. More precisely, if $D \ge 3$, we emphasize that there always exists a configuration of parameters $\{\theta_\ell^\star\}_{\ell=1}^k$ such that OMP fails at the first iteration for some $\{c_\ell\}_{\ell=1}^k \subset \mathbb{R}^*$:

Example 4. Let $D \ge 3$ and $3 \le k \le D$. Consider $S^\star \triangleq \{\theta_\ell^\star\}_{\ell=1}^k \subset \mathbb{R}^D$ and $\rho > 0$ such that

$$\|\theta_\ell^\star - \theta_{\ell'}^\star\|_p^p = 2\rho^p \quad \forall \ell \neq \ell', \qquad \|\theta_\ell^\star - 0_D\|_p^p = \rho^p \quad \forall \ell.$$

Let $a : \mathbb{R}^D \to \mathcal{H}$ define a CMF dictionary in $\mathbb{R}^D$ with kernel $\kappa(\theta, \theta') = \varphi(\|\theta - \theta'\|_p^p)$. We next show that, if $\rho$ is sufficiently small, there always exists a linear combination of $\{a(\theta_\ell^\star)\}_{\ell=1}^k$ such that OMP selects a parameter not in $S^\star$ at the first iteration. Let us consider $y = \sum_{\ell=1}^k c_\ell\, a(\theta_\ell^\star)$ and assume that all coefficients $c_\ell$ are equal. We then have

$$\frac{\langle a(0_D), y\rangle}{\langle a(\theta_\ell^\star), y\rangle} = \frac{k\,\varphi(\rho^p)}{1 + (k-1)\,\varphi(2\rho^p)}. \tag{3.23}$$

Then, $\theta = 0_D$ will be preferred to all "ground-truth" parameters $\theta_\ell^\star$ at the first iteration of OMP as soon as the quantity in (3.23) is larger than 1, or, equivalently,

$$(k-1)\,\varphi(2\rho^p) - k\,\varphi(\rho^p) + 1 < 0. \tag{3.24}$$

Let us show that (3.24) holds whenever $\rho$ is "sufficiently small". For simplicity, consider first the case where $\varphi(t) = e^{-\lambda t}$ with $\lambda > 0$. Condition (3.24) reads

$$(k-1)\,x^2 - k\,x + 1 < 0 \tag{3.25}$$

with $x = \varphi(\rho^p) = e^{-\lambda \rho^p}$. As $k \ge 3$, the left-hand side of (3.25) is a second-order polynomial with two distinct roots, namely $(k-1)^{-1}$ and 1. Therefore, OMP prefers $0_D$ as soon as $(k-1)^{-1} < x < 1$ or, equivalently, $\rho^p < \lambda^{-1} \log(k-1)$.
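The threshold derived in Example 4 can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it instantiates the configuration with $\theta_\ell^\star = \rho\, e_\ell$ (scaled canonical basis vectors, one possible choice satisfying the two distance constraints), uses a Generalized Laplace kernel, takes all coefficients equal to 1, and compares $\langle a(0_D), y\rangle$ with $\langle a(\theta_1^\star), y\rangle$. The function names are hypothetical.

```python
import math

def laplace_kernel(t1, t2, lam=1.0, p=1.0):
    """Generalized Laplace kernel exp(-lam * ||t1 - t2||_p^p), cf. (3.22)."""
    return math.exp(-lam * sum(abs(a - b) ** p for a, b in zip(t1, t2)))

def first_iteration_correlations(rho, k=3, dim=3, lam=1.0, p=1.0):
    """Return (<a(0), y>, <a(theta_1), y>) for y = sum of the k atoms at rho * e_l."""
    # theta_l = rho * e_l gives ||theta_l - theta_l'||_p^p = 2 rho^p and ||theta_l||_p^p = rho^p
    support = [tuple(rho if d == l else 0.0 for d in range(dim)) for l in range(k)]
    origin = (0.0,) * dim
    corr_origin = sum(laplace_kernel(origin, t, lam, p) for t in support)
    corr_support = sum(laplace_kernel(support[0], t, lam, p) for t in support)
    return corr_origin, corr_support

def origin_is_preferred(rho, k=3, dim=3, lam=1.0, p=1.0):
    """True when the off-support parameter 0_D beats the support at the first iteration."""
    corr_origin, corr_support = first_iteration_correlations(rho, k, dim, lam, p)
    return corr_origin > corr_support

# With k = 3, lam = 1, p = 1 the predicted threshold is rho < log(k - 1) / lam
threshold = math.log(3 - 1) / 1.0
for rho in (0.5, 1.0):
    print(rho, rho < threshold, origin_is_preferred(rho))
```

By symmetry of the configuration, all support atoms have the same correlation with $y$, so comparing against $\theta_1^\star$ alone is enough.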
The latter condition implies a necessary separation condition such that OMP does not fail at the first iteration. We note that it is possible to draw similar conclusions whenever $\varphi$ is a CMF right-differentiable at zero. The proof of this result requires extra work that is detailed in Appendix D.

Although a "universal" $k$-step recovery result such as Theorem 3 no longer holds in CMF dictionaries when $D > 1$, it is nevertheless possible to show that some form of exact recovery of a support $S^\star$ is possible under an additional condition on the kernel induced by the CMF dictionary (see Theorem 4). This additional condition is referred to as "axis admissibility" hereafter and is encapsulated in Definition 7 below. Before moving on to this definition, it is necessary to introduce the notions of "Cartesian grid" and "set augmenter operator":

Definition 6 (Cartesian grid). A finite set $G \subset \mathbb{R}^D$ is a Cartesian grid in dimension $D \ge 1$ if there exist $D$ one-dimensional finite sets $\{S_d\}_{d=1}^D$ such that

$$G = \mathop{\times}_{d=1}^{D} S_d, \tag{3.26}$$

where $\times$ denotes the Cartesian product.

We moreover define the following "set augmenter" operator that, given a finite set $S \subset \mathbb{R}^D$, returns the smallest Cartesian grid containing $S$:

$$\mathrm{Grid}(S) \triangleq \mathop{\times}_{d=1}^{D} \{\theta[d] : \theta \in S\}. \tag{3.27}$$

It is quite straightforward to see that $S \subseteq \mathrm{Grid}(S)$ for any finite set $S \subset \mathbb{R}^D$ and that the operator $\mathrm{Grid}$ is idempotent. We illustrate the definition of $\mathrm{Grid}(S)$ in Fig. 1 in dimension $D = 2$ for $S = \{\theta_1, \theta_2, \theta_3\}$.

We are now ready to introduce the notion of "axis admissibility":

Definition 7 (Axis admissibility with respect to a kernel).
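A Cartesian grid $\mathcal{G} = \bigtimes_{d=1}^{D} S_d = \{\theta_\ell\}_{\ell=1}^{\mathrm{card}(\mathcal{G})}$ is said to be axis admissible with respect to a kernel $\kappa$ if and only if $\forall d \in \llbracket 1, D \rrbracket$, $\forall \theta \in \mathbb{R}^D$ with $\theta[d] = 0$ and $\forall \{c_\ell\}_{\ell=1}^{\mathrm{card}(\mathcal{G})} \subset \mathbb{R}$ such that the function

$$f_d(t) = \Big| \sum_{\ell=1}^{\mathrm{card}(\mathcal{G})} c_\ell\, \kappa(\theta + t\,e_d, \theta_\ell) \Big| \qquad (3.28)$$

is not identically zero, we have

$$\emptyset \neq \arg\max_{t \in \mathbb{R}} f_d(t) \subseteq S_d. \qquad (3.29)$$

[Figure 1: Illustration in dimension $D = 2$ with $k = 3$ of the definition of the set augmenter Grid defined in (3.27). The blue points, denoted $\theta_\ell$ for $\ell \in \{1, 2, 3\}$, form the support $S$. The red points, denoted $\theta_\ell$, $\ell \in \llbracket 4, 9 \rrbracket$, represent the elements of $\mathrm{Grid}(S) \setminus S$.]

By extension, a Cartesian grid $\mathcal{G}$ is said to be axis admissible with respect to a dictionary $\mathcal{A}$ if it is axis admissible with respect to the kernel induced by $\mathcal{A}$.

The notion of axis admissibility will be central in our next result to ensure the exact recovery of some support $S^\star = \{\theta^\star_\ell\}_{\ell=1}^{k}$ in a CMF dictionary. In particular, we will see that axis admissibility of $\mathrm{Grid}(S^\star)$ ensures exact $\mathrm{Grid}(S^\star)$-delayed recovery of each $S \subseteq S^\star$. Moreover, exact $S^\star$-delayed recovery of each $S \subseteq S^\star$ is achievable by combining axis admissibility of $\mathrm{Grid}(S^\star)$ with the following restricted version of the ERC:

$$\max_{\theta \in \mathrm{Grid}(S^\star) \setminus S^\star} \big\| \mathbf{G}^{-1} \mathbf{g}_\theta \big\|_1 < 1 \qquad \text{(3.30-R-ERC)}$$

where

$$\mathbf{G}[\ell, \ell'] \triangleq \langle a(\theta^\star_\ell), a(\theta^\star_{\ell'}) \rangle \quad \forall \ell, \ell' \in \llbracket 1, k \rrbracket$$
$$\mathbf{g}_\theta[\ell] \triangleq \langle a(\theta), a(\theta^\star_\ell) \rangle \quad \forall \ell \in \llbracket 1, k \rrbracket. \qquad (3.31)$$

(We remind the reader that $\mathbf{G}$ is invertible as the Gram matrix of a set of linearly independent atoms, see Corollary 1.) Formally, our next result writes as follows:

Theorem 4. Let $\mathcal{A}$ be a CMF dictionary in $\mathbb{R}^D$ with induced kernel $\kappa$ and let $S^\star = \{\theta^\star_\ell\}_{\ell=1}^{k}$.

• If $\mathrm{Grid}(S^\star)$ is axis admissible with respect to $\kappa$, then OMP achieves $\mathrm{Grid}(S^\star)$-delayed recovery of each $S \subseteq S^\star$. If (3.30-R-ERC) moreover holds, OMP achieves $S^\star$-delayed recovery of each $S \subseteq S^\star$.

• Conversely, if (3.30-R-ERC) does not hold, there exist coefficients $\{c_\ell\}_{\ell=1}^{k}$, not all zero, such that OMP with $y = \sum_{\ell=1}^{k} c_\ell\, a(\theta^\star_\ell)$ as input selects some $\theta \notin S^\star$ at the first iteration.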
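Since (3.30-R-ERC) only involves the finitely many parameters of $\mathrm{Grid}(S^\star)$, it can be evaluated on a computer. The following is a minimal sketch, not the authors' code: it builds Grid(S) as in (3.27) and evaluates the left-hand side of (3.30-R-ERC). The kernel $\exp(-\lambda \|\theta - \theta'\|_p^p)$ used here is an assumed illustrative choice of CMF kernel, and the helper names (`grid`, `laplace_kernel`, `solve`, `restricted_erc`) are hypothetical.

```python
import itertools
import math


def grid(support):
    """Smallest Cartesian grid containing `support`, cf. (3.27)."""
    dim = len(support[0])
    axes = [sorted({theta[d] for theta in support}) for d in range(dim)]
    return [tuple(pt) for pt in itertools.product(*axes)]


def laplace_kernel(t1, t2, lam=1.0, p=1.0):
    """Assumed kernel exp(-lam * ||t1 - t2||_p^p), used for illustration only."""
    return math.exp(-lam * sum(abs(a - b) ** p for a, b in zip(t1, t2)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def restricted_erc(support, kernel):
    """Left-hand side of (3.30-R-ERC): max over Grid(S) minus S of ||G^{-1} g||_1."""
    support = [tuple(t) for t in support]
    G = [[kernel(a, b) for b in support] for a in support]
    extra = [t for t in grid(support) if t not in support]
    norms = (sum(abs(v) for v in solve(G, [kernel(t, s) for s in support]))
             for t in extra)
    return max(norms, default=0.0)


if __name__ == "__main__":
    far = [(0, 0), (10, 20), (30, 10)]
    close = [(0.5, 0, 0), (0, 0.5, 0), (0, 0, 0.5)]  # configuration of Example 4
    print(len(grid(far)), restricted_erc(far, laplace_kernel))
    print(len(grid(close)), restricted_erc(close, laplace_kernel))
```

With the well-separated support the returned value is far below 1, whereas for the tightly packed configuration of Example 4 the origin belongs to $\mathrm{Grid}(S^\star) \setminus S^\star$ and the value exceeds 1, in line with the converse part of Theorem 4.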
A first outcome of Theorem 4 is a (pessimistic) upper bound on the number of iterations needed to identify $S^\star$ when $\mathcal{A}$ is a CMF dictionary and $\mathrm{Grid}(S^\star)$ is axis admissible with respect to the kernel induced by $\mathcal{A}$. In particular, the first part of the theorem states that OMP needs no more than $\mathrm{card}(\mathrm{Grid}(S^\star)) \leq k^D$ iterations to succeed. As shown in the second part of the theorem, this (rather pessimistic) upper bound on the number of iterations can be decreased to $k$ if the additional restricted ERC (3.30-R-ERC) is verified. Interestingly, whereas the parameter space $\Theta$ is a continuum, (3.30-R-ERC) only depends on a finite subset of the elements of $\Theta$ (namely $\mathrm{Grid}(S^\star)$) and its numerical evaluation is therefore possible. We will see in Theorem 5 below that this restricted ERC allows us to derive a separation condition for exact $k$-step recovery in Generalized Laplace dictionaries. Besides, we note that additional strategies could be investigated to improve the upper bound, exploiting, e.g., coefficient decay [50].

In our next result, we show that the property of "axis admissibility" is satisfied for (at least) some CMF dictionaries. In particular, the next lemma emphasizes that any Cartesian grid is axis admissible for Generalized Laplace dictionaries (see Definition 5):

Lemma 4. Let $\mathcal{A}$ be a Generalized Laplace dictionary in $\mathbb{R}^D$. Then all Cartesian grids $\mathcal{G}$ are axis admissible with respect to $\mathcal{A}$.

A proof of this result is given in Appendix B.4. Combining this lemma with Theorem 4 immediately leads to the following corollary:

Corollary 2. Let $\mathcal{A}$ be a Generalized Laplace dictionary in $\mathbb{R}^D$. Then OMP achieves exact $\mathrm{Grid}(S^\star)$-delayed recovery of each finite support $S^\star \subset \mathbb{R}^D$.

Interestingly, although Example 4 showed that exact $\mathrm{card}(S^\star)$-step recovery does not hold for arbitrary $S^\star$ in CMF dictionaries, Corollary 2 emphasizes that exact $\mathrm{Grid}(S^\star)$-delayed recovery is achievable by OMP in Generalized Laplace dictionaries for any $S^\star$ and any $k = \mathrm{card}(S^\star) \in \mathbb{N}^*$.
Following our remark below Theorem 4, OMP is thus ensured to identify any support of size $k$ in at most $k^D$ iterations in this type of dictionary. Similar to Theorem 3, no separation assumptions nor sign constraints are needed here to ensure our recovery result, although it applies to a very specific family of dictionaries. We will see in Theorem 5 below that adding some separation condition on the elements of $S^\star$ makes it possible to verify (3.30-R-ERC) and therefore leads to an exact-recovery result in at most $k$ steps.

Before moving on to the statement of this result, let us mention that, although Lemma 4 shows that any Cartesian grid is axis admissible with respect to Generalized Laplace dictionaries, such a result does in general not hold for CMF dictionaries without extra assumptions on the grid. Nevertheless, our empirical evidence suggests that the admissible grid assumption is only an artifact of our proof technique. We conjecture that Theorem 4 remains valid even when the Cartesian grid $\mathcal{G}$ is not axis admissible. To support our conjecture, we show in Appendix E that the second part of Theorem 4 still holds for any CMF dictionary and for any $S^\star \subset \mathbb{R}^D$ with $\mathrm{card}(S^\star) = 2$, even though $\mathrm{Grid}(S^\star)$ is generally not axis admissible. The proof of this kind of result in the general case is still under investigation.

In the last result of this section, we particularize (3.30-R-ERC) to derive a separation condition on the elements of $S^\star = \{\theta^\star_\ell\}_{\ell=1}^{k}$ that ensures exact $\mathrm{card}(S)$-step recovery of each $S \subseteq S^\star$ in Generalized Laplace dictionaries. We first note that, following standard results of the literature (see e.g., [45]), (3.30-R-ERC) can be relaxed to a mutual coherence condition:

$$\mu < \frac{1}{2k-1} \qquad (3.32)$$

where

$$\mu \triangleq \max_{\substack{\theta, \theta' \in \mathrm{Grid}(S^\star) \\ \theta \neq \theta'}} |\langle a(\theta), a(\theta') \rangle|. \qquad (3.33)$$

Our separation result is then a simple consequence of this mutual coherence condition:

Theorem 5. Let $\mathcal{A}$ be a Generalized Laplace dictionary in $\mathbb{R}^D$ with parameters $\lambda > 0$ and $0 < p \leq 1$. Consider $S^\star = \{\theta^\star_\ell\}_{\ell=1}^{k}$ and let

$$\Delta \triangleq \min_{d \in \llbracket 1, D \rrbracket} \min\big\{ |\theta^\star_\ell[d] - \theta^\star_{\ell'}[d]| : \ell, \ell' \in \llbracket 1, k \rrbracket \ \text{s.t.}\ \theta^\star_\ell[d] \neq \theta^\star_{\ell'}[d] \big\}. \qquad (3.34)$$

If

$$\Delta > \Big( \frac{\log(2k-1)}{\lambda} \Big)^{1/p} \qquad (3.35)$$

then OMP achieves exact $\mathrm{card}(S)$-step recovery of each $S \subseteq S^\star$.

Proof. By definition of $\Delta$ and of $\mathrm{Grid}(S^\star)$, we have $\|\theta - \theta'\|_p^p \geq \Delta^p$ for all $\theta \neq \theta' \in \mathrm{Grid}(S^\star)$. Hence, using the definition of the mutual coherence in (3.33), we have $\mu = \exp(-\lambda \|\theta - \theta'\|_p^p)$ for some $\theta \neq \theta' \in \mathrm{Grid}(S^\star)$, so $\mu \leq \exp(-\lambda \Delta^p)$ and (3.35) implies that $\mu < (2k-1)^{-1}$ holds.

Theorem 5 states that, with Generalized Laplace dictionaries, OMP recovers any linear combination of $k$ sufficiently separated atoms in $k$ steps. Although condition (3.35) is expressed in terms of a minimal distance between parameters, it can be seen as a condition on the mutual coherence between atoms. However, in contrast to the discrete case, this mutual coherence guarantee is only related to a particular finite subset of the (continuous) Generalized Laplace dictionary, namely the atoms with parameters in $\mathrm{Grid}(S^\star)$. Furthermore, condition (3.35) is reminiscent of the separation condition for off-the-grid super-resolution proposed in [3], see (2.6). The separation condition discussed in (2.6) is expressed on a $D$-dimensional torus, which also prevents high values of $k$. For example, in a unit-length 1-dimensional torus and with the notations of (2.6), the minimum separation condition for BP upper-bounds $k$ by the inverse of the minimum separation. Note however that these results involve different dictionaries and settings, which makes a meaningful comparison difficult.

4. Conclusion - discussion

In this work, we have shown that the study of the recovery properties of greedy procedures such as Orthogonal Matching Pursuit (OMP) can be extended to the setting of continuous dictionaries where the atoms continuously depend on some parameters. Capitalizing on the formulation of OMP in terms of inner products between atoms, our results rely on the properties of the kernel implicitly defined by the inner product between atoms.
More particularly, we have identified two key notions, which we have called admissible kernel and admissible support, that are sufficient to ensure exact recovery irrespective of the value of the coefficients involved in the representation. For the class of CMF dictionaries, we have shown that when the dimension of the parameter space is 1, all implicitly defined kernels as well as all supports are admissible. To the best of our knowledge, this is the first class of kernels for which no separation is needed to achieve exact recovery, even for signed combinations of atoms. However, such a "universal" recovery result comes at a price since CMF dictionaries can only live in infinite-dimensional observation spaces $\mathcal{H}$ and the corresponding kernels exhibit some discontinuities in their derivatives.

Although exact recovery can also be ensured for CMF dictionaries with a parameter space of dimension greater than 1, extra conditions have to be imposed on the support to be recovered, as some supports may not be admissible anymore. The cornerstone of our analysis in the multi-dimensional case is the notion of axis admissible Cartesian grid. Indeed, axis admissibility is sufficient to allow OMP to identify supports, leading to a form of "delayed recovery" for all supports of size $k$ embedded in some admissible Cartesian grid. For such supports, exact $k$-step recovery can also be achieved whenever a condition on a finite number of (known) atoms is fulfilled. In the special case of Generalized Laplace dictionaries, any Cartesian grid turns out to be axis admissible, and a simplified coherence-based analysis can be carried out, leading to exact $k$-step recovery under a minimal separation condition.

We now review some prospects of this work:

Beyond axis-admissible grids for CMF kernels. Our analysis for multi-dimensional parameter sets relies on the notion of axis-admissible grids.
While axis admissibility holds for any grid with respect to Generalized Laplace dictionaries, this is apparently no longer the case with respect to more general CMF dictionaries. Even for grids which seem to violate the axis-admissibility condition with respect to a CMF dictionary, empirical evidence suggests that Theorem 4 remains valid. As a first step towards a better understanding of this phenomenon, we showed in Appendix E that, for supports of size 2, axis-admissibility is not necessary for the conclusion of Theorem 4 to hold.

Connection with TV-minimization. In light of the existing links between Tropp's ERC [45] and recovery guarantees for $\ell_1$ minimization [49], an interesting question is whether the guarantees developed in this paper can be extended to sparse spike recovery with total variation norm minimization (see Section 2). More particularly, one could benefit from the null-space properties for measures [30] which characterize the solution of the continuous version of Basis Pursuit. Such a connection may yield support recovery results for signed combinations of atoms with TV-norm minimization without separation conditions.

Robustness to estimation error. In the discrete setting, one advantage of greedy procedures over convex relaxations is that the associated recovery guarantees involve solutions provided by actual algorithms rather than merely expressed as the minimizer of some optimization problem. In the continuous setting, this has to be tempered by the fact that implementing OMP requires a (possibly intractable) global maximization procedure at each iteration. Our current analysis does not take into account the resulting numerical estimation error or the fact that there may be spurious local maxima. One could envision overcoming some of these limitations by analyzing the behavior of OMP when a small error is systematically made when maximizing the inner product in Line 6 of Algorithm 1.
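To make the role of the maximization step concrete, here is a minimal sketch of OMP written purely in terms of the kernel, where the global maximization is replaced by a search over a finite candidate list. It is not a transcription of Algorithm 1: the 1-D kernel $\exp(-\lambda|\theta - \theta'|)$, the candidate grid (assumed here to contain the true parameters) and all function names are illustrative assumptions.

```python
import math


def kernel(t1, t2, lam=1.0):
    """Assumed 1-D kernel <a(t1), a(t2)> = exp(-lam * |t1 - t2|)."""
    return math.exp(-lam * abs(t1 - t2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def omp_on_grid(support, coefs, candidates, n_iter):
    """Run n_iter OMP steps on y = sum_l coefs[l] * a(support[l]).

    Every inner product is computed through the kernel, so the atoms
    themselves (which live in an infinite-dimensional space) are never formed.
    """
    def corr_y(theta):  # <a(theta), y>
        return sum(c * kernel(theta, s) for c, s in zip(coefs, support))

    selected, weights = [], []
    for _ in range(n_iter):
        def corr_res(theta):  # <a(theta), residual>
            fitted = sum(w * kernel(theta, s) for w, s in zip(weights, selected))
            return corr_y(theta) - fitted

        # Discretized stand-in for the global maximization step.
        best = max(candidates, key=lambda th: abs(corr_res(th)))
        selected.append(best)
        # Least-squares update: solve the normal equations G w = b.
        G = [[kernel(a, b) for b in selected] for a in selected]
        weights = solve(G, [corr_y(s) for s in selected])
    return selected, weights


if __name__ == "__main__":
    support, coefs = [0.0, 0.3, 1.0], [1.0, -2.0, 0.5]
    candidates = [i / 10 for i in range(-20, 31)]
    print(omp_on_grid(support, coefs, candidates, n_iter=3))
```

When the candidate list does not contain the true parameters, the selected points are only approximations of them, which is precisely the estimation error that the current analysis does not cover.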
Note that such an approximation error may also be useful to account for discretized implementations of the latter step of OMP using a fine grid over the parameter set $\Theta$.

A. Proof of Theorem 2

Let $S \subseteq S^\star = \{\theta^\star_\ell\}_{\ell=1}^{k}$. Without loss of generality, we assume that $S \neq \emptyset$ corresponds to the first $\mathrm{card}(S)$ elements of $S^\star$, that is $S = \{\theta^\star_\ell\}_{\ell=1}^{\mathrm{card}(S)}$.

We first notice that, as a direct consequence of Definition 2, if $S^\star$ is admissible with respect to $\kappa$ then any $S \subseteq S^\star$ is also admissible with respect to $\kappa$. The result stated in Theorem 2 is then a direct consequence of Theorem 1 and the following proposition:

Proposition 1. Assume kernel $\kappa$ is admissible and $S = \{\theta^\star_\ell\}_{\ell=1}^{\mathrm{card}(S)}$ is admissible with respect to $\kappa$. Then we have that

i) the atoms $\{a(\theta^\star_\ell)\}_{\ell=1}^{\mathrm{card}(S)}$ are linearly independent,

ii) $\forall \theta \in \Theta \setminus S$, $\|\mathbf{G}^{-1} \mathbf{g}_\theta\|_1 < 1$, where

$$\mathbf{G}[\ell, \ell'] \triangleq \kappa(\theta^\star_\ell, \theta^\star_{\ell'}) \quad \forall \ell, \ell' \in \llbracket 1, \mathrm{card}(S) \rrbracket \qquad (A.1)$$
$$\mathbf{g}_\theta[\ell] \triangleq \kappa(\theta, \theta^\star_\ell) \quad \forall \ell \in \llbracket 1, \mathrm{card}(S) \rrbracket. \qquad (A.2)$$

We thus spend the rest of this section proving Proposition 1.

Proof of item i) of Proposition 1. Let $\{c_\ell\}_{\ell=1}^{\mathrm{card}(S)} \subset \mathbb{R}$ be such that $y \triangleq \sum_{\ell=1}^{\mathrm{card}(S)} c_\ell\, a(\theta^\star_\ell) = 0_{\mathcal{H}}$, and let $T$ be the set of indices such that $c_\ell \neq 0$. Without loss of generality, we can assume that $\sum_{\ell=1}^{\mathrm{card}(S)} |c_\ell| < 1$. We will prove by contradiction that $T$ is empty.

Assuming that $T$ is not empty, we first prove that the signs of the coefficients $\{c_\ell\}_{\ell \in T}$ cannot be all equal. To this end, let us assume (without loss of generality) that $c_\ell > 0$ for all $\ell \in T$ and show that a contradiction occurs with the hypothesis of admissibility of $S$. Since $y = 0_{\mathcal{H}}$, the function $\psi : \theta \mapsto \langle a(\theta), y \rangle$ is identically equal to zero. Hence, on the one hand, any point of $\Theta$ is a maximizer. On the other hand, since all the elements of $\{c_\ell\}_{\ell \in T}$ are positive and $S$ is (by hypothesis) admissible with respect to $\kappa$, we have from item i) of Definition 2 that the maximizers of $\psi$ must belong to $S$. This implies that $\Theta \subseteq S$, which contradicts the definition of $S$ and $\Theta$. Therefore, if $T$ is not empty, not all the elements of $\{c_\ell\}_{\ell \in T}$ have the same sign.

We can thus partition $T$ into two non-empty disjoint subsets:

$$T_+ = \{\ell \in T : c_\ell > 0\}, \qquad T_- = \{\ell \in T : c_\ell < 0\}.$$

Similarly, we let

$$S_+ = \{\theta^\star_\ell \in S : \ell \in T_+\}, \qquad S_- = \{\theta^\star_\ell \in S : \ell \in T_-\}.$$

Since the elements of $S \supseteq S_+ \cup S_-$ are pairwise distinct, we have $S_+ \cap S_- = \emptyset$. Defining $y_+ = \sum_{\ell \in T_+} c_\ell\, a(\theta^\star_\ell)$ and $y_- = \sum_{\ell \in T_-} c_\ell\, a(\theta^\star_\ell)$, we note that $y = y_+ + y_-$. Using the fact that $y = 0_{\mathcal{H}}$, one deduces that $y_+ = -y_-$. Moreover, $y_+$ (resp. $-y_-$) is a positive linear combination of atoms with parameters in $S_+ \subseteq S$ (resp. $S_- \subseteq S$). Therefore, since $S$ is admissible with respect to $\kappa$, item i) of Definition 2 applies and we have that any maximizer of $\psi : \theta \mapsto \langle a(\theta), y_+ \rangle = \langle a(\theta), -y_- \rangle$ must belong to $S_+ \cap S_-$. Now, on the one hand, by Lemma 1, the set of maximizers of $\psi$ cannot be empty. On the other hand, $S_+ \cap S_- = \emptyset$. This leads to a contradiction. Therefore we must have $T = \emptyset$. In other words, $y = 0_{\mathcal{H}}$ implies $c_\ell = 0$ $\forall \ell \in \llbracket 1, \mathrm{card}(S) \rrbracket$, so that the atoms $\{a(\theta^\star_\ell)\}_{\ell=1}^{\mathrm{card}(S)}$ are linearly independent.

As a consequence of this first part of the proposition, the Gram matrix of any subset of $\{a(\theta^\star_\ell)\}_{\ell=1}^{\mathrm{card}(S)}$ is a positive definite matrix, and therefore invertible. In particular, the inverses of the matrices $\mathbf{G}$ and $\bar{\mathbf{G}}$ appearing in the second part of the proof are always well-defined.

Proof of item ii) of Proposition 1. Recall that, as a consequence of Definition 2, if $S$ is admissible with respect to $\kappa$, then any support $S' \subseteq S$ is also admissible. We thus show our result by induction on the cardinality of $S'$. For notational convenience, we let hereafter $k' \triangleq \mathrm{card}(S')$. We prove by induction on $k'$ that:

a) $\mathbf{G}^{-1} \mathbf{1}_{k'}$ has nonnegative entries,

b) $\forall \theta \in \Theta$, $\mathbf{G}^{-1} \mathbf{g}_\theta$ has nonnegative entries,

c) $\forall \theta \in \Theta \setminus S'$, $\|\mathbf{G}^{-1} \mathbf{g}_\theta\|_1 < 1$.

The quantities $\mathbf{G}$ and $\mathbf{g}_\theta$ appearing above are defined in (A.1)-(A.2) with the substitution $S \leftrightarrow S'$. Item c) corresponds to result ii) of Proposition 1. Items a) and b) are intermediate results that allow a subdivision of the proof into steps.

Initialization: $k' = 1$. In this case, both $\mathbf{G}$ and $\mathbf{g}_\theta$ are scalars.
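Since $\kappa$ is admissible, we have $\mathbf{G} = 1$ and $\mathbf{g}_\theta \geq 0$ (cf. Definition 1). Therefore, items a) and b) are fulfilled and $\|\mathbf{G}^{-1} \mathbf{g}_\theta\|_1 = \mathbf{g}_\theta = \kappa(\theta, \theta^\star_1)$ and, using Definition 1-ii), we have $\kappa(\theta, \theta^\star_1) < 1$ for any $\theta \notin S'$. Hence, item c) is also true.

Induction: $1 < k' \leq k$. We assume items a)-b)-c) hold for any $\bar{S}' \subseteq S$ of cardinality $k' - 1 \geq 1$. Considering $S' \subseteq S$ an arbitrary support of size $k'$, we show that items a)-b)-c) also hold for $S'$. Without loss of generality, we will assume that $S'$ corresponds to the first $\mathrm{card}(S')$ elements of $S$, that is $S' = \{\theta^\star_\ell\}_{\ell=1}^{k'}$.

We consider $\bar{S}' = \{\theta^\star_\ell\}_{\ell=1}^{k'-1} \subset S'$ and use over-lined notations for quantities related to $\bar{S}'$: we denote by $\bar{\mathbf{G}} \in \mathbb{R}^{(k'-1) \times (k'-1)}$, $\bar{\mathbf{g}}_\theta \in \mathbb{R}^{k'-1}$ the quantities defined in (A.1)-(A.2) for $\bar{S}'$ and by $\mathbf{G} \in \mathbb{R}^{k' \times k'}$, $\mathbf{g}_\theta \in \mathbb{R}^{k'}$ the same quantities for $S'$. Likewise, the notations $\bar{\mathbf{g}}_\ell \in \mathbb{R}^{k'-1}$, $\mathbf{g}_{\ell'} \in \mathbb{R}^{k'}$ for $\ell = 1, \ldots, k'-1$, $\ell' = 1, \ldots, k'$ will refer to the columns of $\bar{\mathbf{G}}$ and $\mathbf{G}$, respectively. With these notations we have:

$$\mathbf{g}_\theta = \begin{pmatrix} \bar{\mathbf{g}}_\theta \\ \kappa(\theta, \theta^\star_{k'}) \end{pmatrix} \in \mathbb{R}^{k'} \quad \forall \theta \in \Theta \qquad (A.3)$$

$$\mathbf{G} = \begin{pmatrix} \bar{\mathbf{G}} & \bar{\mathbf{g}}_{k'} \\ \bar{\mathbf{g}}_{k'}^T & 1 \end{pmatrix} \in \mathbb{R}^{k' \times k'} \qquad (A.4)$$

where we denote $\bar{\mathbf{g}}_{k'} \triangleq \bar{\mathbf{g}}_{\theta^\star_{k'}}$ for notational convenience. We note that, as mentioned above, item i) of Proposition 1 ensures that both $\mathbf{G}$ and $\bar{\mathbf{G}}$ are invertible.

Item a). We show that the last entry of $\mathbf{u} \triangleq \mathbf{G}^{-1} \mathbf{1}_{k'}$ is positive. Since the reasoning holds for any ordering of the $\theta^\star_\ell$'s, we then deduce that all the entries of $\mathbf{u}$ are positive. Block inversion results [70, Cor. 2.8.9] give

$$\mathbf{G}^{-1} = \begin{pmatrix} \bar{\mathbf{G}}^{-1} + s\, \bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_{k'} \bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} & -s\, \bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_{k'} \\ -s\, \bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} & s \end{pmatrix}, \qquad (A.5)$$

where $s \triangleq (1 - \bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_{k'})^{-1}$. Notice that

$$\bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_{k'} \leq \|\bar{\mathbf{g}}_{k'}\|_\infty \, \|\bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_{k'}\|_1 \leq \|\bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_{k'}\|_1 < 1. \qquad (A.6)$$

The first inequality is a consequence of Hölder's inequality, the second of Definition 1 and the third follows from induction hypothesis c). Hence $s > 0$.

The last entry of $\mathbf{u} = \mathbf{G}^{-1} \mathbf{1}_{k'}$ now writes $\mathbf{u}[k'] = s\,(1 - \bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} \mathbf{1}_{k'-1})$. By induction hypothesis b), we have $\|\bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_{k'}\|_1 = \bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} \mathbf{1}_{k'-1}$. Using (A.6) and the fact that $s > 0$, we thus have $\mathbf{u}[k'] > 0$.

Item b). We first show that the last entry of $\mathbf{v} \triangleq \mathbf{G}^{-1} \mathbf{g}_\theta$ is non-negative.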
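Given the decomposition of $\mathbf{G}^{-1}$ in (A.5), the last entry of $\mathbf{G}^{-1} \mathbf{g}_\theta$ writes

$$\mathbf{v}[k'] = s\,\big( \mathbf{g}_\theta[k'] - \bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_\theta \big) = s\,\big( \kappa(\theta, \theta^\star_{k'}) - \bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_\theta \big). \qquad (A.7)$$

Since $s > 0$ (see (A.6)) it is then sufficient to show that $\kappa(\theta, \theta^\star_{k'}) - \bar{\mathbf{v}}^T \bar{\mathbf{g}}_\theta \geq 0$, where $\bar{\mathbf{v}} \triangleq \bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_{k'}$, in order to show that $\mathbf{v}[k'] \geq 0$. This will be achieved by studying this quantity seen as a function of $\theta$. Consider $T \subseteq \llbracket 1, k'-1 \rrbracket$ the (possibly empty) set defined by $T \triangleq \{\ell : \bar{\mathbf{v}}[\ell] \neq 0\}$ and define

$$\psi_1 : \Theta \to \mathbb{R}_+, \qquad \theta \mapsto \bar{\mathbf{v}}^T \bar{\mathbf{g}}_\theta = \sum_{\ell=1}^{k'-1} \bar{\mathbf{v}}[\ell]\, \kappa(\theta, \theta^\star_\ell) = \sum_{\ell \in T} \bar{\mathbf{v}}[\ell]\, \kappa(\theta, \theta^\star_\ell). \qquad (A.8)$$

Notice that:

• $\psi_1(\theta) = \bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_\theta$,

• the entries of $\bar{\mathbf{v}}$ are nonnegative by the induction hypothesis b). Moreover, from induction hypothesis c), we have $\sum_{\ell=1}^{k'-1} \bar{\mathbf{v}}[\ell] = \|\bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_{k'}\|_1 < 1$,

• for $j \in \llbracket 1, k'-1 \rrbracket$ and $\theta = \theta^\star_j$ we have $\bar{\mathbf{g}}_\theta = \bar{\mathbf{g}}_j = \bar{\mathbf{G}} \mathbf{e}_j$, where $\mathbf{e}_j$ is the $j$-th canonical vector of $\mathbb{R}^{k'-1}$. Hence, $\psi_1(\theta^\star_j) = \bar{\mathbf{g}}_{k'}^T \bar{\mathbf{G}}^{-1} \bar{\mathbf{g}}_j = \bar{\mathbf{g}}_{k'}^T \mathbf{e}_j = \kappa(\theta^\star_j, \theta^\star_{k'})$ and $\psi_1(\theta^\star_j) - \kappa(\theta^\star_j, \theta^\star_{k'}) = 0$ $\forall j \neq k'$.

Since $S'$ is admissible with respect to $\kappa$, we can apply item ii) of Definition 2 with $\ell' = k'$ to any $\emptyset \neq T \subseteq \llbracket 1, k'-1 \rrbracket$. This leads to:

$$\kappa(\theta, \theta^\star_{k'}) - \bar{\mathbf{v}}^T \bar{\mathbf{g}}_\theta = \kappa(\theta, \theta^\star_{k'}) - \psi_1(\theta) \geq 0 \qquad (A.9)$$

for all $\theta \in \Theta$. The same obviously holds if $T$ is empty, as $\psi_1(\theta)$ is then identically zero and the admissibility of $\kappa$ implies that $\kappa$ is nonnegative (see Definition 1). Since this result does not depend on the ordering of the $\theta^\star_\ell$'s, we can finally conclude that all the elements of $\mathbf{G}^{-1} \mathbf{g}_\theta$ are nonnegative.

Item c). Let

$$\psi_2 : \Theta \to \mathbb{R}, \qquad \theta \mapsto \|\mathbf{G}^{-1} \mathbf{g}_\theta\|_1. \qquad (A.10)$$

We need to prove that $\psi_2(\theta) < 1$ for all $\theta \notin S'$.

From item b), we know that $\mathbf{G}^{-1} \mathbf{g}_\theta$ has nonnegative entries, so that $\|\mathbf{G}^{-1} \mathbf{g}_\theta\|_1 = \mathbf{1}_{k'}^T \mathbf{G}^{-1} \mathbf{g}_\theta$. Letting $\mathbf{u} \triangleq \mathbf{G}^{-1} \mathbf{1}_{k'} \in \mathbb{R}^{k'}$, the function $\psi_2(\theta)$ can then be written as

$$\psi_2(\theta) = \sum_{\ell=1}^{k'} \mathbf{u}[\ell]\, \kappa(\theta, \theta^\star_\ell). \qquad (A.11)$$

Moreover we have $\mathbf{u}[\ell] \geq 0$ $\forall \ell$ since we showed in item a) that $\mathbf{G}^{-1} \mathbf{1}_{k'} = \mathbf{u}$ has nonnegative entries. We also note that $\mathbf{u} \neq \mathbf{0}_{k'}$ since $\mathbf{G}\mathbf{u} = \mathbf{1}_{k'}$. Applying item i) of Definition 2 together with the comment in Footnote 8, we have that the maximizer of $\psi_2(\theta)$ must belong to $\{\theta^\star_\ell : \mathbf{u}[\ell] \neq 0\} \subseteq S'$. Now,

$$\forall j \in \llbracket 1, k' \rrbracket, \quad \psi_2(\theta^\star_j) = \mathbf{1}_{k'}^T \mathbf{G}^{-1} \mathbf{g}_j = \mathbf{1}_{k'}^T \mathbf{e}_j = 1. \qquad (A.12)$$

Therefore, $\psi_2(\theta) < 1$ for all $\theta \notin S'$.

B. Proofs related to CMF dictionaries

This appendix contains the proofs of the results related to CMF dictionaries presented in Sections 3.3 and 3.4. We first state and prove a technical lemma which will be used in the proofs of Lemma 3 and Theorem 3:

Lemma 5. Let $\varphi$ be a CMF such that $\varphi(0) = 1$ and $\lim_{t \to \infty} \varphi(t) = 0$. Then the Borel measure $\nu$ appearing in the integral representation of the CMF in Lemma 2 is nonzero and satisfies $\nu(\{0\}) = 0$ and $\nu(\mathbb{R}_+^*) = 1$. Moreover, $\varphi$ is strictly positive and strictly decreasing on $\mathbb{R}_+$, with $\varphi^{(1)}(t) < 0$ on $\mathbb{R}_+^*$.

Proof. By the integral representation of Lemma 2 we have $\nu(\mathbb{R}_+) = \varphi(0) = 1$, hence $\nu$ is nonzero. Moreover $\varphi(x) = \int_0^{+\infty} e^{-ux}\, d\nu(u) \geq \nu(\{0\}) \geq 0$ for each $x \geq 0$. As $\lim_{x \to +\infty} \varphi(x) = 0$, it follows that $\nu(\{0\}) = 0$ and therefore $\nu(\mathbb{R}_+^*) = 1$. This proves the first part of the statement.

The positivity of $\varphi$ follows from the fact that $\nu$ is nonzero and $e^{-ux} > 0$ for all $u, x \geq 0$. Hence, the integral representation (3.18) of $\varphi$ yields $\varphi(x) > 0$ for each $x > 0$. Finally, we prove by contradiction that $\varphi^{(1)}(t) < 0$ on $\mathbb{R}_+^*$. Assume the existence of $t_0 > 0$ such that $\varphi^{(1)}(t_0) = 0$. As $\varphi^{(1)}$ is continuous and non-decreasing on $\mathbb{R}_+^*$ with $\varphi^{(1)}(t) \leq 0$ for each $t > 0$, it follows that $\varphi^{(1)}(t) = 0$ for each $t \geq t_0$, hence $\varphi(t) = \varphi(t_0)$ for each $t \geq t_0$. As we have just seen, we have $\varphi(t_0) > 0$, hence this contradicts the assumption $\lim_{t \to +\infty} \varphi(t) = 0$.

B.1. Proof of Lemma 3

The outline of the proof is as follows. We first show that for any $0 < p \leq 1$ and $\omega \in \mathbb{R}^D$, the quantity $e^{-u\|\omega\|_p^p}$ is related to the characteristic function of some $D$-dimensional random vector $\mathbf{Z}_u$. We then use this formulation of $e^{-u\|\omega\|_p^p}$ as a characteristic function together with the Bernstein-Widder representation of CMFs (see Lemma 2) to show that the function $\varphi(\|\omega\|_p^p)$ is positive definite.

Proof that $e^{-u\|\omega\|_p^p}$ is the characteristic function of some random vector $\mathbf{Z}_u$.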
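In probability theory, the characteristic function of a real-valued random vector $\mathbf{Z} \in \mathbb{R}^D$ is the function $\omega \in \mathbb{R}^D \mapsto \mathbb{E}_{\mathbf{Z}}[e^{i \omega^T \mathbf{Z}}]$ where $\mathbb{E}_{\mathbf{Z}}$ denotes the expectation operator and $i$ is the imaginary unit. First we consider for any $u \geq 0$ the scalar-valued function

$$\Phi_u : \mathbb{R} \to \mathbb{R}, \qquad \omega \mapsto e^{-u|\omega|^p}$$

and show that for $u > 0$ it is the characteristic function of some random variable $Z_u$ which admits a density with respect to the Lebesgue measure. Our proof leverages a result due to Pólya [71, Th. 1]. We reproduce this result hereafter to keep the paper self-contained:

Theorem 6. Let $\Phi$ be a real-valued function defined on $\mathbb{R}$ and such that:

• $\Phi$ is continuous and even,

• $\Phi$ is convex on $\mathbb{R}_+^*$,

• $\Phi(0) = 1$,

• $\lim_{\omega \to +\infty} \Phi(\omega) = 0$.

Then, $\Phi$ is the characteristic function of some random variable which admits a density with respect to the Lebesgue measure. Moreover this density is even and continuous everywhere, except possibly at zero.

Observe that $\Phi_u$ is even, continuous and verifies $\Phi_u(0) = 1$ and $\lim_{\omega \to +\infty} \Phi_u(\omega) = 0$ since $u > 0$. Moreover, for $p \in \,]0, 1]$ and $\omega > 0$, its second derivative on $\mathbb{R}_+^*$ is

$$\Phi_u^{(2)}(\omega) = u p\, \omega^{p-2} \big( (1-p) + u p\, \omega^p \big)\, e^{-u\omega^p}.$$

Hence, $\Phi_u^{(2)}(\omega) > 0$ for all $\omega > 0$ and $\Phi_u$ is convex on $\mathbb{R}_+^*$. As a consequence, $\Phi_u$ satisfies the assumptions of Theorem 6 and it is the characteristic function of some scalar random variable $Z_u$ which admits a (continuous, except possibly at zero) density with respect to the Lebesgue measure, that is

$$\Phi_u(\omega) = \mathbb{E}_{Z_u}\big[ e^{i \omega Z_u} \big].$$

We are now ready to show that the function $e^{-u\|\omega\|_p^p}$ is the characteristic function of some random vector $\mathbf{Z}_u$. To this end, let us define the random vector $\mathbf{Z}_u = (Z_u^1, \ldots, Z_u^D)$ as the concatenation of $D$ independent copies of $Z_u$. We thus have

$$\mathbb{E}_{\mathbf{Z}_u}\big[ e^{i \omega^T \mathbf{Z}_u} \big] = \prod_{d=1}^{D} \mathbb{E}_{Z_u^d}\big[ e^{i \omega[d] Z_u^d} \big] = \prod_{d=1}^{D} \Phi_u(\omega[d]) = e^{-u\|\omega\|_p^p}.$$

We note that for all $u > 0$ and $\omega \in \mathbb{R}$ we have $\Phi_u(\omega) = \Phi_1(u^{1/p} \omega)$. Hence, the function $\Phi_u(\omega)$ can also be written as an expectation with respect to the random variable $Z_1$ for all $u > 0$ and $\omega \in \mathbb{R}$:

$$\Phi_u(\omega) = \mathbb{E}_{Z_1}\big[ e^{i u^{1/p} \omega Z_1} \big]. \qquad (B.1)$$

Equation (B.1) obviously also holds for $u = 0$ since both sides of the equality are equal to 1 in that case. Using (B.1), we obtain

$$\mathbb{E}_{\mathbf{Z}_1}\big[ e^{i u^{1/p} \omega^T \mathbf{Z}_1} \big] = e^{-u\|\omega\|_p^p} \quad \forall u \geq 0, \qquad (B.2)$$

where $\mathbf{Z}_1 = (Z_1^1, \ldots, Z_1^D)$ is the concatenation of $D$ independent copies of $Z_1$. We will use the latter representation in the second part of the proof.

Proof that $\varphi(\|\omega\|_p^p)$ is a positive definite function. We want to show that for any $k \in \mathbb{N}^*$, any pairwise distinct $\{\theta_\ell\}_{\ell=1}^{k} \subset \mathbb{R}^D$ and any $\mathbf{c} \in \mathbb{C}^k \setminus \{\mathbf{0}_k\}$, we have

$$\mathbf{c}^H \mathbf{G} \mathbf{c} > 0,$$

where $(\cdot)^H$ denotes the conjugate transpose operator and

$$\mathbf{G}[\ell, \ell'] \triangleq \varphi(\|\theta_\ell - \theta_{\ell'}\|_p^p) \quad \forall \ell, \ell' \in \llbracket 1, k \rrbracket.$$

Note that in practice this will only be used for real-valued coefficients, but the result is established for complex-valued $\mathbf{c}$ to fit with the standard definition of a positive definite function.

Since $\varphi$ is a CMF, Lemma 2 ensures the existence of a non-negative finite Borel measure $\nu$ such that for all $\omega \in \mathbb{R}^D$:

$$\varphi(\|\omega\|_p^p) = \int_0^{+\infty} e^{-u\|\omega\|_p^p}\, d\nu(u).$$

Hence,

$$\mathbf{c}^H \mathbf{G} \mathbf{c} = \sum_{\ell=1}^{k} \sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\, \mathbf{c}[\ell']\, \varphi\big( \|\theta_{\ell'} - \theta_\ell\|_p^p \big)$$
$$= \sum_{\ell=1}^{k} \sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\, \mathbf{c}[\ell'] \int_0^{+\infty} e^{-u\|\theta_{\ell'} - \theta_\ell\|_p^p}\, d\nu(u) \qquad (B.3)$$
$$= \sum_{\ell=1}^{k} \sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\, \mathbf{c}[\ell'] \int_0^{+\infty} \mathbb{E}_{\mathbf{Z}_1}\big[ e^{i u^{1/p} (\theta_{\ell'} - \theta_\ell)^T \mathbf{Z}_1} \big]\, d\nu(u), \qquad (B.4)$$

where the last equality follows from (B.2). By linearity of the expectation, it follows:

$$\mathbf{c}^H \mathbf{G} \mathbf{c} = \int_0^{+\infty} \mathbb{E}_{\mathbf{Z}_1}\Big[ \sum_{\ell=1}^{k} \sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\, \mathbf{c}[\ell']\, e^{i u^{1/p} (\theta_{\ell'} - \theta_\ell)^T \mathbf{Z}_1} \Big]\, d\nu(u)$$
$$= \int_0^{+\infty} \mathbb{E}_{\mathbf{Z}_1}\Big[ \Big( \sum_{\ell=1}^{k} \overline{\mathbf{c}[\ell]}\, e^{-i u^{1/p} \theta_\ell^T \mathbf{Z}_1} \Big) \Big( \sum_{\ell'=1}^{k} \mathbf{c}[\ell']\, e^{i u^{1/p} \theta_{\ell'}^T \mathbf{Z}_1} \Big) \Big]\, d\nu(u)$$
$$= \int_0^{+\infty} \mathbb{E}_{\mathbf{Z}_1}\Big[ \Big| \sum_{\ell=1}^{k} \mathbf{c}[\ell]\, e^{i u^{1/p} \theta_\ell^T \mathbf{Z}_1} \Big|^2 \Big]\, d\nu(u) \geq 0. \qquad (B.5)$$

Since this holds for any $\mathbf{c} \in \mathbb{C}^k$, this shows that $\varphi(\|\cdot\|_p^p)$ is positive semi-definite.

To establish that $\varphi(\|\cdot\|_p^p)$ is a positive definite function we now show that the equality in (B.5) can only occur if $\mathbf{c} = \mathbf{0}_k$. To this end, denote $\zeta(\mathbf{z}) \triangleq |\sum_{\ell=1}^{k} \mathbf{c}[\ell]\, e^{i \theta_\ell^T \mathbf{z}}|^2$ and $\xi(u) \triangleq \mathbb{E}_{\mathbf{Z}_1}[\zeta(u^{1/p} \mathbf{Z}_1)]$ for $u \in \mathbb{R}_+$, and assume that equality holds in (B.5), that is to say $\int_0^{+\infty} \xi(u)\, d\nu(u) = 0$. We next show that this implies that $\mathbf{c} = \mathbf{0}_k$.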
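First, we note that the function $\xi : u \mapsto \mathbb{E}_{\mathbf{Z}_1}[\zeta(u^{1/p} \mathbf{Z}_1)]$, with $\zeta(\mathbf{z}) = |\sum_{\ell=1}^{k} \mathbf{c}[\ell]\, e^{i \theta_\ell^T \mathbf{z}}|^2$, is continuous since

$$\xi(u) = \mathbb{E}_{\mathbf{Z}_1}\big[ \zeta(u^{1/p} \mathbf{Z}_1) \big]$$
$$= \sum_{\ell=1}^{k} \sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\, \mathbf{c}[\ell']\, \mathbb{E}_{\mathbf{Z}_1}\big[ e^{i u^{1/p} (\theta_{\ell'} - \theta_\ell)^T \mathbf{Z}_1} \big]$$
$$= \sum_{\ell=1}^{k} \sum_{\ell'=1}^{k} \overline{\mathbf{c}[\ell]}\, \mathbf{c}[\ell']\, e^{-u\|\theta_{\ell'} - \theta_\ell\|_p^p}$$

where the last equality follows from (B.2).

Second, since $\varphi$ is a CMF satisfying $\varphi(0) = 1$ and $\lim_{x \to \infty} \varphi(x) = 0$, by Lemma 5 the non-negative finite Borel measure $\nu$ satisfies $\nu(\mathbb{R}_+^*) = 1$. Since $\mathbb{R}_+^* = \bigcup_{n \geq 1} [n, n+1] \cup \bigcup_{n \geq 1} [1/(n+1), 1/n]$, there must exist $n \geq 1$ such that either $\nu([n, n+1]) > 0$ or $\nu([1/(n+1), 1/n]) > 0$. Without loss of generality consider the case $\nu([n, n+1]) > 0$ (the other one can be treated similarly). Since $\xi$ is continuous over the compact set $[n, n+1]$, it attains its infimum over $[n, n+1]$ at some $u_0 \in [n, n+1]$. Now, because $\xi(u) \geq 0$ for every $u \geq 0$ we have

$$0 = \int_0^{+\infty} \xi(u)\, d\nu(u) \geq \int_n^{n+1} \xi(u)\, d\nu(u) \geq \xi(u_0)\, \nu([n, n+1])$$

and we obtain $\xi(u_0) = 0$ since $\nu([n, n+1]) > 0$. By construction $u_0 > 0$.

Finally, we have that:

• the distribution of $\mathbf{Z}_1$ has a density with respect to the Lebesgue measure and its density is continuous, except possibly at points where one coordinate vanishes.

• $\zeta$ is nonnegative and continuous (by construction as the squared modulus of a finite linear combination of exponentials).

Hence, using the definition $\xi(u_0) = \mathbb{E}_{\mathbf{Z}_1}[\zeta(u_0^{1/p} \mathbf{Z}_1)]$, we deduce that there exist $\mathbf{z}_0 \in \mathbb{R}^D$ (with non-vanishing coordinates) and $r > 0$ such that $\zeta(\mathbf{z}) = 0$ $\forall \mathbf{z} \in B(\mathbf{z}_0, r)$, where $B(\mathbf{z}_0, r)$ is the open ball of radius $r$ centered at $\mathbf{z}_0$. For any $\mathbf{y} \neq \mathbf{0}_D$ and $0 \leq t < r/\|\mathbf{y}\|_2$, we have $\mathbf{z}_0 + t\mathbf{y} \in B(\mathbf{z}_0, r)$ and therefore

$$\Big| \sum_{\ell=1}^{k} \mathbf{c}[\ell]\, e^{i \theta_\ell^T (\mathbf{z}_0 + t\mathbf{y})} \Big|^2 = 0 \quad \forall t \in [0, r/\|\mathbf{y}\|_2). \qquad (B.6)$$

If $\mathbf{y}$ is such that $\theta_\ell^T \mathbf{y} \neq \theta_{\ell'}^T \mathbf{y}$ for all $\ell \neq \ell'$, then the functions $\{t \mapsto e^{i t \theta_\ell^T \mathbf{y}}\}_{\ell=1}^{k}$ are linearly independent on $[0, r/\|\mathbf{y}\|_2)$ and (B.6) holds if and only if $\mathbf{c} = \mathbf{0}_k$. It thus remains to show that there exists some $\mathbf{y} \in \mathbb{R}^D$ such that $\theta_\ell^T \mathbf{y} \neq \theta_{\ell'}^T \mathbf{y}$ for all $\ell \neq \ell'$.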
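To this end, let us consider the following finite set of vectors:

$$N \triangleq \{\theta_\ell - \theta_{\ell'} : \ell, \ell' \in \llbracket 1, k \rrbracket,\ \ell \neq \ell'\}. \qquad (B.7)$$

As the parameters $\theta_\ell$'s are pairwise distinct, each $\mathbf{n} \in N$ is nonzero. Denote $H_{\mathbf{n}}$ the linear hyperplane whose normal vector is $\mathbf{n}$, and consider $H \triangleq \bigcup_{\mathbf{n} \in N} H_{\mathbf{n}}$. Since $H$ is the union of a finite number of hyperplanes of $\mathbb{R}^D$, $\mathbb{R}^D \setminus H$ is not empty. Consider $\mathbf{y} \in \mathbb{R}^D \setminus H$. Then, by construction, $\mathbf{n}^T \mathbf{y} \neq 0$ for all $\mathbf{n} \in N$ and therefore $\theta_\ell^T \mathbf{y} \neq \theta_{\ell'}^T \mathbf{y}$ for all $\ell \neq \ell'$. This concludes the proof.

B.2. Proof of Theorem 3

Our proof leverages Theorem 2 by showing that if $\mathcal{A}$ is a CMF dictionary in dimension 1 with induced kernel $\kappa$, then:

a) $\kappa$ is admissible in the sense of Definition 1,

b) any finite support $S^\star = \{\theta^\star_\ell\}_{\ell=1}^{k}$ is admissible with respect to kernel $\kappa$ in the sense of Definition 2.

The result stated in Theorem 3 is then a direct consequence of Theorem 2. We begin with a more general lemma establishing the claim a):

Lemma 6. Any CMF kernel $\kappa \in \mathcal{K}_{\mathrm{CMF}}(D)$, $D \geq 1$, is admissible.

Proof. First, since $\kappa \in \mathcal{K}_{\mathrm{CMF}}(D)$, there exist a CMF $\varphi$ and a scalar $p \in \,]0, 1]$ such that $\varphi(0) = 1$ and $\kappa(\theta, \theta') = \varphi(\|\theta - \theta'\|_p^p)$ for all $\theta, \theta' \in \mathbb{R}^D$. Hence $\kappa(\theta, \theta) = \varphi(0) = 1$ for all $\theta$. Moreover, the function $(\theta, \theta') \mapsto \kappa(\theta, \theta')$ is continuous since both CMFs and $\ell_p$-norms are continuous. Hence $\kappa$ satisfies (3.4). The fact that $\kappa$ satisfies the vanishing property (3.6) is a straightforward consequence of the fact that $\kappa(\theta, \theta') = \varphi(\|\theta - \theta'\|_p^p)$ and that $\lim_{t \to +\infty} \varphi(t) = 0$. Finally, we prove item ii) of Definition 1. As $\varphi$ satisfies the assumptions of Lemma 5, it is strictly positive and strictly decreasing. This implies $\kappa(\theta, \theta') > 0$ for any $\theta, \theta'$. Moreover, if $\theta \neq \theta'$ then $t = \|\theta - \theta'\|_p^p > 0$ and $\varphi(t) < \varphi(0) = 1$.

The rest of this section is dedicated to the proof of claim b). Let us consider a non-empty subset of indices $T \subseteq \llbracket 1, k \rrbracket$ with $t = \mathrm{card}(T)$. Without loss of generality (up to some reordering), we assume that $T = \llbracket 1, t \rrbracket$. Let $\{c_\ell\}_{\ell=1}^{t} \subset \mathbb{R}_+^*$ be such that

$$s \triangleq \sum_{\ell=1}^{t} c_\ell < 1 \qquad (B.8)$$

and consider the function

$$\psi : \mathbb{R} \to \mathbb{R}, \qquad \theta \mapsto \sum_{\ell=1}^{t} c_\ell\, \kappa(\theta, \theta^\star_\ell). \qquad (B.9)$$

Using the integral formulation of CMFs (see Lemma 2), we have that $\psi$ is twice differentiable at any $\theta \in \mathbb{R} \setminus \{\theta^\star_\ell\}_{\ell=1}^{t}$ [72, proof of Th. 12a] and its second derivative writes:

$$\psi^{(2)}(\theta) = p(1-p) \sum_{\ell=1}^{t} c_\ell \int_0^{+\infty} u\, |\theta - \theta^\star_\ell|^{p-2}\, e^{-u|\theta - \theta^\star_\ell|^p}\, d\nu(u)$$
$$\qquad + p^2 \sum_{\ell=1}^{t} c_\ell \int_0^{+\infty} u^2\, |\theta - \theta^\star_\ell|^{2(p-1)}\, e^{-u|\theta - \theta^\star_\ell|^p}\, d\nu(u). \qquad (B.10)$$

We next show that items i) and ii) of Definition 2 hold.

Item i) of Definition 2. First, the vanishing property (3.6) of admissible kernels ensures that $\psi$ admits at least one maximizer (see Lemma 1). We then show that any maximizer of $\psi$ must necessarily belong to $\{\theta^\star_\ell\}_{\ell=1}^{t}$.

Since $\psi$ is twice continuously differentiable on $\mathbb{R} \setminus \{\theta^\star_\ell\}_{\ell=1}^{t}$, any maximizer $\theta_m \in \mathbb{R} \setminus \{\theta^\star_\ell\}_{\ell=1}^{t}$ must verify the second-order optimality condition "$\psi^{(2)}(\theta_m) \leq 0$". Now this condition can never be fulfilled for $\theta_m \in \mathbb{R} \setminus \{\theta^\star_\ell\}_{\ell=1}^{t}$. Indeed, we see that each integral term in (B.10) is positive since $\theta_m \notin \{\theta^\star_\ell\}_{\ell=1}^{t}$ and, by Lemma 5, $\nu(\mathbb{R}_+^*) > 0$. Since $p \in \,]0, 1]$ and $c_\ell > 0$ $\forall \ell$, it follows that $\psi^{(2)}(\theta_m) > 0$.

Item ii) of Definition 2. Assume $T = \llbracket 1, t \rrbracket \neq \llbracket 1, k \rrbracket$ and consider $t' \in \llbracket 1, k \rrbracket \setminus T$. Without loss of generality, suppose that $t' = t + 1$ and consider

$$\tilde{\psi} : \mathbb{R} \to \mathbb{R}, \qquad \theta \mapsto \psi(\theta) - \kappa(\theta, \theta^\star_{t+1}). \qquad (B.11)$$

From the definition of $\psi$ in (B.9), $\tilde{\psi}$ writes

$$\tilde{\psi}(\theta) = \sum_{\ell=1}^{t} c_\ell\, \varphi\big( |\theta - \theta^\star_\ell|^p \big) - \varphi\big( |\theta - \theta^\star_{t+1}|^p \big). \qquad (B.12)$$

Assume that $\tilde{\psi}(\theta^\star_\ell) \leq 0$ for every $\ell \in \llbracket 1, t \rrbracket$. We will then show that $\tilde{\psi}(\theta) \leq 0$ for every $\theta \in \mathbb{R} \setminus \{\theta^\star_\ell\}_{\ell=1}^{t}$ so that item ii) of Definition 2 is satisfied.

Suppose there exists $\theta_0 \in \mathbb{R}$ such that $\tilde{\psi}(\theta_0) > 0$ and let us show that this leads to a contradiction.

Let us first emphasize that the existence of some $\theta_0 \in \mathbb{R}$ such that $\tilde{\psi}(\theta_0) > 0$ implies that a maximizer of $\tilde{\psi}$ exists. Indeed, we have that $\lim_{|\theta| \to \infty} \tilde{\psi}(\theta) = 0$ since

$$|\tilde{\psi}(\theta)| \leq \sum_{\ell=1}^{t} c_\ell\, \kappa(\theta, \theta^\star_\ell) + \kappa(\theta, \theta^\star_{t+1}) \qquad (B.13)$$

and $\kappa$ obeys the vanishing property (3.6) by hypothesis. Hence, for any $0 < \varepsilon < \tilde{\psi}(\theta_0)$ there exists a compact set $K_\varepsilon$ such that

$$\forall \theta \in K_\varepsilon^c, \quad \tilde{\psi}(\theta) \leq \varepsilon < \sup_{\theta' \in K_\varepsilon} \tilde{\psi}(\theta'). \qquad (B.14)$$

Because $\tilde{\psi}$ is continuous, the extreme value theorem [73, Prop. A.8] then states that (at least) one maximizer of $\tilde{\psi}$, say $\theta_m$, exists and $\theta_m \in K_\varepsilon$. We note that by definition $\tilde{\psi}(\theta_m) \geq \tilde{\psi}(\theta_0) > 0$. We show below that we also must necessarily have $\tilde{\psi}(\theta_m) \leq 0$. This leads to the desired contradiction and proves the result.

Let $\delta_\ell \triangleq |\theta_m - \theta^\star_\ell|$ and assume without loss of generality that

$$\delta_1 \leq \ldots \leq \delta_t. \qquad (B.15)$$

We have that $\delta_1 > 0$ because $\theta_m \notin \{\theta^\star_\ell\}_{\ell=1}^{t}$ since $\tilde{\psi}(\theta^\star_\ell) \leq 0$ $\forall \ell \in \llbracket 1, t \rrbracket$ (by assumption) and $\tilde{\psi}(\theta_m) \geq \tilde{\psi}(\theta_0) > 0$. We next show that the working assumptions also imply $\tilde{\psi}(\theta_m) \leq 0$ by distinguishing between three cases:

Case 1: $\delta_{t+1} \leq \delta_1$. From (B.15), we have:

$$\forall u \geq 0, \quad \max_{1 \leq \ell \leq t} e^{-u\delta_\ell^p} \leq e^{-u\delta_{t+1}^p}. \qquad (B.16)$$

Hence

$$\tilde{\psi}(\theta_m) \leq \Big( \sum_{\ell=1}^{t} c_\ell - 1 \Big) \int_0^{+\infty} e^{-u\delta_{t+1}^p}\, d\nu(u) < 0,$$

since the first factor is negative by hypothesis and the integral is positive.

Case 2: $\delta_{t+1} > \delta_t$. We rely on the following technical lemma that exploits the notion of "sign changes of a finite sequence". This notion is defined as the number of times two consecutive elements of the finite sequence have opposite signs. For instance, the sequence $(1, 1, -1, 1)$ has two sign changes (respectively at the third and fourth positions).

Lemma 7. Let $P(u) \triangleq \sum_{\ell=1}^{k} c_\ell\, e^{-\alpha_\ell u}$ be an exponential polynomial on $\mathbb{R}_+$ with $0 < \alpha_1 < \ldots < \alpha_k$ and $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}$. Assume that:

• the sequence $c_1, \ldots, c_k$ has at most two sign changes;

• $P(0) < 0$ and $\lim_{u \to +\infty} P(u) = 0^+$.

Then there exists $u_0 > 0$ for which the following inequality holds

$$\int_0^{+\infty} f(u) P(u)\, d\nu(u) \geq f(u_0) \int_0^{+\infty} P(u)\, d\nu(u), \qquad (B.17)$$

for any non-decreasing function $f$ on $\mathbb{R}_+$ and any (unsigned) finite Borel measure $\nu$ on $\mathbb{R}_+$ such that the integrals converge.

The proof of the lemma is postponed to Appendix C.2.

As mentioned previously, we have on the one hand that $\theta_m \notin \{\theta^\star_\ell\}_{\ell=1}^{t}$. On the other hand, $\theta_m \neq \theta^\star_{t+1}$ because $|\theta_m - \theta^\star_{t+1}| = \delta_{t+1} > \delta_1 > 0$. Therefore, $\theta_m \in \mathbb{R} \setminus \{\theta^\star_\ell\}_{\ell=1}^{t+1}$. Since $\tilde{\psi}$ is twice continuously differentiable on $\mathbb{R} \setminus \{\theta^\star_\ell\}_{\ell=1}^{t+1}$, $\theta_m$ must necessarily verify the following second-order optimality condition:

$$\tilde{\psi}^{(2)}(\theta_m) \leq 0. \qquad (B.18)$$

We next show that $C\,\tilde{\psi}(\theta_m) \leq \tilde{\psi}^{(2)}(\theta_m)$ for some positive constant $C > 0$. Hence, in view of (B.18), this leads to the desired contradiction: $\tilde{\psi}(\theta_m) \leq 0$.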
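Assume first that $\Delta_1 < \dots < \Delta_t$ (the equality cases will be addressed later). From (B.10) we have that $\tilde\psi^{(2)} = \tilde\psi_1^{(2)} + \tilde\psi_2^{(2)}$ with

$$\tilde\psi_1^{(2)}(\theta_m) = p(1-p)\int_0^{+\infty} u\,\Big(\sum_{\ell=1}^{t} c_\ell\,\Delta_\ell^{p-2}\,e^{-u\Delta_\ell^{p}} - \Delta_{t+1}^{p-2}\,e^{-u\Delta_{t+1}^{p}}\Big)\,d\nu(u),$$

$$\tilde\psi_2^{(2)}(\theta_m) = p^2\int_0^{+\infty} u^2\,\Big(\sum_{\ell=1}^{t} c_\ell\,\Delta_\ell^{2(p-1)}\,e^{-u\Delta_\ell^{p}} - \Delta_{t+1}^{2(p-1)}\,e^{-u\Delta_{t+1}^{p}}\Big)\,d\nu(u).$$

Using $\Delta_{t+1} > \Delta_\ell > 0$ $\forall \ell \in \llbracket 1,t\rrbracket$, we obtain

$$\tilde\psi_2^{(2)}(\theta_m) = \frac{p^2}{\Delta_{t+1}^{2(1-p)}}\int_0^{+\infty} u^2\,\Big(\sum_{\ell=1}^{t} c_\ell\,\Big(\frac{\Delta_{t+1}}{\Delta_\ell}\Big)^{2(1-p)} e^{-u\Delta_\ell^{p}} - e^{-u\Delta_{t+1}^{p}}\Big)\,d\nu(u) \;>\; \frac{p^2}{\Delta_{t+1}^{2(1-p)}}\int_0^{+\infty} u^2\,\Big(\sum_{\ell=1}^{t} c_\ell\,e^{-u\Delta_\ell^{p}} - e^{-u\Delta_{t+1}^{p}}\Big)\,d\nu(u). \tag{B.19}$$

Note now that:

• the function $u \mapsto u^2$ is increasing,
• $P(u) \triangleq \sum_{\ell=1}^{t} c_\ell\,e^{-u\Delta_\ell^{p}} - e^{-u\Delta_{t+1}^{p}}$ is an exponential polynomial with $0 < \Delta_1 < \dots < \Delta_t < \Delta_{t+1}$, whose sequence of coefficients is $(c_1, \dots, c_t, -1)$ and has exactly one sign change,
• as $\max_{1\le\ell\le t}\Delta_\ell < \Delta_{t+1}$ by hypothesis, we have $P(u) > 0$ for sufficiently large $u$, so $\lim_{u\to+\infty} P(u) = 0^{+}$,
• since $\sum_{\ell=1}^{t} c_\ell < 1$ we have $P(0) < 0$.

Therefore, Lemma 7 applies and there exists $u_0 > 0$ such that

$$\tilde\psi_2^{(2)}(\theta_m) > \frac{p^2}{\Delta_{t+1}^{2(1-p)}}\,u_0^2\int_0^{+\infty} P(u)\,d\nu(u) = \frac{p^2}{\Delta_{t+1}^{2(1-p)}}\,u_0^2\,\tilde\psi(\theta_m). \tag{B.20}$$

This establishes that $\tilde\psi_2^{(2)}(\theta_m) > C_2\,\tilde\psi(\theta_m)$ where $C_2 > 0$ is a positive constant. The same rationale leads to $\tilde\psi_1^{(2)}(\theta_m) \ge C_1\,\tilde\psi(\theta_m)$ with $C_1 \ge 0$ ($C_1 = 0$ for $p = 1$ since $\tilde\psi_1^{(2)}$ is then identically zero). Since $\tilde\psi^{(2)} = \tilde\psi_1^{(2)} + \tilde\psi_2^{(2)}$, one obtains that $\tilde\psi^{(2)}(\theta_m) > (C_1 + C_2)\,\tilde\psi(\theta_m)$, which concludes the proof for $\Delta_1 < \dots < \Delta_t$.

Let us now come back to the general case where $\Delta_1 \le \dots \le \Delta_t$. Denote by $\tilde\Delta_1 < \dots < \tilde\Delta_{t'}$, with $t' \le t$, the ordered distinct values in $\{\Delta_\ell\}_{\ell=1}^{t}$, and let $\tilde\Delta_{t'+1} = \Delta_{t+1}$. Moreover, for any $\ell' \in \llbracket 1,t'\rrbracket$, let $\tilde c_{\ell'}$ be equal to the sum of the coefficients $c_\ell$ such that $\Delta_\ell = \tilde\Delta_{\ell'}$, and let $\tilde c_{t'+1} = -1$. We note that, by definition, $\sum_{\ell'=1}^{t'} \tilde c_{\ell'} = \sum_{\ell=1}^{t} c_\ell < 1$. We can then show that $\tilde\psi^{(2)}(\theta_m) > C\,\tilde\psi(\theta_m)$ with $C > 0$ by applying the same reasoning as above to $\tilde\Delta_1, \dots, \tilde\Delta_{t'+1}$ and $\tilde c_1, \dots, \tilde c_{t'+1}$.

Case 3: $\Delta_1 < \Delta_{t+1} \le \Delta_t$. There exists $\ell_0 \in \llbracket 1,t-1\rrbracket$ such that:

$$\Delta_\ell < \Delta_{t+1} \quad \text{for } \ell \le \ell_0, \tag{B.21a}$$
$$\Delta_\ell \ge \Delta_{t+1} \quad \text{for } \ell > \ell_0. \tag{B.21b}$$

Denote $\varepsilon \triangleq 1 - \sum_{\ell=1}^{t} c_\ell > 0$ and let $s_1 \triangleq \sum_{\ell=1}^{\ell_0} c_\ell + \frac{\varepsilon}{2}$ and $s_2 \triangleq \sum_{\ell=\ell_0+1}^{t} c_\ell + \frac{\varepsilon}{2}$, so that $s_1 + s_2 = 1$. One can write

$$\tilde\psi(\theta_m) = s_1\underbrace{\int_0^{+\infty}\Big(\sum_{\ell=1}^{\ell_0}\frac{c_\ell}{s_1}\,e^{-u\Delta_\ell^{p}} - e^{-u\Delta_{t+1}^{p}}\Big)\,d\nu(u)}_{\triangleq\ \tilde\psi_1(\theta_m)} \;+\; s_2\underbrace{\int_0^{+\infty}\Big(\sum_{\ell=\ell_0+1}^{t}\frac{c_\ell}{s_2}\,e^{-u\Delta_\ell^{p}} - e^{-u\Delta_{t+1}^{p}}\Big)\,d\nu(u)}_{\triangleq\ \tilde\psi_2(\theta_m)}. \tag{B.22}$$

Using (B.21a) and the fact that $\sum_{\ell=1}^{\ell_0}\frac{c_\ell}{s_1} < 1$, we have that $\tilde\psi_1(\theta_m) < 0$ by resorting to the same reasoning as in Case 2. Similarly, using (B.21b) and the fact that $\sum_{\ell=\ell_0+1}^{t}\frac{c_\ell}{s_2} < 1$, we obtain that $\tilde\psi_2(\theta_m) < 0$ by the same arguments as in Case 1. Hence, we finally have $\tilde\psi(\theta_m) \le s_1\tilde\psi_1(\theta_m) + s_2\tilde\psi_2(\theta_m) < 0$, which leads to the desired contradiction.

B.3. Proof of Theorem 4 - Recovery in dimension D

Let $S^\star = \{\theta_\ell^\star\}_{\ell=1}^{k}$ and assume $\mathrm{Grid}(S^\star)$ is axis admissible. Let us first observe that, by virtue of Corollary 1, the elements of $\{a(\theta) : \theta \in \mathrm{Grid}(S^\star)\}$ are linearly independent. Let

$$\mathcal{R} \triangleq \mathrm{span}\big(\{a(\theta) : \theta \in \mathrm{Grid}(S^\star)\}\big). \tag{B.23}$$

If $r \in \mathcal{R}\setminus\{0_{\mathcal{H}}\}$, we note from Lemma 1 that the function $f : \theta \mapsto |\langle a(\theta), r\rangle|$ admits at least one maximizer since $r$ results from a (non trivial) finite linear combination of atoms. Moreover, the maximum of $f$ must be strictly greater than zero. If not, $\langle a(\theta), r\rangle = 0$ $\forall \theta \in \mathrm{Grid}(S^\star)$ and one deduces that $r \in \mathcal{R}^{\perp}$, the orthogonal complement of $\mathcal{R}$. Since $r \in \mathcal{R}$, this leads to $r = 0_{\mathcal{H}}$, which is in contradiction with our initial assumption "$r \neq 0_{\mathcal{H}}$".

We also have that the maximizers of $f$ must belong to $\mathrm{Grid}(S^\star)$, as shown by the following arguments. If $\theta_m$ is a maximizer of $f$, then $t_m = \theta_m[d]$ is a maximizer of

$$f_d : t \mapsto \big|\big\langle a\big(\theta_m + (t - \theta_m[d])\,e_d\big), r\big\rangle\big|. \tag{B.24}$$

Denoting $\theta_0 \triangleq \theta_m - \theta_m[d]\,e_d$, we have $\theta_0 \perp e_d$ by construction and, writing $r = \sum_{\ell} c_\ell\,a(\theta_\ell)$ with $\theta_\ell$ ranging over $\mathrm{Grid}(S^\star)$, $f_d$ reads

$$f_d(t) = \Big|\sum_{\ell} c_\ell\,\kappa(\theta_0 + t\,e_d, \theta_\ell)\Big|. \tag{B.25}$$

Because $\mathrm{Grid}(S^\star)$ is axis admissible with respect to $\kappa$ and $f_d$ is not identically zero (since $f_d(\theta_m[d]) = f(\theta_m) > 0$), the maximizers of $f_d$ must belong to $S_d \triangleq \{\theta_\ell^\star[d]\}_{\ell=1}^{k}$. As a consequence, since this conclusion holds for any $d \in \llbracket 1,D\rrbracket$, we have that $\theta_m \in \prod_{d=1}^{D} S_d = \mathrm{Grid}(S^\star)$. Formulated in a slightly different way, we thus just proved that: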
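$$\forall r \in \mathcal{R}\setminus\{0_{\mathcal{H}}\},\ \forall \theta \in \Theta\setminus\mathrm{Grid}(S^\star), \quad \max_{\theta' \in \mathrm{Grid}(S^\star)} |\langle a(\theta'), r\rangle| > |\langle a(\theta), r\rangle|. \tag{B.26}$$

We are now ready to prove the statements of the theorem:

$\mathrm{Grid}(S^\star)$-delayed recovery of any $S \subseteq S^\star$. First, since $S \subseteq \mathrm{Grid}(S^\star)$, we have that $y \in \mathcal{R}$. Moreover, $y \neq 0_{\mathcal{H}}$ since it results from a nontrivial linear combination of linearly independent atoms. Hence (B.26) holds and OMP selects a parameter in $\mathrm{Grid}(S^\star)$ at the first iteration. Repeating the same argument at the next iterations, OMP selects parameters in $\mathrm{Grid}(S^\star)$ until the residual $r$ vanishes. Now, because the atoms in $\{a(\theta) : \theta \in \mathrm{Grid}(S^\star)\}$ are linearly independent, $r = 0_{\mathcal{H}}$ if and only if the set of parameters selected by OMP, say $\widehat{S}$, verifies $S \subseteq \widehat{S}$. Since OMP never selects twice the same parameter and $S \subseteq \mathrm{Grid}(S^\star)$, we thus achieve $\mathrm{Grid}(S^\star)$-delayed recovery of $S$.

$S^\star$-delayed recovery of any $S \subseteq S^\star$. Since the elements of $\{a(\theta) : \theta \in \mathrm{Grid}(S^\star)\}$ are linearly independent, (3.30-R-ERC) can equivalently be rewritten as (see e.g., [1, Prop. 3.15]):

$$\forall r \in \mathcal{R}_{S^\star}\setminus\{0_{\mathcal{H}}\},\ \forall \theta \in \mathrm{Grid}(S^\star)\setminus S^\star, \quad \max_{\theta' \in S^\star} |\langle a(\theta'), r\rangle| > |\langle a(\theta), r\rangle|,$$

where $\mathcal{R}_{S^\star} \triangleq \mathrm{span}\big(\{a(\theta_\ell^\star)\}_{\ell=1}^{k}\big)$. Combining this result with (B.26) and using the fact that $\mathcal{R}_{S^\star} \subseteq \mathcal{R}$ lead to

$$\forall r \in \mathcal{R}_{S^\star}\setminus\{0_{\mathcal{H}}\},\ \forall \theta \in \Theta\setminus S^\star, \quad \max_{\theta' \in S^\star} |\langle a(\theta'), r\rangle| > |\langle a(\theta), r\rangle|.$$

Following the same arguments as above, we then have that: 1) OMP selects parameters in $S^\star$ until the residual $r$ vanishes; 2) $r = 0_{\mathcal{H}}$ if and only if the set of parameters $\widehat{S}$ selected by OMP verifies $S \subseteq \widehat{S}$. Since OMP never selects twice the same parameter and $S \subseteq S^\star$, OMP thus achieves $S^\star$-delayed recovery of $S$.

Sharpness of the result. If (3.30-R-ERC) is not verified, we have from [1, Prop. 3.15] that there exists some $y_{\mathrm{bad}} \in \mathcal{R}_{S^\star}\setminus\{0_{\mathcal{H}}\}$ such that

$$\max_{\theta \in \mathrm{Grid}(S^\star)\setminus S^\star} |\langle a(\theta), y_{\mathrm{bad}}\rangle| > \max_{\theta \in S^\star} |\langle a(\theta), y_{\mathrm{bad}}\rangle|. \tag{B.27}$$

In other words, OMP with $y_{\mathrm{bad}}$ as input selects some $\theta \notin S^\star$ at the first iteration.

B.4.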
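Proof of Lemma 4

Let $G = \prod_{d=1}^{D} S_d = \{\theta_\ell\}_{\ell=1}^{\mathrm{card}(G)}$ be an arbitrary Cartesian grid in $\mathbb{R}^{D}$ and $\{c_\ell\}_{\ell=1}^{\mathrm{card}(G)} \subset \mathbb{R}$ be a set of $\mathrm{card}(G)$ coefficients not all equal to 0. Consider $d \in \llbracket 1,D\rrbracket$ and $\theta_0 \in \mathbb{R}^{D}$ such that $\theta_0[d] = 0$ and define

$$f_d : \mathbb{R} \to \mathbb{R}_+, \qquad t \mapsto \Big|\sum_{\ell=1}^{\mathrm{card}(G)} c_\ell\,\kappa(\theta_0 + t\,e_d, \theta_\ell)\Big| = \Big|\sum_{\ell=1}^{\mathrm{card}(G)} c_\ell\,e^{-\lambda\|\theta_0 + t e_d - \theta_\ell\|_p^p}\Big|. \tag{B.28}$$

We assume that $f_d$ is not identically zero, as in the statement of Definition 7. $f_d$ can be rewritten as

$$f_d(t) = \Big|\sum_{\ell=1}^{\mathrm{card}(G)} c_\ell\,e^{-\lambda|t - \theta_\ell[d]|^{p} \,-\, \lambda\sum_{j=1, j\neq d}^{D}|\theta_0[j] - \theta_\ell[j]|^{p}}\Big|.$$

Let $q$ denote the number of distinct elements of $\{\theta_\ell[d]\}_{\ell=1}^{\mathrm{card}(G)}$ and suppose (up to some renumbering) that $\{\theta_\ell[d]\}_{\ell=1}^{q}$ are pairwise distinct. We note that $q \ge 1$ because otherwise there is a contradiction with our hypothesis "$f_d$ not identically zero". We can then rewrite $f_d$ as

$$f_d(t) = \Big|\sum_{\ell=1}^{q} \tilde c_\ell\,e^{-\lambda|t - \theta_\ell[d]|^{p}}\Big| \tag{B.29}$$

where the terms proportional to $e^{-\lambda|t - \theta_\ell[d]|^{p}}$ for (possibly) identical values of $\theta_\ell[d]$ have been merged together, and the scalars $\tilde c_\ell$ take into account the constant terms in the exponentials that do not depend on $t$.

Let $\mathcal{A}_1 = \{a_1(t') : t' \in \mathbb{R}\}$ be a Generalized Laplace dictionary in dimension 1 (see Definition 5). Then, $f_d(t)$ can also be interpreted as the (absolute value of the) inner product between the atom $a_1(t) \in \mathcal{A}_1$ and $y \triangleq \sum_{\ell=1}^{q} \tilde c_\ell\,a_1(\theta_\ell[d])$. Let $\widetilde{S} \triangleq \{\theta_\ell[d] : \tilde c_\ell \neq 0,\ \ell = 1, \dots, q\}$. Applying Theorem 3, we have that OMP with $y$ as input achieves exact $\mathrm{card}(\widetilde{S})$-step recovery of $\widetilde{S}$. In particular, this implies that OMP selects a parameter in $\widetilde{S}$ at the first iteration, that is:

$$\forall t' \in \mathbb{R}\setminus\widetilde{S}, \quad f_d(t') < \max_{t \in \widetilde{S}} f_d(t). \tag{B.30}$$

Hence, the maximizers of $f_d$ belong to $\widetilde{S} \subseteq \{\theta_\ell[d]\}_{\ell=1}^{q} \subseteq S_d$.

C. Miscellaneous

C.1. Proof of Lemma 1

Let $r \in \mathcal{H}$, $k \in \mathbb{N}$ and assume that there exist $k$ parameters $\{\theta_\ell^\star\}_{\ell=1}^{k}$ and $k$ nonzero coefficients $\{c_\ell\}_{\ell=1}^{k} \subset \mathbb{R}$ such that $r = \sum_{\ell=1}^{k} c_\ell\,a(\theta_\ell^\star)$. Define the function $\varphi$ as

$$\varphi : \Theta \to \mathbb{R}_+, \qquad \theta \mapsto |\langle a(\theta), r\rangle| = \Big|\sum_{\ell=1}^{k} c_\ell\,\kappa(\theta, \theta_\ell^\star)\Big|, \tag{C.1}$$

that is, the function involved in step 6 in Algorithm 1. We now prove that a maximizer of (C.1) exists. To that aim, we distinguish two cases.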
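Case 1: $\forall \theta \in \Theta$, $\varphi(\theta) = 0$. In that case, any parameter $\theta \in \Theta$ is a maximizer of $\varphi$.

Case 2: $\exists \theta_0 \in \Theta$, $\varphi(\theta_0) > 0$. Denote $\varepsilon \triangleq \varphi(\theta_0)$. We then have

$$\sup_{\theta \in \Theta} \varphi(\theta) \ge \varphi(\theta_0) = \varepsilon > 0. \tag{C.2}$$

Hence, by condition (3.6), there exist $k$ compact sets $\{K_\ell\}_{\ell=1}^{k}$ such that for all $\ell \in \llbracket 1,k\rrbracket$, $\theta_0 \in K_\ell$ and

$$\forall \theta \in K_\ell^{c}, \quad |\kappa(\theta, \theta_\ell^\star)| < \frac{\varepsilon}{\sum_{\ell'=1}^{k} |c_{\ell'}|}. \tag{C.3}$$

Note that the right-hand side of (C.3) is well defined: by positive-definiteness of $\langle\cdot,\cdot\rangle$, $r \neq 0_{\mathcal{H}}$ so $k > 0$ and $\sum_{\ell=1}^{k} |c_\ell| > 0$ necessarily. Define $K = \bigcup_{\ell=1}^{k} K_\ell$. Since $K^{c} = \bigcap_{\ell=1}^{k} K_\ell^{c}$, we have, using the triangular inequality,

$$\forall \theta \in K^{c}, \quad \varphi(\theta) \le \sum_{\ell=1}^{k} |c_\ell|\,|\kappa(\theta, \theta_\ell^\star)| < \varepsilon. \tag{C.4}$$

Observe now that $\varphi$ is continuous by continuity of $\kappa$, and that $K$ is compact as a finite union of compact sets. Then, the extreme value theorem ensures that there exists $\theta_m \in K$ such that $\varphi(\theta_m) \ge \varphi(\theta)$ for all $\theta \in K$. Lemma 1 follows by noting that $\varphi(\theta_m) \ge \varphi(\theta_0) = \varepsilon$, so that $\varphi(\theta_m) \ge \varphi(\theta)$ for all $\theta \in \Theta$ by (C.4).

[Figure C.2: three panels plotting $P(u)$ against $u$, titled "Case 1: contradiction", "Case 2" and "Case 3: contradiction".]

Figure C.2: Shape of $P$ (see proof of Lemma 7) with constraints i) $P$ is continuous, ii) $P(0) < 0$ and iii) $\exists u_0 > 0$ such that $P(u) > 0$ for all $u > u_0$. One sees that the constraints cannot be satisfied in cases 1 and 3.

C.2. Proof of Lemma 7

The proof of Lemma 7 is based on the following result:

Lemma 8 (Laguerre's generalization of Descartes's rule of signs [74, p. 319]). Let $a_1, \dots, a_k$ be nonzero real coefficients and $0 < x_1 < \dots < x_k$ be real numbers. Let $z$ be the number of real roots of the function $P(u) = \sum_{\ell=1}^{k} a_\ell\,x_\ell^{u}$, and $n$ be the number of changes in sign in the sequence of numbers $a_1, \dots, a_k$. Then $z \le n$.

The sequence of coefficients $a_\ell = c_{k+1-\ell}$ with $\ell \in \llbracket 1,k\rrbracket$ has at most two sign changes by hypothesis. By applying Lemma 8 with $x_\ell = e^{-\lambda_{k+1-\ell}}$, one sees that $P$ has at most two real roots, so at most two sign changes on $\mathbb{R}_+$. However, $P$ must satisfy the following constraints:

i) $P$ is continuous on $[0, +\infty[$,
ii) $P(0) < 0$,
iii) there exists $u_0 > 0$ such that $P(u) > 0$ for all $u > u_0$.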
As illustrated in Figure C.2, these three constraints cannot be verified simultaneously if $P$ has 0 or 2 roots. Thus $P$ has exactly one sign change on $\mathbb{R}_+$ and there exists $u_0 > 0$ such that $u < u_0 \implies P(u) < 0$ and $u > u_0 \implies P(u) > 0$. One then has, for any non-decreasing function $f$ and any (non-negative) measure $\nu$ on $\mathbb{R}_+$:

$$\int_0^{+\infty} f(u)\,P(u)\,d\nu(u) = \int_0^{u_0} \underbrace{f(u)}_{\text{non-decreasing}}\,\underbrace{P(u)}_{\le 0}\,d\nu(u) + \int_{u_0}^{+\infty} \underbrace{f(u)}_{\text{non-decreasing}}\,\underbrace{P(u)}_{\ge 0}\,d\nu(u)$$
$$\ge \int_0^{u_0} f(u_0)\,P(u)\,d\nu(u) + \int_{u_0}^{+\infty} f(u_0)\,P(u)\,d\nu(u) = f(u_0)\int_0^{+\infty} P(u)\,d\nu(u).$$

D. Details related to Example 4

Assume that $\varphi$ is right differentiable at 0. We first prove by contradiction that $\varphi^{(1)}(0) < 0$. Assume that $\varphi^{(1)}(0) = 0$. As $\varphi$ is a CMF, $\varphi^{(1)}(t) \le 0$ for all $t \in \mathbb{R}_+$ and $\varphi^{(1)}$ is non-decreasing on $\mathbb{R}_+$. It follows that $\varphi^{(1)}$ is identically zero. Hence $\varphi$ is constant and equal to $\varphi(0) = 1$, which contradicts the assumption $\lim_{t\to+\infty} \varphi(t) = 0$.

Consider now the function defined for all $x \ge 0$ by $f : x \mapsto (k-1)\,\varphi(2x) - k\,\varphi(x) + 1$. We note that $f$ corresponds to the quantity involved in (3.24) with the substitution $x \leftrightarrow \Delta$. Since $\varphi(0) = 1$, we have $f(0) = 0$. Moreover $f$ is differentiable for any $x > 0$ and

$$f^{(1)}(x) = 2(k-1)\,\varphi^{(1)}(2x) - k\,\varphi^{(1)}(x) = k\,\varphi^{(1)}(x)\Big[2\Big(1 - \frac{1}{k}\Big)\frac{\varphi^{(1)}(2x)}{\varphi^{(1)}(x)} - 1\Big]. \tag{D.1}$$

Since $\varphi$ is right differentiable at 0, the ratio $\frac{\varphi^{(1)}(2x)}{\varphi^{(1)}(x)}$ tends to 1 as $x$ tends to 0 (remember that we proved that $\varphi^{(1)}(0) < 0$). Since $k \ge 3$ we have $2\big(1 - \frac{1}{k}\big) > 1$, hence there exists $x_0 > 0$ such that

$$x < x_0 \implies 2\Big(1 - \frac{1}{k}\Big)\frac{\varphi^{(1)}(2x)}{\varphi^{(1)}(x)} - 1 > 0. \tag{D.2}$$

By Lemma 5, we have that $\varphi^{(1)}(x) < 0$ for all $x > 0$. Hence $f^{(1)}(x) < 0$ for $x < x_0$, that is, $f$ is decreasing on $[0, x_0]$. Combining this result with $f(0) = 0$, we deduce that (3.24) holds whenever $\Delta < x_0$, that is, the wrong parameter will be preferred to any of the $\{\theta_\ell^\star\}_{\ell=1}^{k}$.

E. Exact recovery in higher dimensions - CMF kernel and k = 2

In this section, we elaborate on the notion of "axis admissibility" (see Definition 7) for general CMF kernels.
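We first show (see Example 5) that there exist some Cartesian grids which are not axis admissible with respect to some CMF kernels. We then emphasize in the case $k = 2$ that the notion of "axis admissibility" is not necessary to achieve $k$-step recovery in CMF dictionaries.

Example 5. Let $\mathcal{A}$ be a CMF dictionary with induced kernel $\kappa(\theta,\theta') = \varphi(\|\theta - \theta'\|_p^p)$ and consider $\Delta > 0$, $c_1 = c_2 = 1$, $c_3 = c_4 = -\frac{1 + \varphi(\Delta^p)}{\varphi(\Delta^p) + \varphi(2\Delta^p)}$ and

$$\forall t \in \mathbb{R}, \quad f_1(t) = \Big|\sum_{\ell=1}^{4} c_\ell\,\kappa(t\,e_1, \theta_\ell)\Big| \tag{E.1}$$

where $e_d$ is the $d$-th canonical basis vector of $\mathbb{R}^2$.

Let $G \triangleq \{\theta_\ell\}_{\ell=1}^{4} \subset \mathbb{R}^2$ be a Cartesian grid with $\theta_1 = (0,0) = 0_2$, $\theta_2 = (\Delta, 0) = \Delta e_1$, $\theta_3 = (0, \Delta) = \Delta e_2$, $\theta_4 = (\Delta, \Delta) = \Delta 1_2$. Simple algebraic manipulations then show that $f_1(0) = f_1(\Delta) = 0$ and

$$f_1\Big(\frac{\Delta}{2}\Big) = 2\,\varphi\Big(\frac{\Delta^p}{2^p}\Big)\,\left|1 - \frac{\varphi\Big(\Delta^p\big(\frac{1}{2^p} + 1\big)\Big)\,\big(1 + \varphi(\Delta^p)\big)}{\big(\varphi(\Delta^p) + \varphi(2\Delta^p)\big)\,\varphi\Big(\frac{\Delta^p}{2^p}\Big)}\right|. \tag{E.2}$$

If $\varphi$ and $\Delta$ are such that $f_1(\Delta/2) \neq 0$, one can conclude that the maximizers of $f_1$ are distinct from 0 and $\Delta$. In view of Definition 7, this shows that $G$ is not axis admissible with respect to $\mathcal{A}$.

For instance, this is the case for the CMF $\varphi : x \mapsto \frac{1}{1+x}$ (cf. Example 2). Indeed, in this case (E.2) particularizes to

$$f_1\Big(\frac{\Delta}{2}\Big) = \frac{2}{1 + \frac{\Delta^p}{2^p}}\,\left|1 - \frac{\Big(1 + \frac{1}{1+\Delta^p}\Big)\Big(1 + \frac{\Delta^p}{2^p}\Big)}{\Big(\frac{1}{1+\Delta^p} + \frac{1}{1+2\Delta^p}\Big)\Big(1 + \Delta^p + \frac{\Delta^p}{2^p}\Big)}\right|. \tag{E.3}$$

As the factor inside the absolute value in the right-hand side is a non-zero rational function of $x = \Delta^p$, we have $f_1(\Delta/2) \neq 0$ except possibly on a set of values of $\Delta$ which has Lebesgue measure equal to zero. Hence there exists $\Delta > 0$ such that $f_1(\Delta/2) > 0$. We finally note that the construction presented here for the case $D = 2$ easily extends to $D > 2$ by zero-padding of the $\theta_\ell$'s.

We next show that $k$-step recovery of $S^\star$ with $k = \mathrm{card}(S^\star) = 2$ may be possible in CMF dictionaries even when the axis admissibility assumption fails to hold. First, we state and prove a useful technical lemma:

Lemma 9. Let $\kappa$ be a CMF kernel in dimension $D$ in the sense of Definition 4. For any $\theta_1, \theta_2, \theta_3 \in \mathbb{R}^{D}$, the following result holds:

$$\kappa(\theta_1, \theta_2)\,\kappa(\theta_2, \theta_3) \le \kappa(\theta_1, \theta_3). \tag{E.4}$$

Proof. By definition, there exists a CMF $\varphi$ such that $\kappa(\theta, \theta') = \varphi(\|\theta - \theta'\|_p^p)$ and $\varphi(0) = 1$.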
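Since $\varphi$ is nonnegative, decreasing and (as any CMF) log-convex with $\varphi(0) = 1$, we have for all $x, y \ge 0$

$$\varphi(x)\,\varphi(y) \le \varphi(x + y). \tag{E.5}$$

Using this result with $x = \|\theta_1 - \theta_2\|_p^p$ and $y = \|\theta_2 - \theta_3\|_p^p$, we have

$$\kappa(\theta_1, \theta_2)\,\kappa(\theta_2, \theta_3) \le \varphi\big(\|\theta_1 - \theta_2\|_p^p + \|\theta_2 - \theta_3\|_p^p\big). \tag{E.6}$$

Since the quasi-norm $\|\cdot\|_p^p$ satisfies a triangular inequality, we have $\|\theta_1 - \theta_3\|_p^p \le \|\theta_1 - \theta_2\|_p^p + \|\theta_2 - \theta_3\|_p^p$. As any CMF is decreasing, (E.4) follows.

We are now ready to state our recovery result:

Lemma 10 (Exact recovery for CMF dictionaries when $k = 2$). Let $\mathcal{A}$ be a CMF dictionary in dimension $D \ge 1$ with induced kernel $\kappa$. Consider a support $S^\star = \{\theta_1^\star, \theta_2^\star\}$ where $\theta_1^\star \neq \theta_2^\star$, and let $G \in \mathbb{R}^{2\times 2}$ be the matrix defined by $G[\ell, \ell'] = \kappa(\theta_\ell^\star, \theta_{\ell'}^\star)$. Assume that

$$\forall \theta \in \mathrm{Grid}(S^\star)\setminus S^\star, \quad \|G^{-1} g_\theta\|_1 < 1 \tag{E.7}$$

where $g_\theta \in \mathbb{R}^2$ is defined by $g_\theta[\ell] = \kappa(\theta, \theta_\ell^\star)$ for $\ell = 1, 2$. Then OMP achieves exact 2-step recovery of $S^\star$.

Proof. By Lemma 6, $\kappa$ is admissible in the sense of Definition 1. We show below that since (E.7) holds, $S^\star$ is admissible with respect to $\kappa$ in the sense of Definition 2. Lemma 10 then follows from Theorem 2.

Consider a non-empty subset of indices $T \subseteq \{1, 2\}$ and $t \triangleq \mathrm{card}(T)$. Let also $\{c_\ell\}_{\ell \in T}$ be such that $c_\ell > 0$ and $\sum_{\ell \in T} c_\ell < 1$. Define

$$\psi : \mathbb{R}^{D} \to \mathbb{R}, \qquad \theta \mapsto \sum_{\ell \in T} c_\ell\,\kappa(\theta, \theta_\ell^\star). \tag{E.8}$$

We next show that items i) and ii) of Definition 2 are satisfied.

Item i) of Definition 2. We distinguish two cases:

• If $t = 1$, we can assume without loss of generality that $T = \{1\}$. Since $\kappa(\theta, \theta_1^\star) < 1$ for all $\theta \neq \theta_1^\star$, one immediately sees that $\psi(\theta) = c_1\,\kappa(\theta, \theta_1^\star) < c_1 = \psi(\theta_1^\star)$ for all $\theta \neq \theta_1^\star$. Hence, $\theta_1^\star$ is the unique global maximizer of $\psi$.

• If $t = 2$, let $\theta_m$ be a maximizer of $\psi$. We note that $\psi$ can also be written as $\psi(\theta) = |\langle a(\theta), y\rangle|$ where $y \triangleq \sum_{\ell=1}^{2} c_\ell\,a(\theta_\ell^\star)$, hence a maximizer always exists by virtue of Lemma 1. Since $\theta_m$ maximizes the $D$-dimensional function $\psi$, its $d$-th entry $\theta_m[d]$ is a maximizer of the one-dimensional section of $\psi$ along the $d$-th canonical direction, denoted $\psi_d$:

$$\psi_d : \mathbb{R} \to \mathbb{R}_+, \qquad x \mapsto \sum_{\ell \in T} c_\ell\,\varphi\Big(|x - \theta_\ell^\star[d]|^{p} + \sum_{j \neq d} |\theta_m[j] - \theta_\ell^\star[j]|^{p}\Big). \tag{E.9}$$

Applying the same reasoning as in the proof of Theorem 3 (see the part of the proof dedicated to establishing "item i) of Definition 2"), we have, $\forall x \notin \{\theta_\ell^\star[d]\}_{\ell=1}^{2}$: $\psi_d$ is twice differentiable and $\psi_d^{(2)}(x) > 0$. Hence, no $x \notin \{\theta_\ell^\star[d]\}_{\ell=1}^{2}$ can be a maximizer and necessarily $\theta_m[d] \in \{\theta_\ell^\star[d]\}_{\ell=1}^{2}$. Since this result is valid for all $d \in \llbracket 1,D\rrbracket$, we finally have $\theta_m \in \mathrm{Grid}(S^\star)$. Therefore, since (E.7) holds, we have

$$\max_{\theta \in \mathrm{Grid}(S^\star)\setminus S^\star} \psi(\theta) = \max_{\theta \in \mathrm{Grid}(S^\star)\setminus S^\star} |\langle a(\theta), y\rangle| < \max_{\theta_\ell^\star \in S^\star} |\langle a(\theta_\ell^\star), y\rangle| = \max_{\theta_\ell^\star \in S^\star} \psi(\theta_\ell^\star). \tag{E.10}$$

Hence all maximizers of $\psi$ belong to $S^\star$.

Item ii) of Definition 2. From the working assumptions of item ii), the set $T$ satisfies $T \neq \emptyset$ and there exists $\ell \in \{1, 2\}\setminus T$. Hence, we have $T \neq \{1, 2\}$, that is, $T$ is a singleton. We assume without loss of generality that $T = \{1\}$. Hence $\psi(\theta) = c_1\,\kappa(\theta, \theta_1^\star)$ for some $0 < c_1 < 1$. If $\psi(\theta_1^\star) - \kappa(\theta_1^\star, \theta_2^\star) \le 0$, then $c_1 - \kappa(\theta_1^\star, \theta_2^\star) = \psi(\theta_1^\star) - \kappa(\theta_1^\star, \theta_2^\star) \le 0$. Hence $c_1 \le \kappa(\theta_1^\star, \theta_2^\star)$ and $\psi(\theta) \le \kappa(\theta_1^\star, \theta_2^\star)\,\kappa(\theta, \theta_1^\star)$ for each $\theta$. Using Lemma 9 with $\theta_1 = \theta$, $\theta_2 = \theta_1^\star$, $\theta_3 = \theta_2^\star$, we obtain for each $\theta \in \Theta$:

$$\psi(\theta) - \kappa(\theta, \theta_2^\star) \le \kappa(\theta, \theta_1^\star)\,\kappa(\theta_1^\star, \theta_2^\star) - \kappa(\theta, \theta_2^\star) \le 0.$$

F. Table of notations

Table F.1: Table of notations.

Notation: Comment

General notations
$\mathcal{H}$, $y$: (Hilbert) observation space and observation
$\mathcal{A}$, $a(\theta)$: Dictionary $\mathcal{A}$ made of parametric atoms $a$
$\mu$: Coherence between atoms of a support
$c \in \mathbb{R}$: Weighting coefficients
$\Theta$, $\theta$: Parameter set and element
$S$, $S^\star$: Set of parameters
$G$: Cartesian grid
$k$, $\ell$: Number of atoms, most frequent index
$\mathrm{Grid}$: Set augmenter, see (3.27)
$\varphi$: CMF (see Definition 3)
$\kappa$: Kernel function $\Theta \times \Theta \to \mathbb{R}$
$\mathcal{K}_{\mathrm{CMF}}(D)$: Set of CMF kernels in dimension $D$
$\mathcal{K}_{\mathrm{Lap}}(D)$: Set of Laplace kernels in dimension $D$
$f^{(n)}$: $n$-th derivative of function $f$
$e_\ell$: $\ell$-th element of the canonical basis
$\mathbb{E}$: Expectation operator
$i$: Imaginary number

Technical notations
$G$, $g_\ell$: Gram matrix related to a support $S$, columns of $G$
$g_\theta$: Parametric vector related to a support $S$
$u$, $v$: Vector of $\mathbb{R}^{k}$ for some $k$, often defined as $u, v = G^{-1} g_\theta$ for some $\theta \in \Theta$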
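Condition (E.7) of Lemma 10 is straightforward to evaluate numerically for a given support. The sketch below is a minimal illustration under stated assumptions: it uses the CMF $\varphi(x) = 1/(1+x)$ of Example 2 with $p = 1$, a hypothetical two-element support in dimension $D = 2$, and helper names (`cmf_kernel`, `grid`, `worst_erc_value`) that are invented here for the example and do not come from the paper.

```python
import itertools
import numpy as np

def cmf_kernel(theta_a, theta_b, p=1.0):
    """kappa(a, b) = phi(||a - b||_p^p) with the CMF phi(x) = 1 / (1 + x)."""
    diff = np.asarray(theta_a, dtype=float) - np.asarray(theta_b, dtype=float)
    return 1.0 / (1.0 + np.sum(np.abs(diff) ** p))

def grid(support):
    """Cartesian grid spanned by the per-axis coordinates of the support."""
    dim = len(support[0])
    axes = [sorted({theta[d] for theta in support}) for d in range(dim)]
    return [tuple(point) for point in itertools.product(*axes)]

def worst_erc_value(support, p=1.0):
    """Largest ||G^{-1} g_theta||_1 over theta in Grid(S) minus S, cf. (E.7)."""
    gram = np.array([[cmf_kernel(a, b, p) for b in support] for a in support])
    extra = [theta for theta in grid(support) if theta not in support]
    values = []
    for theta in extra:
        g_theta = np.array([cmf_kernel(theta, s, p) for s in support])
        values.append(np.abs(np.linalg.solve(gram, g_theta)).sum())
    return max(values) if values else 0.0

# Hypothetical support with two distinct parameters in R^2.
support = [(0.0, 0.0), (3.0, 2.0)]
worst = worst_erc_value(support)
print(f"max of ||G^-1 g_theta||_1 over Grid(S) minus S: {worst:.3f}")  # prints 0.500
print("condition (E.7) satisfied:", worst < 1.0)
```

For this particular support the two extra grid points $(0, 2)$ and $(3, 0)$ both give $\|G^{-1} g_\theta\|_1 = 0.5 < 1$, so the hypothesis of Lemma 10 holds; other supports or other values of $p$ may of course violate it.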
Acknowledgments

Part of this work has been funded thanks to the Becose ANR project no. ANR-15-CE23-0021.

References

[1] S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing, Birkhäuser Basel, 2013. doi:10.1007/978-0-8176-4948-7.
[2] C. Ekanadham, D. Tranchina, E. P. Simoncelli, Recovery of sparse translation-invariant signals with continuous basis pursuit, IEEE Transactions on Signal Processing 59 (10) (2011) 4735–4744. doi:10.1109/TSP.2011.2160058.
[3] E. J. Candès, C. Fernandez-Granda, Towards a Mathematical Theory of Super-resolution, Communications on Pure and Applied Mathematics 67 (6) (2014) 906–956. doi:10.1002/cpa.21455.
[4] V. Duval, G. Peyré, Exact support recovery for sparse spikes deconvolution, Foundations of Computational Mathematics 15 (5) (2014) 1315–1355. doi:10.1007/s10208-014-9228-6.
[5] Y. C. Pati, R. Rezaiifar, P. S. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, in: Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, 1993, pp. 40–44 vol.1. doi:10.1109/ACSSC.1993.342465.
[6] K. S. Miller, S. G. Samko, Completely monotonic functions, Integral Transforms and Special Functions 12 (4) (2001) 389–402.
[7] B. Natarajan, Sparse approximate solutions to linear systems, SIAM Journal on Computing 24 (2) (1995) 227–234. doi:10.1137/S0097539792240406.
[8] A. J. Miller, Subset selection in regression, Chapman and Hall, London,
[9] S. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing 41 (12) (1993) 3397–3415. doi:10.1109/78.258082.
[10] R. A. DeVore, V. N. Temlyakov, Some remarks on greedy algorithms, Advances in Computational Mathematics 5 (1) (1996) 173–187. doi:10.1007/bf02124742.
[11] E. Liu, V. N. Temlyakov, The Orthogonal Super Greedy Algorithm and Applications in Compressed Sensing, IEEE Transactions on Information Theory 58 (4) (2012) 2040–2047. doi:10.1109/TIT.2011.2177632.
[12] S. Chen, S. A. Billings, W. Luo, Orthogonal least squares methods and their application to non-linear system identification, International Journal of Control 50 (5) (1989) 1873–1896. doi:10.1080/00207178908953472.
[13] J. H. Friedman, W. Stuetzle, Projection pursuit regression, Journal of the American Statistical Association 76 (376) (1981) 817–823. doi:10.1080/01621459.1981.10477729.
[14] P. J. Huber, Projection pursuit, The Annals of Statistics 13 (2) (1985) 435–475. doi:10.1214/aos/1176349519.
[15] L. Rebollo-Neira, D. Lowe, Optimized orthogonal matching pursuit approach, IEEE Signal Processing Letters 9 (4) (2002) 137–140. doi:10.1109/LSP.2002.1001652.
[16] V. N. Temlyakov, Greedy approximation, Acta Numerica 17 (2008) 235–409. doi:10.1017/s0962492906380014.
[17] P. Vincent, Y. Bengio, Kernel Matching Pursuit, Machine Learning 48 (1/3) (2002) 165–187. doi:10.1023/a:1013955821559.
[18] J. F. Claerbout, F. Muir, Robust modeling with erratic data, Geophysics 38 (5) (1973) 826–844. doi:10.1190/1.1440378.
[19] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing 20 (1) (1998) 33–61. doi:10.1137/S1064827596304010.
[20] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological) 58 (1) (1996) 267– URL http://www.jstor.org/stable/2346178
[21] R. Tibshirani, I. Johnstone, T. Hastie, B. Efron, Least angle regression, The Annals of Statistics 32 (2) (2004) 407–499. doi:10.1214/
[22] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences 2 (1) (2009) 183–202. doi:10.1137/080716542.
[23] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends® in Machine Learning 3 (1) (2011) 1–122. doi:10.1561/2200000016.
[24] R. Gribonval, M. Nielsen, Beyond sparsity: Recovering structured representations by $\ell^1$ minimization and greedy algorithms, Advances in Computational Mathematics 28 (1) (2008) 23–41. doi:10.1007/s10444-005-9009-5.
[25] L. Borup, R. Gribonval, M. Nielsen, Beyond coherence: Recovering structured time–frequency representations, Applied and Computational Harmonic Analysis 24 (1) (2008) 120–128. doi:10.1016/j.acha.2007.09.
[26] K. C. Knudson, J. Yates, A. Huk, J. W. Pillow, Inferring sparse representations of continuous signals with continuous orthogonal matching pursuit, Advances in Neural Information Processing Systems 27 (2014) 1215–1223.
[27] A. Eftekhari, M. B. Wakin, Greed is super: A fast algorithm for super-resolution (2015). arXiv:1511.03385.
[28] C. Dorffer, A. Drémeau, C. Herzet, Efficient atom selection strategy for iterative sparse approximations, in: iTWIST 2018 - International Traveling Workshop on Interactions between low-complexity data models and Sensing Techniques, Marseille, France, 2018, pp. 1–3. URL https://hal.inria.fr/hal-01937501
[29] K. Bredies, H. K. Pikkarainen, Inverse problems in spaces of measures, ESAIM: Control, Optimisation and Calculus of Variations 19 (1) (2012) 190–218. doi:10.1051/cocv/2011205.
[30] Y. de Castro, F. Gamboa, Exact reconstruction using Beurling minimal extrapolation, Journal of Mathematical Analysis and Applications 395 (1) (2012) 336–354. doi:10.1016/j.jmaa.2012.05.011.
[31] G. Tang, B. N. Bhaskar, P. Shah, B. Recht, Compressed sensing off the grid, IEEE Transactions on Information Theory 59 (11) (2013) 7465–7490. doi:10.1109/tit.2013.2277451.
[32] Y. D. Castro, F. Gamboa, D. Henrion, J.-B. Lasserre, Exact solutions to super resolution on semi-algebraic domains in higher dimensions, IEEE Transactions on Information Theory 63 (1) (2017) 621–630. doi:10.1109/tit.2016.2619368.
[33] N. Boyd, G. Schiebinger, B. Recht, The alternating descent conditional gradient method for sparse inverse problems, in: 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2015, pp. 57–60. doi:10.1109/CAMSAP.2015.7383735.
[34] P. Catala, V. Duval, G. Peyré, A low-rank approach to off-the-grid sparse deconvolution, SIAM Journal on Imaging Sciences 12 (3) (2019) 1464–1500. doi:10.1137/19M124071X.
[35] Q. Denoyelle, V. Duval, G. Peyré, E. Soubies, The sliding Frank–Wolfe algorithm and its application to super-resolution microscopy, Inverse Problems 36 (1) (2019) 014001. doi:10.1088/1361-6420/ab2a29.
[36] A. Eftekhari, A. Thompson, Sparse inverse problems over measures: Equivalence of the conditional gradient and exchange methods, SIAM Journal on Optimization 29 (2) (2019) 1329–1349. doi:10.1137/18m1183388.
[37] A. Flinth, F. de Gournay, P. Weiss, On the linear convergence rates of exchange and continuous methods for total variation minimization (Jun 2019). arXiv:1906.09919.
[38] L. Chizat, F. Bach, On the global convergence of gradient descent for over-parameterized models using optimal transport, in: Advances in Neural Information Processing Systems 31, 2018, pp. 3036–3046.
[39] L. Chizat, Sparse Optimization on Measures with Over-parameterized Gradient Descent, working paper or preprint (July 2019). URL https://hal.archives-ouvertes.fr/hal-02190822
[40] G. de Prony, Essai expérimental et analytique : sur les lois de la dilatabilité des fluides élastique et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures, 1795, Journal de l'École Polytechnique.
[41] S. Kunis, T. Peter, T. Römer, U. von der Ohe, A multivariate generalization of Prony's method, Linear Algebra and its Applications 490 (2016) 31–47. doi:10.1016/j.laa.2015.10.023.
[42] W. Liao, A. Fannjiang, MUSIC for single-snapshot spectral estimation: Stability and super-resolution, Applied and Computational Harmonic Analysis 40 (1) (2016) 33–67. doi:10.1016/j.acha.2014.12.003.
[43] R. Roy, T. Kailath, ESPRIT-estimation of signal parameters via rotational invariance techniques, IEEE Transactions on Acoustics, Speech, and Signal Processing 37 (7) (1989) 984–995. doi:10.1109/29.32276.
[44] X. Wei, P. L. Dragotti, FRESH – FRI-based single-image super-resolution algorithm, IEEE Transactions on Image Processing 25 (8) (2016) 3723–3735. doi:10.1109/tip.2016.2563178.
[45] J. A. Tropp, Greed is good: Algorithmic results for sparse approximation, IEEE Transactions on Information Theory 50 (10) (2004) 2231–2242. doi:10.1109/TIT.2004.834793.
[46] C. Soussen, R. Gribonval, J. Idier, C. Herzet, Joint k-Step Analysis of Orthogonal Matching Pursuit and Orthogonal Least Squares, IEEE Transactions on Information Theory 59 (5) (2013) 3158–3174. doi:10.1109/tit.2013.2238606.
[47] R. Gribonval, P. Vandergheynst, On the exponential convergence of matching pursuits in quasi-incoherent dictionaries, IEEE Transactions on Information Theory 52 (1) (2006) 255–261. doi:10.1109/tit.2005.860474.
[48] J.-J. Fuchs, On sparse representations in arbitrary redundant bases, IEEE Transactions on Information Theory 50 (6) (2004) 1341–1344. doi:10.1109/TIT.2004.828141.
[49] J. A. Tropp, Just relax: convex programming methods for identifying sparse signals in noise, IEEE Transactions on Information Theory 52 (3) (2006) 1030–1051. doi:10.1109/TIT.2005.864420.
[50] C. Herzet, A. Drémeau, C. Soussen, Relaxed Recovery Conditions for OMP/OLS by Exploiting Both Coherence and Decay, IEEE Transactions on Information Theory 62 (1) (2016) 459–470. doi:10.1109/TIT.2015.
[51] S. Huang, J. Zhu, Recovery of sparse signals using OMP and its variants: convergence analysis based on RIP, Inverse Problems 27 (3) (2011) 035003. doi:10.1088/0266-5611/27/3/035003.
[52] R. Maleh, Improved RIP Analysis of Orthogonal Matching Pursuit, Tech. rep. (2011). arXiv:1102.4311.
[53] Q. Mo, Y. Shen, A Remark on the Restricted Isometry Property in Orthogonal Matching Pursuit, IEEE Transactions on Information Theory 58 (6) (2012) 3654–3656. doi:10.1109/TIT.2012.2185923.
[54] J. Wang, B. Shim, On the Recovery Limit of Sparse Signals Using Orthogonal Matching Pursuit, IEEE Transactions on Signal Processing 60 (9) (2012) 4973–4976. doi:10.1109/TSP.2012.2203124.
[55] L. Chang, J. Wu, An Improved RIP-Based Performance Guarantee for Sparse Signal Recovery via Orthogonal Matching Pursuit, IEEE Transactions on Information Theory 60 (9) (2014) 5702–5715. doi:10.1109/TIT.2014.2338314.
[56] J. Wen, X. Zhu, D. Li, Improved bounds on restricted isometry constant for orthogonal matching pursuit, Electronics Letters 49 (23) (2013) 1487–1489. doi:10.1049/el.2013.2222.
[57] Q. Mo, A Sharp Restricted Isometry Constant Bound of Orthogonal Matching Pursuit, Tech. rep. (2015). arXiv:1501.01708.
[58] W. Rudin, Principles of Mathematical Analysis, 3rd Edition, International Series in Pure and Applied Mathematics, McGraw-Hill Science/Engineering/Math, 1976.
[59] D. D. Carlo, C. Elvira, A. Deleforge, N. Bertin, R. Gribonval, Blaster: An off-grid method for blind and regularized acoustic echoes retrieval, in: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 156–160.
[60] J.-M. Azaïs, Y. de Castro, F. Gamboa, Spike detection from inaccurate samplings, Applied and Computational Harmonic Analysis 38 (2) (2015) 177–195. doi:10.1016/j.acha.2014.03.004.
[61] C. Fernandez-Granda, Super-resolution of point sources via convex programming, Information and Inference: A Journal of the IMA 5 (3) (2016) 251–303. doi:10.1093/imaiai/iaw005.
[62] M. Kreĭn, A. Nudel'man, The Markov Moment Problem and Extremal Problems, American Mathematical Society, 1977. doi:10.1090/mmono/
[63] Q. Denoyelle, V. Duval, G. Peyré, Support Recovery for Sparse Super-Resolution of Positive Measures, Journal of Fourier Analysis and Applications 23 (5) (2017) 1153–1194.
[64] C. Poon, G. Peyré, MultiDimensional sparse super-resolution, SIAM Journal on Mathematical Analysis 51 (1) (2019) 1–44. doi:10.1137/17m1147822.
[65] C. J. Hillar, L.-H. Lim, Most tensor problems are NP-hard, J. ACM 60 (6). doi:10.1145/2512329. URL https://doi.org/10.1145/2512329
[66] C. Elvira, J. E. Cohen, C. Herzet, R. Gribonval, Continuous dictionaries meet low-rank tensor approximations, in: iTwist 2020 - International Traveling Workshop on Interactions between low-complexity data models and Sensing Techniques, Nantes, France, 2020, pp. 1–3. URL https://hal.archives-ouvertes.fr/hal-02567115
[67] V. Chandrasekaran, B. Recht, P. A. Parrilo, A. S. Willsky, The convex geometry of linear inverse problems, Foundations of Computational Mathematics 12 (6) (2012) 805–849. doi:10.1007/s10208-012-9135-7. URL https://doi.org/10.1007/s10208-012-9135-7
[68] H. Wendland, Scattered data approximation, 2005. doi:10.2277/
[69] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, USA, 2004.
[70] D. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory, Princeton University Press, 2005.
[71] G. Pólya, Remarks on Characteristic Functions, in: Proceedings of the [First] Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, Calif., 1949, pp. 115–123. URL https://projecteuclid.org/euclid.bsmsp/1166219202
[72] D. V. Widder, Laplace Transform, hardcover Edition, Princeton Mathematical Press, 1941.
[73] D. Bertsekas, Nonlinear Programming, 2nd Edition, Athena Scientific,
[74] H. Fejzić, C. Freiling, D. Rinne, Descartes' rule of signs, alternations of data sets, and balanced differences, The American Mathematical Monthly 116 (4) (2009) 316–327. URL http://www.jstor.org/stable/40391091

Journal

Mathematics, arXiv (Cornell University)

Published: Apr 12, 2019
