PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 138, Number 12, December 2010, Pages 4331–4344
S 0002-9939(2010)10448-3
Article electronically published on May 24, 2010

HOW MANY LAPLACE TRANSFORMS OF PROBABILITY MEASURES ARE THERE?

FUCHANG GAO, WENBO V. LI, AND JON A. WELLNER

(Communicated by Richard C. Bradley)

Received by the editors September 15, 2009 and, in revised form, February 2, 2010. 2010 Mathematics Subject Classification: Primary 46B50, 60G15, 60G52; Secondary 62G05. Key words and phrases: Laplace transform, bracketing metric entropy, completely monotone functions, smooth Gaussian process, small deviation probability. The second author was supported in part by NSF grant DMS-0805929. The third author was supported in part by NSF Grant DMS-0804587 and NIH/NIAID Grant 5 R37 A1029168. © 2010 American Mathematical Society. Reverts to public domain 28 years from publication.

Abstract. A bracketing metric entropy bound for the class of Laplace transforms of probability measures on $[0,\infty)$ is obtained through its connection with the small deviation probability of a smooth Gaussian process. Our results for the particular smooth Gaussian process seem to be of independent interest.

1. Introduction

Let $\mu$ be a finite measure on $[0,\infty)$. The Laplace transform of $\mu$ is the function on $(0,\infty)$ defined by
\[ f(t)=\int_0^\infty e^{-ty}\,\mu(dy). \tag{1} \]
It is easy to check that such a function has the property that $(-1)^n f^{(n)}(t)\ge 0$ for all nonnegative integers $n$ and all $t>0$. A function on $(0,\infty)$ with this property is called a completely monotone function on $(0,\infty)$. A characterization due to Bernstein (cf. Williamson (1956)) says that $f$ is completely monotone on $(0,\infty)$ if and only if there is a nonnegative measure $\mu$ (not necessarily finite) on $[0,\infty)$ such that (1) holds. Therefore, due to monotonicity, the class of Laplace transforms of finite measures on $[0,\infty)$ is the same as the class of bounded completely monotone functions on $(0,\infty)$. These functions can be extended to continuous functions on $[0,\infty)$, and we will call them completely monotone on $[0,\infty)$.

Completely monotonic functions have remarkable applications in various fields, such as probability and statistics, physics and potential theory. The main properties of these functions are given in Widder (1941), Chapter IV. For example, the class of completely monotonic functions is closed under sums, products and pointwise convergence. We refer to Alzer and Berg (2002) for a detailed list of references on completely monotonic functions. Closely related to the class of completely monotonic functions are the so-called $k$-monotone functions, where the nonnegativity of $(-1)^n f^{(n)}$ is required only for integers $n\le k$. In fact, completely monotonic functions can be viewed as the limiting case of $k$-monotone functions as $k\to\infty$. In this sense, the present work is a partial extension of Gao (2008) and Gao and Wellner (2009).
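The defining property is transparent for a discrete mixing measure: if $\mu=\sum_i w_i\delta_{x_i}$, then $f^{(n)}(t)=\sum_i w_i(-x_i)^n e^{-tx_i}$, so $(-1)^n f^{(n)}(t)=\sum_i w_i x_i^n e^{-tx_i}\ge0$. The short sketch below (ours, not part of the paper; the support points and weights are arbitrary) checks this numerically.

```python
import numpy as np

# Sanity check (not from the paper): for mu = sum_i w_i * delta_{x_i}, the
# Laplace transform is f(t) = sum_i w_i exp(-t x_i), whose n-th derivative is
# f^(n)(t) = sum_i w_i (-x_i)^n exp(-t x_i). Hence (-1)^n f^(n)(t) >= 0.

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=8)     # support points of mu (arbitrary)
w = rng.dirichlet(np.ones(8))         # weights summing to 1: probability measure

def signed_derivative(t, n):
    """Return (-1)^n f^(n)(t) for the discrete mixture above."""
    return float(np.sum(w * x**n * np.exp(-t * x)))

for n in range(6):
    vals = [signed_derivative(t, n) for t in np.linspace(0.1, 10.0, 50)]
    assert min(vals) >= 0.0           # complete monotonicity, order n
print("(-1)^n f^(n)(t) >= 0 verified for n = 0, ..., 5")
```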
Let $\mathcal{M}_\infty$ be the class of completely monotone functions on $[0,\infty)$ that are bounded by 1. Then
\[ \mathcal{M}_\infty=\Big\{f:[0,\infty)\to[0,\infty)\;\Big|\;f(t)=\int_0^\infty e^{-tx}\,\mu(dx),\ \|\mu\|\le 1\Big\}. \]
It is well known (see e.g. Feller (1971), Theorem 1, page 439) that the subclass of $\mathcal{M}_\infty$ with $f(0)=1$ corresponds exactly to the Laplace transforms of the class of probability measures $\mu$ on $[0,\infty)$. For a random variable $X$ with distribution function $F(t)=P(X\le t)$, define the survival function $S(t)=1-F(t)=P(X>t)$. Thus the class
\[ \mathcal{S}=\Big\{S:[0,\infty)\to[0,\infty)\;\Big|\;S(t)=\int_0^\infty e^{-tx}\,\mu(dx),\ \|\mu\|=1\Big\} \]
is exactly the class of survival functions of all scale mixtures of the standard exponential distribution (with survival function $e^{-t}$), with corresponding densities
\[ p(t)=-S'(t)=\int_0^\infty xe^{-xt}\,\mu(dx),\qquad t\ge 0. \]
It is easily seen that the class $\mathcal{P}$ of such densities with $p(0)<\infty$ is also a class of completely monotone functions, corresponding to probability measures $\mu$ on $[0,\infty)$ with finite first moment. These classes have many applications in statistics; see e.g. Jewell (1982) for a brief survey. Jewell (1982) considered nonparametric estimation of a completely monotone density and showed that the nonparametric maximum likelihood estimator (or MLE) for this class is almost surely consistent. The bracketing entropy bounds derived below can be considered as a first step toward global rates of convergence of the MLE.

In probability and statistical applications, one way to understand the complexity of a function class is by way of the metric entropy of the class under certain common distances. Recall that the metric entropy of a function class $\mathcal{F}$ under a distance $\rho$ is defined to be $\log N(\varepsilon,\mathcal{F},\rho)$, where $N(\varepsilon,\mathcal{F},\rho)$ is the minimum number of open balls of radius $\varepsilon$ needed to cover $\mathcal{F}$. In statistical applications, sometimes bracketing metric entropy is needed. Recall that bracketing entropy is defined as $\log N_{[\,]}(\varepsilon,\mathcal{F},\rho)$, where
\[ N_{[\,]}(\varepsilon,\mathcal{F},\rho):=\min\Big\{n:\exists\,\underline f_1,\overline f_1,\dots,\underline f_n,\overline f_n \text{ s.t. } \rho(\overline f_k,\underline f_k)\le\varepsilon,\ \mathcal{F}\subset\bigcup_{k=1}^n[\underline f_k,\overline f_k]\Big\} \]
and
\[ [\underline f_k,\overline f_k]=\{g\in\mathcal{F}:\underline f_k\le g\le\overline f_k\}. \]
Clearly $N(\varepsilon,\mathcal{F},\rho)\le N_{[\,]}(\varepsilon,\mathcal{F},\rho)$, and the two are closely related in our setting below.

In this paper, we study the metric entropy of $\mathcal{M}_\infty$ under the $L^p(\nu)$-norm given by
\[ \|f\|_{L^p(\nu)}=\Big(\int_0^\infty|f(x)|^p\,\nu(dx)\Big)^{1/p},\qquad 1\le p\le\infty, \]
where $\nu$ is a probability measure on $[0,\infty)$. Our main result is the following.

Theorem 1.1. (i) Let $\nu$ be a probability measure on $[0,\infty)$. There exists a constant $C$ depending only on $p\ge 1$ such that for any $0<\varepsilon<1/4$,
\[ \log N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L^p(\nu)})\le C\log(\Gamma/\gamma)\cdot|\log\varepsilon|^2 \]
for any $0<\gamma<\Gamma<\infty$ such that $\nu([\gamma,\Gamma])\ge 1-4^{-p}\varepsilon^p$. In particular, if there exists a constant $K>1$ such that $\nu([\varepsilon^K,\varepsilon^{-K}])\ge 1-4^{-p}\varepsilon^p$, then
\[ \log N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L^p(\nu)})\le CK|\log\varepsilon|^3. \]
(ii) If $\nu$ is Lebesgue measure on $[0,1]$, then
\[ \log N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L^2(\nu)})\asymp\log N(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L^2(\nu)})\asymp|\log\varepsilon|^3, \]
where $A\asymp B$ means that there exist universal constants $C_1,C_2>0$ such that $C_1A\le B\le C_2A$.

As an equivalent result for part (ii) of the above theorem, we have the following important small deviation probability estimate for an associated smooth Gaussian process. In particular, it may be of interest to find a probabilistic proof for the lower bound directly.

Theorem 1.2. Let $Y(t)$, $t>0$, be a Gaussian process with covariance $\mathbb{E}\,Y(t)Y(s)=(1-e^{-t-s})/(t+s)$. Then for $0<\varepsilon<1$,
\[ \log P\Big(\sup_{t>0}|Y(t)|<\varepsilon\Big)\asymp-|\log\varepsilon|^3. \]

The rest of the paper is organized as follows. In Section 2, we provide the upper bound estimate in the main result by explicit construction. In Section 3, we summarize various connections between entropy numbers of a set (and its convex hull) and small ball probabilities for the associated Gaussian process. Some of our observations in a general setting are stated explicitly for the first time. Finally, we identify the particular Gaussian process suitable for our entropy estimates. Then in Section 4, we obtain the required upper bound small ball probability estimate (which implies the lower bound entropy estimate, as discussed in Section 3) by a simple determinant estimate. This method of small ball estimates is made explicit here for the first time and can be used in many more problems. The technical determinant estimates are also of independent interest.
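Theorem 1.2 can be probed numerically, at least for moderate $\varepsilon$. The Monte Carlo sketch below is ours, not part of the paper; the grid, sample size, and jitter are arbitrary choices, and since a finite grid maximum underestimates the supremum, the estimate overstates the true small deviation probability.

```python
import numpy as np

# Monte Carlo sketch (ours, not from the paper) for Theorem 1.2: sample the
# Gaussian process with covariance (1 - e^{-t-s})/(t+s) on a finite grid and
# estimate P(max over grid |Y(t)| < eps). The grid maximum underestimates the
# supremum, so this overestimates the true small deviation probability.

rng = np.random.default_rng(1)
t = np.logspace(-2, 2, 60)                    # arbitrary grid in (0, infinity)
T, S = np.meshgrid(t, t)
cov = (1.0 - np.exp(-(T + S))) / (T + S)      # covariance of Theorem 1.2

L = np.linalg.cholesky(cov + 1e-12 * np.eye(len(t)))   # jitter for stability
sup = np.abs(L @ rng.standard_normal((len(t), 50_000))).max(axis=0)

for eps in (0.5, 0.3, 0.2):
    p = (sup < eps).mean()
    print(f"eps={eps}: P-hat={p:.4f}, log P-hat={np.log(max(p, 1e-12)):.2f}, "
          f"-|log eps|^3={-abs(np.log(eps)) ** 3:.2f}")
```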
2. Upper bound estimate

In this section, we provide an upper bound for $N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L^p(\nu)})$, where $\nu$ is a probability measure on $[0,\infty)$ and $1\le p\le\infty$. Before we start, we note that $\mathcal{M}_\infty$ is the convex hull of $\mathcal{K}:=\{K(t,\cdot):t\in[0,\infty)\}$, where for each $t\in[0,\infty)$, $K(t,\cdot)$ is the function on $[0,\infty)$ defined by $K(t,x)=e^{-tx}$. There are some general results on the metric entropy of convex hulls $\operatorname{conv}(T)$ using the metric entropy of $T$. (Cf. Dudley (1987), Ball and Pajor (1990), van der Vaart and Wellner (1996), Carl (1997), Carl et al. (1999), Li and Linde (2000), Gao (2004), etc.) For example, Carl et al. (1999) proved that if $N(\varepsilon,T,\|\cdot\|)=O(\varepsilon^{-\alpha})$, $\alpha>0$, then
\[ \log N(\varepsilon,\operatorname{conv}(T),\|\cdot\|)=O(\varepsilon^{-2\alpha/(2+\alpha)}), \]
where $\|\cdot\|$ is any Banach space norm. Although these results are best possible in the general case, when applied to specific problems they can be far from sharp. This is especially the case when the metric entropy of $T$ grows at a polynomial rate. For example, in our case, because the functions $e^{-kx}$, $k=1,2,\dots,n$, have mutual $L^2[0,1]$-distance at least $n^{-3/2}$, we immediately have $N(\varepsilon,\mathcal{K},\|\cdot\|_{L^2[0,1]})\ge C\varepsilon^{-2/3}$. Thus, in the case $p=2$ and with $\nu$ taken to be Lebesgue measure on $[0,1]$, the best upper bound we can hope to obtain from the general convex hull result quoted above is
\[ \log N(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L^2(\nu)})\le C\varepsilon^{-1/2}, \]
which is much larger (at least in the dependence on $\varepsilon$) than the upper bound
\[ \log N(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L^2(\nu)})\le C|\log\varepsilon|^3, \]
which we will obtain later in this section. We will obtain our upper bound estimate by an explicit construction of $\varepsilon$-brackets under the $L^p(\nu)$-distance.

For each $0<\varepsilon<1/4$, we choose $\gamma>0$ and $\Gamma=2^m\gamma$, where $m$ is a positive integer such that $\nu([\gamma,\Gamma])\ge 1-4^{-p}\varepsilon^p$. We use the notation $I(a\le t<b)$ to denote the indicator function of the interval $[a,b)$. Now, for each $f\in\mathcal{M}_\infty$, we first write, in block form,
\[ f(t)=I(0\le t<\gamma)f(t)+I(t\ge\Gamma)f(t)+\sum_{i=1}^m I(2^{i-1}\gamma\le t<2^i\gamma)f(t). \]
Then, for each block $2^{i-1}\gamma\le t<2^i\gamma$, we separate the integration limits at the level $2^{2-i}|\log\varepsilon|/\gamma$ and use the first $N$ terms of the Taylor series expansion of $e^{-u}$, with error terms associated with $\xi=\xi_{u,N}$, $0\le\xi\le1$, to rewrite
\[ f(t)=I(0\le t<\gamma)f(t)+I(t\ge\Gamma)f(t)+\sum_{i=1}^m\big(p_i(t)+q_i(t)+r_i(t)\big), \]
where
\[ p_i(t):=I(2^{i-1}\gamma\le t<2^i\gamma)\sum_{n=0}^N\frac{(-1)^n t^n}{n!}\int_0^{2^{2-i}|\log\varepsilon|/\gamma}x^n\,\mu(dx), \]
\[ q_i(t):=I(2^{i-1}\gamma\le t<2^i\gamma)\int_0^{2^{2-i}|\log\varepsilon|/\gamma}\frac{(-\xi tx)^{N+1}}{(N+1)!}\,\mu(dx), \]
\[ r_i(t):=I(2^{i-1}\gamma\le t<2^i\gamma)\int_{2^{2-i}|\log\varepsilon|/\gamma}^\infty e^{-tx}\,\mu(dx). \]
We choose the integer $N$ so that
\[ 4e^2|\log\varepsilon|-1\le N<4e^2|\log\varepsilon|. \tag{2} \]
Then, by using the inequality $k!\ge(k/e)^k$ and the fact that $0<\xi<1$, we have, within the block $2^{i-1}\gamma\le t<2^i\gamma$,
\[ |q_i(t)|\le\int_0^{2^{2-i}|\log\varepsilon|/\gamma}\frac{(tx)^{N+1}}{(N+1)!}\,\mu(dx)\le\frac{|4\log\varepsilon|^{N+1}}{(N+1)!}\le\Big(\frac{4e|\log\varepsilon|}{N+1}\Big)^{N+1}\le e^{-(N+1)}\le\varepsilon^{4e^2}, \]
where we used $tx\le 2^i\gamma\cdot2^{2-i}|\log\varepsilon|/\gamma=4|\log\varepsilon|$ in the second inequality above. This implies, due to the disjoint supports of the $q_i(t)$,
\[ \Big\|\sum_{i=1}^m q_i\Big\|_\infty\le\varepsilon^{4e^2}. \tag{3} \]
Next, we notice that for $t\ge2^{i-1}\gamma$ and $x\ge2^{2-i}\gamma^{-1}|\log\varepsilon|$ we have $tx\ge2|\log\varepsilon|$, so $e^{-tx}\le\varepsilon^2$. Thus
\[ \sum_{i=1}^m r_i(t)\le\sum_{i=1}^m I(2^{i-1}\gamma\le t<2^i\gamma)\int_{2^{2-i}\gamma^{-1}|\log\varepsilon|}^\infty\varepsilon^2\,\mu(dx)\le\varepsilon^2. \tag{4} \]
Finally, because $|f|\le1$ and $\nu([0,\gamma))+\nu([\Gamma,\infty))\le4^{-p}\varepsilon^p$, we have
\[ \big\|I(0\le t<\gamma)f(t)+I(t\ge\Gamma)f(t)\big\|_{L^p(\nu)}\le\varepsilon/4. \]
Together with (3) and (4), we see that the set
\[ \mathcal{R}:=\Big\{\sum_{i=1}^m q_i(t)+\sum_{i=1}^m r_i(t)+I(t<\gamma)f(t)+I(t\ge\Gamma)f(t):f\in\mathcal{M}_\infty\Big\} \]
has diameter in $L^p(\nu)$-distance at most $\varepsilon^{4e^2}+\varepsilon^2+\varepsilon/4<\varepsilon/2$.
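The choice (2) of $N$ can be checked directly. The following quick confirmation (ours, not from the paper), computed in log space because the factorials overflow double precision, verifies that $|4\log\varepsilon|^{N+1}/(N+1)!\le\varepsilon^{4e^2}$, as used in (3).

```python
import math

# Check (not from the paper) of the Taylor remainder bound behind (3):
# with N as in (2), |4 log eps|^{N+1} / (N+1)! <= eps^{4e^2}.
# Done in log space, since the factorials overflow double precision.
for eps in (0.2, 0.1, 0.01, 0.001):
    L = abs(math.log(eps))
    N = math.ceil(4 * math.e**2 * L) - 1            # the integer N of (2)
    log_remainder = (N + 1) * math.log(4 * L) - math.lgamma(N + 2)
    log_target = 4 * math.e**2 * math.log(eps)      # log of eps^{4e^2}
    print(f"eps={eps}: log remainder = {log_remainder:.1f} "
          f"<= {log_target:.1f} :", log_remainder <= log_target)
```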
Therefore, if we denote $\mathcal{P}_i=\{p_i(t):f\in\mathcal{M}_\infty\}$, then the expansion of $f$ above implies that $\mathcal{M}_\infty\subset\sum_{i=1}^m\mathcal{P}_i+\mathcal{R}$, and consequently we have
\[ N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L^p(\nu)})\le N_{[\,]}\Big(\varepsilon/2,\sum_{i=1}^m\mathcal{P}_i,\|\cdot\|_{L^p(\nu)}\Big). \]
For any $1\le i\le m$ and any $p_i\in\mathcal{P}_i$, we can write
\[ p_i(t)=I(2^{i-1}\gamma\le t<2^i\gamma)\sum_{n=0}^N(-1)^n a_{ni}(2^{-i}\gamma^{-1}t)^n, \tag{5} \]
where $0\le a_{ni}\le|4\log\varepsilon|^n/n!$. Now we can construct
\[ \underline p_i=I(2^{i-1}\gamma\le t<2^i\gamma)\sum_{n=0}^N(-1)^n b_{ni}(2^{-i}\gamma^{-1}t)^n, \]
\[ \overline p_i=I(2^{i-1}\gamma\le t<2^i\gamma)\sum_{n=0}^N(-1)^n c_{ni}(2^{-i}\gamma^{-1}t)^n, \]
where
\[ b_{ni}=\begin{cases}\dfrac{\varepsilon}{2^{n+2}}\Big\lfloor\dfrac{2^{n+2}a_{ni}}{\varepsilon}\Big\rfloor&\text{if $n$ is even,}\\[2ex]\dfrac{\varepsilon}{2^{n+2}}\Big\lceil\dfrac{2^{n+2}a_{ni}}{\varepsilon}\Big\rceil&\text{if $n$ is odd,}\end{cases}\qquad c_{ni}=\begin{cases}\dfrac{\varepsilon}{2^{n+2}}\Big\lceil\dfrac{2^{n+2}a_{ni}}{\varepsilon}\Big\rceil&\text{if $n$ is even,}\\[2ex]\dfrac{\varepsilon}{2^{n+2}}\Big\lfloor\dfrac{2^{n+2}a_{ni}}{\varepsilon}\Big\rfloor&\text{if $n$ is odd.}\end{cases} \]
Clearly, $\underline p_i(t)\le p_i(t)\le\overline p_i(t)$, and, since $2^{-i}\gamma^{-1}t\le1$ on the block,
\[ |\overline p_i-\underline p_i|\le I(2^{i-1}\gamma\le t<2^i\gamma)\sum_{n=0}^N|c_{ni}-b_{ni}|(2^{-i}\gamma^{-1}t)^n\le I(2^{i-1}\gamma\le t<2^i\gamma)\sum_{n=0}^N\frac{\varepsilon}{2^{n+2}}\le\frac\varepsilon2\,I(2^{i-1}\gamma\le t<2^i\gamma). \]
Hence
\[ \sum_{i=1}^m\underline p_i\le\sum_{i=1}^m p_i\le\sum_{i=1}^m\overline p_i\le\sum_{i=1}^m\underline p_i+\varepsilon/2. \]
That is, the sets
\[ \underline{\mathcal{P}}:=\Big\{\sum_{i=1}^m\underline p_i:p_i\in\mathcal{P}_i,\ 1\le i\le m\Big\}\quad\text{and}\quad\overline{\mathcal{P}}:=\Big\{\sum_{i=1}^m\overline p_i:p_i\in\mathcal{P}_i,\ 1\le i\le m\Big\} \]
form $\varepsilon/2$-brackets of $\sum_{i=1}^m\mathcal{P}_i$ in the $L^\infty$-norm, and thus in the $L^p(\nu)$-norm for all $1\le p<\infty$.

Now we count the number of different realizations of $\underline{\mathcal{P}}$ and $\overline{\mathcal{P}}$. Note that, due to the uniform bound on $a_{ni}$ in (5), there are no more than
\[ \frac{2^{n+2}}{\varepsilon}\cdot\frac{|4\log\varepsilon|^n}{n!}+1 \]
realizations of $b_{ni}$. So the number of realizations of $\underline p_i$ is bounded by
\[ \prod_{n=0}^N\Big(\frac{2^{n+2}}{\varepsilon}\cdot\frac{|4\log\varepsilon|^n}{n!}+1\Big). \]
Because $n!>(n/e)^n$, for all $1\le n\le N$ we have
\[ \frac{2^{n+2}}{\varepsilon}\cdot\frac{|4\log\varepsilon|^n}{n!}+1\le\frac5\varepsilon\Big(\frac{8e|\log\varepsilon|}n\Big)^n. \]
Thus, the number of realizations of $\underline p_i$ is bounded by
\[ \Big(\frac5\varepsilon\Big)^{N+1}\cdot\exp\Big(\sum_{n=1}^N\big(n\log|8e\log\varepsilon|-n\log n\big)\Big)\le\Big(\frac5\varepsilon\Big)^{N+1}\cdot\exp\Big(\frac{N(N+1)}2\log|8e\log\varepsilon|-\int_1^Nx\log x\,dx\Big) \]
\[ \le\Big(\frac5\varepsilon\Big)^{N+1}\cdot\exp\Big(\frac{N(N+1)}2\log|8e\log\varepsilon|-\frac{N^2}2\log N+\frac{N^2}4\Big)\le\exp\big(C|\log\varepsilon|^2\big) \]
for some absolute constant $C$, where in the last inequality we used the bounds on $N$ given in (2). Hence the total number of realizations of $\underline{\mathcal{P}}$ is bounded by $\exp(Cm|\log\varepsilon|^2)$. A similar estimate holds for the total number of realizations of $\overline{\mathcal{P}}$, and we finally obtain
\[ \log N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L^p(\nu)})\le C'm|\log\varepsilon|^2 \]
for some different constant $C'$. This finishes the proof, since $m=\log_2(\Gamma/\gamma)$.
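The rounding scheme above is easy to see in action. In the sketch below (ours, not from the paper), the block variable is rescaled to $u=2^{-i}\gamma^{-1}t\in[1/2,1)$ and $\mu$ is a random discrete measure; flooring the even-indexed coefficients and ceiling the odd-indexed ones (and vice versa) produces brackets $\underline p\le p\le\overline p$ with gap at most $\varepsilon/2$.

```python
import math
import numpy as np

# Demo (ours, not from the paper) of the bracket construction around (5):
# round each Taylor coefficient a_n on a grid of step eps/2^{n+2}, flooring
# even n and ceiling odd n for the lower bracket (the reverse for the upper),
# so the alternating signs (-1)^n give lower <= p <= upper on the block.

eps = 0.05
N = 12                                    # truncation level (small, for demo)
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 4 * abs(math.log(eps)), 10)   # rescaled support of mu
w = rng.dirichlet(np.ones(10))
a = np.array([np.sum(w * x**n) / math.factorial(n) for n in range(N + 1)])

def bracket(coeffs, lower):
    out = np.empty_like(coeffs)
    for n, an in enumerate(coeffs):
        step = eps / 2 ** (n + 2)
        floor_it = (n % 2 == 0) if lower else (n % 2 == 1)
        out[n] = step * (math.floor(an / step) if floor_it else math.ceil(an / step))
    return out

b, c = bracket(a, True), bracket(a, False)
u = np.linspace(0.5, 1.0, 200)            # u = 2^{-i} t / gamma on the block
signs = (-1.0) ** np.arange(N + 1)
val = lambda coef: np.polynomial.polynomial.polyval(u, signs * coef)
p, lo, hi = val(a), val(b), val(c)
assert np.all(lo <= p + 1e-12) and np.all(p <= hi + 1e-12)
assert np.max(hi - lo) <= eps / 2
print("lower <= p <= upper, max gap =", np.max(hi - lo))
```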
3. Entropy of convex hulls

A lower bound estimate of metric entropy is typically difficult, because it often involves the construction of a well-separated set of maximal cardinality. Thus, in this section we introduce some soft analytic arguments to avoid this difficulty and change the problem into a familiar one. The hard estimates are given in the next section.

First note that $\mathcal{M}_\infty$ is just the convex hull of the functions $k_s(\cdot)$, $0<s<\infty$, where $k_s(t)=e^{-ts}$. We recall a general method to bound the entropy of convex hulls that was introduced in Gao (2004). Let $T$ be a set in $\mathbb{R}^d$ or in a Hilbert space. The convex hull of $T$ can be expressed as
\[ \operatorname{conv}(T)=\Big\{\sum_{n=1}^\infty a_nt_n:t_n\in T,\ a_n\ge0,\ \sum_{n=1}^\infty a_n=1\Big\}, \]
while the absolute convex hull of $T$ is defined by
\[ \operatorname{abconv}(T)=\Big\{\sum_{n=1}^\infty a_nt_n:t_n\in T,\ \sum_{n=1}^\infty|a_n|\le1\Big\}. \]
Clearly, by using probability measures and signed measures, we can express
\[ \operatorname{conv}(T)=\Big\{\int_T t\,\mu(dt):\mu\text{ is a probability measure on }T\Big\}, \]
\[ \operatorname{abconv}(T)=\Big\{\int_T t\,\mu(dt):\mu\text{ is a signed measure on }T,\ \|\mu\|_{TV}\le1\Big\}. \]
The following is clear:
\[ \operatorname{conv}(T)\subset\operatorname{abconv}(T)\subset\operatorname{conv}(T)-\operatorname{conv}(T). \]
Therefore, for any norm $\|\cdot\|$,
\[ N(\varepsilon,\operatorname{conv}(T),\|\cdot\|)\le N(\varepsilon,\operatorname{abconv}(T),\|\cdot\|)\le[N(\varepsilon/2,\operatorname{conv}(T),\|\cdot\|)]^2. \]
In particular, at the logarithmic level the two entropy numbers are comparable, modulo constant factors on $\varepsilon$.

The benefit of using the absolute convex hull is that it is symmetric and can be viewed as the unit ball of a Banach space, which allows us to use the following duality lemma of metric entropy: there exist constants $c_1,c_2,K_1,K_2>0$ such that for all $\varepsilon>0$,
\[ K_1\log N(c_1\varepsilon,\operatorname{abconv}(T),\|\cdot\|_2)\le\log N(\varepsilon,B,\|\cdot\|_T)\le K_2\log N(c_2\varepsilon,\operatorname{abconv}(T),\|\cdot\|_2), \]
where $B$ is the unit ball of the dual norm of $\|\cdot\|_2$, and $\|\cdot\|_T$ is the norm induced by $T$, that is,
\[ \|x\|_T:=\sup_{t\in T}|\langle t,x\rangle|=\sup_{t\in\operatorname{abconv}(T)}|\langle t,x\rangle|. \]
Strictly speaking, the duality lemma remains a conjecture in the general case. However, when the norm $\|\cdot\|_2$ is a Hilbert space norm, it has been proved; see Tomczak-Jaegermann (1987), Bourgain et al. (1989), and Artstein et al. (2004).

A striking relation discovered by Kuelbs and Li (1993) says that the entropy number $\log N(\varepsilon,B,\|\cdot\|_T)$ is determined by the Gaussian measure of the set
\[ D_\varepsilon:=\{x\in H:\|x\|_T\le\varepsilon\} \]
under some very weak regularity assumptions. For details, see Kuelbs and Li (1993), Li and Linde (1999), and also Corollary 2.2 of Aurzada et al. (2009). Using this relation, we can now summarize the connection between the metric entropy of convex hulls and the Gaussian measure of $D_\varepsilon$ as follows:

Proposition 3.1. Let $T$ be a precompact set in a Hilbert space. For $\alpha>0$ and $\beta\in\mathbb{R}$, there exists a constant $C_1>0$ such that for all $0<\varepsilon<1$,
\[ \log P(D_\varepsilon)\le-C_1\varepsilon^{-\alpha}|\log\varepsilon|^\beta \]
if and only if there exists a constant $C_2>0$ such that for all $0<\varepsilon<1$,
\[ \log N(\varepsilon,\operatorname{conv}(T),\|\cdot\|_2)\ge C_2\,\varepsilon^{-\frac{2\alpha}{2+\alpha}}|\log\varepsilon|^{\frac{2\beta}{2+\alpha}}; \]
and for $\beta>0$ and $\gamma\in\mathbb{R}$, there exists a constant $C_3>0$ such that for all $0<\varepsilon<1$,
\[ \log P(D_\varepsilon)\le-C_3|\log\varepsilon|^\beta(\log|\log\varepsilon|)^\gamma \]
if and only if there exists a constant $C_4>0$ such that for all $0<\varepsilon<1$,
\[ \log N(\varepsilon,\operatorname{conv}(T),\|\cdot\|_2)\ge C_4|\log\varepsilon|^\beta(\log|\log\varepsilon|)^\gamma. \]
Furthermore, the results also hold if the directions of the inequalities are switched.

The result of this proposition can be implicitly seen in Gao (2004), where an explanation of the relation between $N(\varepsilon,B,\|\cdot\|_T)$ and the Gaussian measure of $D_\varepsilon$ is also given. Perhaps the most useful case of Proposition 3.1 is when $T$ is a set of functions $K(t,\cdot)$, $t\in T$, where for each fixed $t\in T$, $K(t,\cdot)$ is a function in $L^2(\Omega)$, and where $\Omega$ is a bounded set in $\mathbb{R}^d$, $d\ge1$. For this special case, we have

Corollary 3.2. Let $X(t)=\int_\Omega K(t,x)\,dB(x)$, $t\in T$, where the $K(t,\cdot)$ are square-integrable functions on a bounded set $\Omega$ in $\mathbb{R}^d$, $d\ge1$, and $B(x)$ is the $d$-dimensional Brownian sheet on $\Omega$. If $\mathcal{F}$ is the convex hull of the functions $K(t,\cdot)$, $t\in T$, then
\[ \log P\Big(\sup_{t\in T}|X(t)|<\varepsilon\Big)\asymp-\varepsilon^{-\alpha}|\log\varepsilon|^\beta \]
for $\alpha>0$ and $\beta\in\mathbb{R}$ if and only if
\[ \log N(\varepsilon,\mathcal{F},\|\cdot\|_2)\asymp\varepsilon^{-\frac{2\alpha}{2+\alpha}}|\log\varepsilon|^{\frac{2\beta}{2+\alpha}}; \]
and for $\beta>0$ and $\gamma\in\mathbb{R}$,
\[ \log P\Big(\sup_{t\in T}|X(t)|<\varepsilon\Big)\asymp-|\log\varepsilon|^\beta(\log|\log\varepsilon|)^\gamma \]
if and only if
\[ \log N(\varepsilon,\mathcal{F},\|\cdot\|_2)\asymp|\log\varepsilon|^\beta(\log|\log\varepsilon|)^\gamma. \]

The authors have found this corollary especially useful. For example, it was used in Blei et al. (2007) and Gao (2008) to change a problem of metric entropy into a problem of small deviation probability of a Gaussian process, which is relatively easier. The proof is given in Gao (2008) for the case $\Omega=[0,1]$, and in Blei et al. (2007) for the case $[0,1]^d$. For the general case, it can be proved just as easily.
Indeed, the only thing we need to prove is that $P(D_\varepsilon)$ can be expressed as the probability of the event $\sup_{t\in T}|X(t)|<\varepsilon$. We outline a proof below. Let $\{\phi_n\}$ be an orthonormal basis of $L^2(\Omega)$. Then
\[ X(t)=\int_\Omega K(t,s)\,dB(s)=\sum_{n=1}^\infty\xi_n\int_\Omega K(t,s)\phi_n(s)\,ds, \]
where the $\xi_n$ are i.i.d. standard normal random variables. Thus,
\begin{align*} P(D_\varepsilon)&=P\Big(\Big\{g\in L^2(\Omega):\Big|\int_\Omega f(s)g(s)\,ds\Big|<\varepsilon,\ f\in\mathcal{F}\Big\}\Big)\\ &=P\Big(\Big\{g\in L^2(\Omega):\Big|\int_T\int_\Omega K(t,s)g(s)\,ds\,\mu(dt)\Big|<\varepsilon,\ \|\mu\|_{TV}\le1\Big\}\Big)\\ &=P\Big(\Big\{\sum_{n=1}^\infty a_n\phi_n(s):\sum_{n=1}^\infty a_n^2<\infty,\ \Big|\int_T\sum_{n=1}^\infty a_n\int_\Omega K(t,s)\phi_n(s)\,ds\,\mu(dt)\Big|<\varepsilon,\ \|\mu\|_{TV}\le1\Big\}\Big)\\ &=P\Big(\Big\{\sum_{n=1}^\infty a_n\phi_n(s):\sum_{n=1}^\infty a_n^2<\infty,\ \sup_{t\in T}\Big|\sum_{n=1}^\infty a_n\int_\Omega K(t,s)\phi_n(s)\,ds\Big|<\varepsilon\Big\}\Big)\\ &=P\Big(\sup_{t\in T}|X(t)|<\varepsilon\Big). \end{align*}

Now, returning to our problem of estimating $\log N(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_2)$ in statement (ii) of Theorem 1.1, where $\|\cdot\|_2$ is the $L^2$-norm with respect to Lebesgue measure on $[0,1]$, we notice that $\mathcal{M}_\infty$ is the convex hull of the functions $K(t,\cdot)$, $t\in[0,\infty)$, on $[0,1]$, with $K(t,s)=e^{-ts}$. Clearly, for each fixed $t$, $K(t,\cdot)$ is a square-integrable function on the bounded set $[0,1]$. Now, for this $K$, the corresponding $X(t)$ is a Gaussian process on $[0,\infty)$ with covariance
\[ \mathbb{E}\,X(t)X(s)=\frac{1-e^{-t-s}}{t+s},\qquad s,t\ge0. \tag{6} \]
Thus, the problem becomes how to sharply bound the probability $P\big(\sup_{t>0}|X(t)|<\varepsilon\big)$. This will be done in the next section.
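The covariance (6) is just $\int_0^1 e^{-(t+s)u}\,du$, and by Parseval the coefficient series in the expansion above must reproduce it for any orthonormal basis of $L^2[0,1]$. The sketch below (ours, not from the paper; the cosine basis, quadrature grid and truncation level are arbitrary choices) confirms this numerically.

```python
import numpy as np

# Check (ours, not from the paper): with the orthonormal basis phi_1 = 1,
# phi_{n+1}(u) = sqrt(2) cos(n pi u) of L^2[0,1], Parseval gives
#   sum_n <K(t,.), phi_n> <K(s,.), phi_n> = int_0^1 e^{-(t+s)u} du
#                                         = (1 - e^{-t-s}) / (t + s),
# which is exactly the covariance (6) of X(t) = int_0^1 e^{-ts} dB(s).

u = np.linspace(0.0, 1.0, 20001)          # quadrature grid on [0, 1]
du = u[1] - u[0]

def coeff(t, n):
    """Trapezoid approximation of <e^{-t u}, phi_{n+1}> on [0, 1]."""
    phi = np.ones_like(u) if n == 0 else np.sqrt(2.0) * np.cos(n * np.pi * u)
    f = np.exp(-t * u) * phi
    return (f[0] / 2 + f[1:-1].sum() + f[-1] / 2) * du

for t, s in [(0.5, 1.5), (1.0, 1.0), (3.0, 0.2)]:
    series = sum(coeff(t, n) * coeff(s, n) for n in range(200))
    exact = (1.0 - np.exp(-(t + s))) / (t + s)
    print(f"t={t}, s={s}: series = {series:.6f}, covariance (6) = {exact:.6f}")
```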
4. Lower bound estimate

Let $X(t)$, $t\ge0$, be the centered Gaussian process defined in (6). Our goal in this section is to prove that
\[ \log P\Big(\sup_{t\ge0}|X(t)|<\varepsilon\Big)\le-C|\log\varepsilon|^3 \]
for some constant $C>0$. Note that for any sequence of positive numbers $\{\delta_i\}_{i=1}^n$,
\[ P\Big(\sup_{t\ge0}|X(t)|<\varepsilon\Big)\le P\Big(\max_{1\le i\le n}|X(\delta_i)|<\varepsilon\Big)=(2\pi)^{-n/2}(\det\Sigma)^{-1/2}\int_{\max_{1\le i\le n}|y_i|\le\varepsilon}e^{-\frac12\langle y,\Sigma^{-1}y\rangle}\,dy_1\cdots dy_n \]
\[ \le(2\pi)^{-n/2}(\det\Sigma)^{-1/2}(2\varepsilon)^n\le\varepsilon^n(\det\Sigma)^{-1/2}, \tag{7} \]
where we use the covariance matrix
\[ \Sigma=\big(\mathbb{E}\,X(\delta_i)X(\delta_j)\big)_{1\le i,j\le n}=\Big(\frac{1-e^{-\delta_i-\delta_j}}{\delta_i+\delta_j}\Big)_{1\le i,j\le n}. \]
To find a lower bound for $\det(\Sigma)$, we need the following lemma.

Lemma 4.1. If $0<b_{ij}<a_{ij}$ for all $1\le i,j\le n$, then
\[ \det(a_{ij}-b_{ij})\ge\det(a_{ij})-\sum_{k=1}^n\max_{1\le l\le n}\frac{b_{kl}}{a_{kl}}\cdot\operatorname{per}(a_{ij}), \]
where $\operatorname{per}(a_{ij})$ is the permanent of the matrix $(a_{ij})$.

Proof. For notational simplicity, we denote $c_{ij}=a_{ij}-b_{ij}$. Then
\begin{align*} &\det(a_{ij}-b_{ij})-\det(a_{ij})\\ &\quad=\sum_\sigma(-1)^\sigma c_{1,\sigma(1)}c_{2,\sigma(2)}\cdots c_{n,\sigma(n)}-\sum_\sigma(-1)^\sigma a_{1,\sigma(1)}a_{2,\sigma(2)}\cdots a_{n,\sigma(n)}\\ &\quad=\sum_\sigma(-1)^\sigma\sum_{k=1}^n[c_{1,\sigma(1)}\cdots c_{k-1,\sigma(k-1)}](c_{k,\sigma(k)}-a_{k,\sigma(k)})[a_{k+1,\sigma(k+1)}\cdots a_{n,\sigma(n)}]\\ &\quad\ge-\sum_{k=1}^n\sum_\sigma[a_{1,\sigma(1)}\cdots a_{k-1,\sigma(k-1)}]\,b_{k,\sigma(k)}\,[a_{k+1,\sigma(k+1)}\cdots a_{n,\sigma(n)}]\\ &\quad\ge-\sum_{k=1}^n\max_{1\le l\le n}\frac{b_{kl}}{a_{kl}}\sum_\sigma[a_{1,\sigma(1)}\cdots a_{k-1,\sigma(k-1)}]\,a_{k,\sigma(k)}\,[a_{k+1,\sigma(k+1)}\cdots a_{n,\sigma(n)}]\\ &\quad=-\sum_{k=1}^n\max_{1\le l\le n}\frac{b_{kl}}{a_{kl}}\cdot\operatorname{per}(a_{ij}).\qquad\square \end{align*}

In order to use Lemma 4.1 to estimate $\det(\Sigma)$, we set
\[ a_{ij}=\frac1{\delta_i+\delta_j}\qquad\text{and}\qquad b_{ij}=e^{-\delta_i-\delta_j}a_{ij} \]
for the specific sequence $\{\delta_i\}_{i=1}^n$ defined by
\[ \delta_{mp+q}=4^{p+m}(m+q),\qquad 0\le p<m,\ 1\le q\le m, \]
with $n=m^2$. Clearly, we have
\[ 0<b_{kl}/a_{kl}\le e^{-2m4^m},\qquad 1\le k,l\le n=m^2. \tag{8} \]
It remains to estimate $\det(a_{ij})$ and $\operatorname{per}(a_{ij})$, which is done in the following lemma.

Lemma 4.2. For the matrix $(a_{ij})$ defined above, we have $\operatorname{per}(a_{ij})\le1$ and $\det(a_{ij})\ge(240e)^{-2m^3}$.

Proof. It is easy to see that
\[ \operatorname{per}(a_{ij})\le n!\,\big(\max_{i,j}a_{ij}\big)^n\le\frac{(m^2)!}{(2m4^m)^{m^2}}\le1, \]
since $a_{ij}\le(2m4^m)^{-1}$ for $1\le i,j\le n=m^2$.

To estimate $\det(a_{ij})$, we use Cauchy's determinant identity; see Krattenthaler (1999):
\[ \det(a_{ij})=\det\Big(\frac1{\delta_i+\delta_j}\Big)_{1\le i,j\le n}=\frac{\prod_{1\le i<j\le n}(\delta_j-\delta_i)^2}{\prod_{1\le i,j\le n}(\delta_i+\delta_j)}=\prod_{i=1}^n\frac1{2\delta_i}\cdot\prod_{1\le i<j\le n}\Big(\frac{\delta_j-\delta_i}{\delta_j+\delta_i}\Big)^2. \]
To estimate the last product, we partition the set $\{(i,j):1\le i<j\le n=m^2\}$ into three sets and estimate each part separately. For $1\le i<j\le n=m^2$, write $i=mp+q$ and $j=mr+s$ with $1\le q,s\le m$. Denote
\begin{align*} A&=\{(i,j):i=mp+q,\ j=mp+s,\ 0\le p\le m-1,\ 1\le q<s\le m\},\\ B&=\{(i,j):i=mp+q,\ j=m(p+1)+s,\ 0\le p\le m-2,\ 1\le q,s\le m\},\\ C&=\{(i,j):i=mp+q,\ j=mr+s,\ 0\le p\le m-3,\ p+2\le r\le m-1,\ 1\le q,s\le m\}. \end{align*}
Thus $A$, $B$ and $C$ form a partition of $\{(i,j):1\le i<j\le n=m^2\}$.

First, for $(i,j)\in A$,
\[ \frac{\delta_j-\delta_i}{\delta_j+\delta_i}=\frac{s-q}{2m+s+q}>\frac{s-q}{4m}. \]
Thus
\[ \prod_{(i,j)\in A}\Big(\frac{\delta_j-\delta_i}{\delta_j+\delta_i}\Big)^2\ge\prod_{p=0}^{m-1}\prod_{1\le q<s\le m}\Big(\frac{s-q}{4m}\Big)^2=\Big(\prod_{k=1}^{m-1}\Big(\frac k{4m}\Big)^{m-k}\Big)^{2m}\ge\Big(\prod_{k=1}^{m-1}\frac k{4m}\Big)^{2m^2}=\Big(\frac{(m-1)!}{(4m)^{m-1}}\Big)^{2m^2}\ge(8e)^{-2m^3}. \]
Second, for $(i,j)\in B$,
\[ \frac{\delta_j-\delta_i}{\delta_j+\delta_i}=\frac{(4m+4s)-(m+q)}{(4m+4s)+(m+q)}\ge\frac15. \]
Thus we have
\[ \prod_{(i,j)\in B}\Big(\frac{\delta_j-\delta_i}{\delta_j+\delta_i}\Big)^2\ge\prod_{p=0}^{m-2}\prod_{1\le q,s\le m}5^{-2}\ge5^{-2m^3}. \]
Third, for $(i,j)\in C$, we have $r-p\ge2$, and
\[ \frac{\delta_j-\delta_i}{\delta_j+\delta_i}=\frac{4^r(m+s)-4^p(m+q)}{4^r(m+s)+4^p(m+q)}=1-\frac{2\cdot4^p(m+q)}{4^r(m+s)+4^p(m+q)}>1-\frac1{4^{r-p-1}}. \]
Thus, since $\prod_k(1-x_k)\ge1-\sum_kx_k$ for $0<x_k<1$,
\[ \prod_{(i,j)\in C}\Big(\frac{\delta_j-\delta_i}{\delta_j+\delta_i}\Big)^2\ge\prod_{p=0}^{m-3}\prod_{r=p+2}^{m-1}\Big(1-\frac1{4^{r-p-1}}\Big)^{2m^2}\ge\Big(\prod_{k=1}^{m-2}\big(1-4^{-k}\big)\Big)^{2m^3}\ge\Big(1-\sum_{k=1}^{m-2}4^{-k}\Big)^{2m^3}\ge(2/3)^{2m^3}. \]
Therefore, we have
\[ \prod_{1\le i<j\le n}\Big(\frac{\delta_j-\delta_i}{\delta_j+\delta_i}\Big)^2=\prod_{(i,j)\in A}\cdot\prod_{(i,j)\in B}\cdot\prod_{(i,j)\in C}\ge(60e)^{-2m^3}. \]
On the other hand, it is not difficult to see that
\[ \prod_{i=1}^n2\delta_i=2^{m^2}\prod_{q=1}^m\prod_{p=0}^{m-1}4^{p+m}(m+q)<2^{m^2}\cdot4^{m^2(m-1)/2+m^3}(2m)^{m^2}=4^{3m^3/2+m^2/2+m^2\log_4m}<4^{2m^3} \]
for $m>1$. Hence,
\[ \det(a_{ij})=\prod_{i=1}^n\frac1{2\delta_i}\cdot\prod_{1\le i<j\le n}\Big(\frac{\delta_j-\delta_i}{\delta_j+\delta_i}\Big)^2\ge4^{-2m^3}(60e)^{-2m^3}=(240e)^{-2m^3}.\qquad\square \]

Now, combining the two lemmas above and using the estimate in (8), we obtain
\[ \det(\Sigma)\ge(240e)^{-2m^3}-m^2e^{-2m4^m}\ge e^{-16m^3}, \]
provided that $m$ is large enough. Plugging this into (7), we have
\[ P\Big(\sup_{t\ge0}|X(t)|<\varepsilon\Big)\le e^{8m^3}\varepsilon^{m^2}. \]
Minimizing the right-hand side by choosing $m\approx|\log\varepsilon|/12$, we obtain
\[ P\Big(\sup_{t\ge0}|X(t)|<\varepsilon\Big)\lesssim\exp\big(-(432)^{-1}|\log\varepsilon|^3\big). \]
Statement (ii) of Theorem 1.1 follows by applying Corollary 3.2. At the same time, we have also finished the proof of Theorem 1.2.
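Both lemmas are easy to test on small matrices. The sketch below (ours, not from the paper) brute-forces the permanent, checks the inequality of Lemma 4.1 on random matrices with $0<b_{ij}<a_{ij}$, and evaluates the Lemma 4.2 quantities for $m=2$, i.e. $n=4$.

```python
import itertools
import numpy as np

# Numerical sanity checks (ours, not from the paper) of Lemmas 4.1 and 4.2,
# with a brute-force permanent for small n.

def per(M):
    n = len(M)
    return sum(np.prod([M[i, s[i]] for i in range(n)])
               for s in itertools.permutations(range(n)))

rng = np.random.default_rng(3)

# Lemma 4.1: det(a - b) >= det(a) - sum_k max_l (b_kl / a_kl) * per(a).
for _ in range(5):
    a = rng.uniform(1.0, 2.0, (4, 4))
    b = a * rng.uniform(0.0, 1.0, (4, 4))          # ensures 0 <= b_ij < a_ij
    lhs = np.linalg.det(a - b)
    rhs = np.linalg.det(a) - (b / a).max(axis=1).sum() * per(a)
    assert lhs >= rhs

# Lemma 4.2 for m = 2 (n = m^2 = 4), with delta_{mp+q} = 4^{p+m}(m+q).
m = 2
delta = np.array([4.0 ** (p + m) * (m + q)
                  for p in range(m) for q in range(1, m + 1)])
a = 1.0 / (delta[:, None] + delta[None, :])
print("per(a) =", per(a), "<= 1:", per(a) <= 1)
print("det(a) =", np.linalg.det(a),
      ">= (240e)^(-2 m^3) =", (240 * np.e) ** (-2.0 * m**3))
```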
Acknowledgment

We owe thanks to the referee for a number of helpful suggestions.

References

1. Alzer, H. and Berg, C. (2002). Some classes of completely monotonic functions. Ann. Acad. Sci. Fenn. Math. 27, 445–460. MR1922200
2. Artstein, S., Milman, V., Szarek, S. and Tomczak-Jaegermann, N. (2004). On convexified packing and entropy duality. Geom. Funct. Anal. 14, 1134–1141. MR2105957
3. Aurzada, F., Ibragimov, I., Lifshits, M. and van Zanten, J. H. (2009). Small deviations of smooth stationary Gaussian processes. Theory Probab. Appl. 53, 697–707.
4. Ball, K. and Pajor, A. (1990). The entropy of convex bodies with "few" extreme points. In Geometry of Banach Spaces (Strobl, 1989), vol. 158 of London Math. Soc. Lecture Note Ser., Cambridge Univ. Press, Cambridge, 25–32. MR1110183
5. Blei, R., Gao, F. and Li, W. V. (2007). Metric entropy of high dimensional distributions. Proc. Amer. Math. Soc. 135, 4009–4018. MR2341952
6. Bourgain, J., Pajor, A., Szarek, S. J. and Tomczak-Jaegermann, N. (1989). On the duality problem for entropy numbers of operators. In Geometric Aspects of Functional Analysis (1987–88), vol. 1376 of Lecture Notes in Math., Springer, Berlin, 50–63. MR1008716
7. Carl, B. (1997). Metric entropy of convex hulls in Hilbert spaces. Bull. London Math. Soc. 29, 452–458. MR1446564
8. Carl, B., Kyrezi, I. and Pajor, A. (1999). Metric entropy of convex hulls in Banach spaces. J. London Math. Soc. (2) 60, 871–896. MR1753820
9. Dudley, R. M. (1987). Universal Donsker classes and metric entropy. Ann. Probab. 15, 1306–1326. MR905333
10. Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. II, 2nd ed. John Wiley & Sons, New York. MR0270403
11. Gao, F. (2004). Entropy of absolute convex hulls in Hilbert spaces. Bull. London Math. Soc. 36, 460–468. MR2069008
12. Gao, F. (2008). Entropy estimate for k-monotone functions via small ball probability of integrated Brownian motion. Electron. Commun. Probab. 13, 121–130. MR2386068
13. Gao, F. and Wellner, J. A. (2009). On the rate of convergence of the maximum likelihood estimator of a k-monotone density. Science in China, Series A: Mathematics 52, 1525–1538. MR2520591
14. Jewell, N. P. (1982). Mixtures of exponential distributions. Ann. Statist. 10, 479–484. MR653523
15. Krattenthaler, C. (1999). Advanced determinant calculus. Sém. Lothar. Combin. 42, Art. B42q, 67 pp. (electronic). The Andrews Festschrift (Maratea, 1998). MR1701596
16. Kuelbs, J. and Li, W. V. (1993). Metric entropy and the small ball problem for Gaussian measures. J. Funct. Anal. 116, 133–157. MR1237989
17. Li, W. V. and Linde, W. (1999). Approximation, metric entropy and small ball estimates for Gaussian measures. Ann. Probab. 27, 1556–1578. MR1733160
18. Li, W. V. and Linde, W. (2000). Metric entropy of convex hulls in Hilbert spaces. Studia Math. 139, 29–45. MR1763043
19. Tomczak-Jaegermann, N. (1987). Dualité des nombres d'entropie pour des opérateurs à valeurs dans un espace de Hilbert. C. R. Acad. Sci. Paris Sér. I Math. 305, 299–301. MR910364
20. van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. With Applications to Statistics. Springer Series in Statistics, Springer-Verlag, New York. MR1385671
21. Widder, D. V. (1941). The Laplace Transform. Princeton Mathematical Series, v. 6, Princeton University Press, Princeton, N.J. MR0005923
22. Williamson, R. E. (1956). Multiply monotone functions and their Laplace transforms. Duke Math. J. 23, 189–207. MR0077581

Department of Mathematics, University of Idaho, Moscow, Idaho 83844
E-mail address: fuchang@uidaho.edu

Department of Mathematical Sciences, University of Delaware, Newark, Delaware
E-mail address: wli@math.udel.edu

Department of Statistics, University of Washington, Seattle, Washington 98195
E-mail address: jaw@stat.washington.edu

