# Global rates of convergence of the MLE for multivariate interval censoring

Gao and Wellner. Electron J Stat. Author manuscript (NIH-PA Author Manuscript); available in PMC 2013 August 27.

Supported in part by a grant from the Simons Foundation (#246211). Supported in part by NSF Grant DMS-1104832 and NI-AID grant 2R01 AI291968-04.

### Abstract

We establish global rates of convergence of the Maximum Likelihood Estimator (MLE) of a multivariate distribution function on $\mathbb{R}^d$ in the case of (one type of) “interval censored” data. The main finding is that the rate of convergence of the MLE in the Hellinger metric is no worse than $n^{-1/3}(\log n)^{\gamma}$ for $\gamma = (5d-4)/6$.

Keywords and phrases: Empirical processes; global rate; Hellinger metric; interval censoring; multivariate; multivariate monotone functions

## 1. Introduction and overview

Our main goal in this paper is to study global rates of convergence of the Maximum Likelihood Estimator (MLE) in one simple model for multivariate interval-censored data. In Section 3 we will show that under some reasonable conditions the MLE converges in a Hellinger metric to the true distribution function on $\mathbb{R}^d$ at a rate no worse than $n^{-1/3}(\log n)^{\gamma_d}$ for $\gamma_d = (5d-4)/6$ for all $d \ge 2$. Thus the rate of convergence is only worse than the known rate of $n^{-1/3}$ for the case $d = 1$ by a factor involving a power of $\log n$ growing linearly with the dimension. These new rate results rely heavily on recent bracketing entropy bounds for $d$-dimensional distribution functions obtained by Gao (2012).

We begin in Section 2 with a review of interval censoring problems and known results in the case $d = 1$. We introduce the multivariate interval censoring model of interest here in Section 3, and obtain a rate of convergence for this model for $d \ge 2$ in Theorem 3.1. Most of the proofs are given in Section 4, with the exception being a key corollary of Gao (2012), the statement and proof of which are given in the Appendix (Section 6). Finally, in Section 5 we introduce several related models and further problems.

## 2. Interval censoring (or current status data) on $\mathbb{R}$

Let $Y \sim F_0$ on $\mathbb{R}^+$, and let $T \sim G_0$ on $\mathbb{R}^+$ be independent of $Y$. Suppose that we observe $X_1, \ldots, X_n$ i.i.d. as $X = (\Delta, T)$ where $\Delta = 1_{[Y \le T]}$. Here $Y$ is often the time until some event of interest and $T$ is an observation time. The goal is to estimate $F_0$ nonparametrically based on observation of the $X_i$’s.

To calculate the likelihood, we first calculate the distribution of $X$ for a general distribution function $F$: note that the conditional distribution of $\Delta$ conditional on $T$ is Bernoulli:

$$(\Delta \mid T) \sim \mathrm{Bernoulli}(p(T)),$$

where $p(T) = F(T)$. If $G_0$ has density $g_0$ with respect to some measure $\mu$ on $\mathbb{R}^+$, then $X = (\Delta, T)$ has density

$$p_F(\delta, t) = F(t)^{\delta}(1 - F(t))^{1-\delta} g_0(t), \qquad \delta \in \{0, 1\},\ t \in \mathbb{R}^+,$$

with respect to the dominating measure (counting measure on $\{0, 1\}$) $\times\, \mu$.

The nonparametric Maximum Likelihood Estimator (MLE) $\hat F_n$ of $F_0$ in this interval censoring model was first obtained by Ayer et al. (1955). It is simply described as follows: let $T_{(1)} \le \cdots \le T_{(n)}$ denote the order statistics corresponding to $T_1, \ldots, T_n$ and let $\Delta_{(1)}, \ldots, \Delta_{(n)}$ denote the corresponding $\Delta$’s. Then the part of the log-likelihood of $X_1, \ldots, X_n$ depending on $F$ is given by

$$l_n(F) = \sum_{i=1}^n \left\{ \Delta_{(i)} \log F(T_{(i)}) + (1 - \Delta_{(i)}) \log(1 - F(T_{(i)})) \right\} \equiv \sum_{i=1}^n \left\{ \Delta_{(i)} \log F_i + (1 - \Delta_{(i)}) \log(1 - F_i) \right\} \tag{2.1}$$

where

$$0 \le F_1 \le \cdots \le F_n \le 1. \tag{2.2}$$

It turns out that the maximizer $\hat F_n$ of (2.1) subject to (2.2) can be described as follows: let $H^*$ be the (greatest) convex minorant of the points $\{(i, \sum_{j \le i} \Delta_{(j)}) : i \in \{1, \ldots, n\}\}$:

$$H^*(t) = \sup\Big\{ H(t) : H(i) \le \sum_{j \le i} \Delta_{(j)} \text{ for each } i \in \{1, \ldots, n\},\ H(0) = 0,\ H \text{ convex} \Big\}.$$

Let $\hat F_i$ denote the left-derivative of $H^*$ at $i$, the abscissa corresponding to $T_{(i)}$. Then $(\hat F_1, \ldots, \hat F_n)$ is the unique vector maximizing (2.1) subject to (2.2), and we therefore take the MLE $\hat F_n$ of $F$ to be

$$\hat F_n(t) = \sum_{i=0}^n \hat F_i\, 1_{[T_{(i)}, T_{(i+1)})}(t)$$

with the conventions $T_{(0)} \equiv 0$, $T_{(n+1)} \equiv \infty$ (and $\hat F_0 \equiv 0$). See Ayer et al. (1955) or Groeneboom and Wellner (1992), pages 38–43, for details.

Groeneboom (1987) initiated the study of $\hat F_n$ and proved the following limiting distribution result at a fixed point $t_0$.

Theorem 2.1 (Groeneboom, 1987). Consider the current status model on $\mathbb{R}^+$. Suppose that $0 < F_0(t_0), G_0(t_0) < 1$ and suppose that $F_0$ and $G_0$ are differentiable at $t_0$ with strictly positive derivatives $f_0(t_0)$ and $g_0(t_0)$ respectively. Then

$$n^{1/3}\big(\hat F_n(t_0) - F_0(t_0)\big) \to_d \left\{ \frac{F_0(t_0)(1 - F_0(t_0)) f_0(t_0)}{2 g_0(t_0)} \right\}^{1/3} 2\mathbb{Z}$$

where

$$\mathbb{Z} = \operatorname{argmin}_{t}\{ W(t) + t^2 \}$$

and where $W$ is a standard two-sided Brownian motion starting from 0.

The distribution of $\mathbb{Z}$ has been studied in detail by Groeneboom (1989) and computed by Groeneboom and Wellner (2001). Balabdaoui and Wellner (2012) show that the density $f_{\mathbb{Z}}$ of $\mathbb{Z}$ is log-concave.

van de Geer (1993) (see also van de Geer (2000)) obtained the following global rate result for $p_{\hat F_n}$. Recall that the Hellinger distance $h(p, q)$ between two densities with respect to a dominating measure $\mu$ is given by

$$h^2(p, q) = \frac{1}{2} \int \big( \sqrt{p} - \sqrt{q} \big)^2 \, d\mu.$$

Proposition 2.2 (van de Geer, 1993). $h(p_{\hat F_n}, p_{F_0}) = O_p(n^{-1/3})$.

Now for any distribution functions $F$ and $F_0$ the (squared) Hellinger distance $h^2(p_F, p_{F_0})$ for the current status model is given by

$$h^2(p_F, p_{F_0}) = \frac{1}{2} \left\{ \int \big(\sqrt{F} - \sqrt{F_0}\big)^2 dG_0 + \int \big(\sqrt{1 - F} - \sqrt{1 - F_0}\big)^2 dG_0 \right\} \ge \frac{1}{4} \int (F - F_0)^2 \, dG_0, \tag{2.3}$$

and hence Proposition 2.2 yields

$$\int (\hat F_n - F_0)^2 \, dG_0 = O_p(n^{-2/3}), \tag{2.4}$$

or $\|\hat F_n - F_0\|_{L_2(G_0)} = O_p(n^{-1/3})$. (The inequality in (2.3) uses $(\sqrt{a} - \sqrt{b})^2 = (a - b)^2/(\sqrt{a} + \sqrt{b})^2 \ge (a - b)^2/4$ for $a, b \in [0, 1]$.)

For generalizations of these and other asymptotic results for the current status model to more complicated interval censoring schemes for real-valued random variables $Y$, see e.g. Groeneboom and Wellner (1992), van de Geer (1993), Groeneboom (1996), van de Geer (2000), Schick and Yu (2000), and Groeneboom, Maathuis and Wellner (2008a,b). Our main focus in this paper, however, concerns one simple generalization of the interval censoring model for $\mathbb{R}$ introduced above to interval censoring in $\mathbb{R}^d$. We now turn to this generalization.

## 3. Multivariate interval censoring: multivariate current status data

Let $\underline{Y} = (Y_1, \ldots, Y_d) \sim F_0$ on $\mathbb{R}^{+d} \equiv [0, \infty)^d$, and let $\underline{T} = (T_1, \ldots, T_d) \sim G_0$ on $\mathbb{R}^{+d}$ be independent of $\underline{Y}$. We assume that $G_0$ has density $g_0$ with respect to some dominating measure $\mu$ on $\mathbb{R}^{+d}$. Suppose we observe $\underline{X}_1, \ldots, \underline{X}_n$ i.i.d. as $\underline{X} = (\underline{\Delta}, \underline{T})$ where $\underline{\Delta} = (\Delta_1, \ldots, \Delta_d)$ is given by $\Delta_j = 1_{[Y_j \le T_j]}$, $j = 1, \ldots, d$. Equivalently, with a slight abuse of notation, $\underline{X} = (\underline{\Gamma}, \underline{T})$ where $\underline{\Gamma} = (\Gamma_1, \ldots, \Gamma_{2^d})$ is a vector of length $2^d$ consisting of 0’s and 1’s with exactly one 1, which indicates to which of the $2^d$ orthants of $\mathbb{R}^{+d}$ determined by $\underline{T}$ the random vector $\underline{Y}$ belongs. More explicitly, define

$$K \equiv 1 + \sum_{j=1}^d (1 - \Delta_j) 2^{j-1}.$$

Then set $\Gamma_k \equiv 1\{k = K\}$ for $k = 1, \ldots, 2^d$, so that $\Gamma_K = 1$ and $\Gamma_l = 0$ for $l \in \{1, \ldots, 2^d\} \setminus \{K\}$. Much as for univariate current status data, $\underline{Y}$ represents a vector of times to events, $\underline{T}$ is a vector of observation times, and the goal is nonparametric estimation of the joint distribution function $F_0$ of $\underline{Y}$ based on observation of the $\underline{X}_i$’s. See Dunson and Dinse (2002), Jewell (2007), Wang (2009), and Lin and Wang (2011) for examples of settings in which data of this type arise.

To calculate the likelihood, we first calculate the distribution of $\underline{X}$ for a general distribution function $F$: note that the conditional distribution of $\underline{\Gamma}$ conditional on $\underline{T}$ is Multinomial:

$$(\underline{\Gamma} \mid \underline{T}) \sim \mathrm{Mult}_{2^d}\big(1, \underline{p}(\underline{T}; F)\big)$$

where $\underline{p}(\underline{T}; F) = (p_1(\underline{T}; F), \ldots, p_{2^d}(\underline{T}; F))$ and the probabilities $p_j(\underline{t}; F)$, $j = 1, \ldots, 2^d$, $\underline{t} \in \mathbb{R}^{+d}$, are determined by the $F$ measures of the corresponding sets. Then our model for multivariate current status data is the collection of all densities with respect to the dominating measure (counting measure on $\{0, 1\}^{2^d}$) $\times\, \mu$ given by

$$p(\underline{\gamma}, \underline{t}; F) = \prod_{j=1}^{2^d} p_j(\underline{t}; F)^{\gamma_j}\, g_0(\underline{t})$$

for some distribution function $F$ on $\mathbb{R}^{+d}$, where $\underline{t} \in \mathbb{R}^{+d}$ and $\gamma_j \in \{0, 1\}$ with $\sum_{j=1}^{2^d} \gamma_j = 1$. Now the part of the log-likelihood that depends on $F$ is given by

$$l_n(F) = \sum_{i=1}^n \sum_{j=1}^{2^d} \Gamma_{i,j} \log p_j(\underline{T}_i; F),$$

and again the MLE $\hat F_n$ of the true distribution function $F_0$ is given by

$$\hat F_n = \operatorname{argmax}_F\, l_n(F). \tag{3.1}$$

For example, when $d = 2$, we can write $\Gamma_1 = \Delta_1 \Delta_2$, $\Gamma_2 = (1 - \Delta_1)\Delta_2$, $\Gamma_3 = \Delta_1(1 - \Delta_2)$, and $\Gamma_4 = (1 - \Delta_1)(1 - \Delta_2)$, and then

$$\begin{aligned} p_1(\underline{t}; F) &= F(t_1, t_2), \\ p_2(\underline{t}; F) &= F(\infty, t_2) - F(t_1, t_2), \\ p_3(\underline{t}; F) &= F(t_1, \infty) - F(t_1, t_2), \\ p_4(\underline{t}; F) &= 1 - F(t_1, \infty) - F(\infty, t_2) + F(t_1, t_2). \end{aligned}$$

Thus […]. Note that […] (3.2) where […].

Characterizations and computation of the MLE (3.1), mostly for the case $d = 2$, have been treated in Song (2001), Gentleman and Vandal (2002), and Maathuis (2005, 2006). Consistency of the MLE for more general interval censoring models has been established by Yu, Yu and Wong (2006). For an interesting application see Betensky and Finkelstein (1999). This example and other examples of multivariate interval censored data are treated in Sun (2006) and Deng and Fang (2009). For a comparison of the MLE with alternative estimators in the case $d = 2$, see Groeneboom (2012a).

An analogue of Groeneboom’s Theorem 2.1 has not been established in the multivariate case. Song (2001) established an asymptotic minimax lower bound for pointwise convergence when $d = 2$: if $F_0$ and $G_0$ have positive continuous densities at $\underline{t}_0$, then no estimator has a local minimax rate for estimation of $F_0(\underline{t}_0)$ faster than $n^{-1/3}$. By making use of additional smoothness hypotheses, Groeneboom (2012a) has constructed estimators which achieve the pointwise $n^{-1/3}$ rate, but it is not yet known if the MLE achieves this.

Our main goal here is to prove the following theorem concerning the global rate of convergence of the MLE $\hat F_n$.

Theorem 3.1. Consider the multivariate current status model. Suppose that $F_0$ has $\mathrm{supp}(F_0) \subset [0, M]^d$ and that $F_0$ has density $f_0$ which satisfies

$$c_1^{-1} \le f_0(\underline{y}) \le c_1 \quad \text{for all } \underline{y} \in [0, M]^d, \tag{3.3}$$

where $0 < c_1 < \infty$. Suppose that $G_0$ has density $g_0$ which satisfies

$$c_2^{-1} \le g_0(\underline{y}) \le c_2 \quad \text{for all } \underline{y} \in [0, M]^d. \tag{3.4}$$

Then the MLE $\hat p_n \equiv p_{\hat F_n}$ of $p_0 \equiv p_{F_0}$ satisfies

$$h(\hat p_n, p_0) = O_p\big(n^{-1/3}(\log n)^{\gamma}\big)$$

for $\gamma \equiv \gamma_d \equiv (5d - 4)/6$.
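To make the data structure of this section concrete, the following Python sketch encodes multivariate current status observations as orthant indicators, evaluates the four cell probabilities for $d = 2$, and computes the part of the log-likelihood that depends on $F$. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the orthant index is taken to be $K = 1 + \sum_j (1 - \Delta_j) 2^{j-1}$ (which reproduces the $d = 2$ ordering $\Gamma_1 = \Delta_1\Delta_2, \ldots, \Gamma_4 = (1 - \Delta_1)(1 - \Delta_2)$ given above), and the simulation design (independent uniform $\underline{Y}$ and $\underline{T}$ on $[0, 1]^2$) is chosen only for convenience.

```python
import math
import random


def rate_exponent(d):
    """Log-power gamma_d = (5d - 4) / 6 from Theorem 3.1 (stated for d >= 2)."""
    return (5 * d - 4) / 6


def orthant_index(delta):
    """Assumed index K = 1 + sum_j (1 - Delta_j) * 2^(j-1), with j = 1, ..., d."""
    return 1 + sum((1 - dj) * 2 ** j for j, dj in enumerate(delta))


def gamma_vector(delta):
    """One-hot vector (Gamma_1, ..., Gamma_{2^d}) with Gamma_K = 1."""
    k = orthant_index(delta)
    return [1 if i == k else 0 for i in range(1, 2 ** len(delta) + 1)]


def cell_probs_d2(F, t1, t2):
    """(p_1, ..., p_4) for d = 2 and a distribution function F supported on
    [0, 1]^2, so that F(t1, 1) and F(1, t2) are the two marginals."""
    f12, f1, f2 = F(t1, t2), F(t1, 1.0), F(1.0, t2)
    return [f12, f2 - f12, f1 - f12, 1.0 - f1 - f2 + f12]


def simulate(n, rng):
    """Illustrative design only: Y and T independent, uniform on [0, 1]^2."""
    data = []
    for _ in range(n):
        y = (rng.random(), rng.random())
        t = (rng.random(), rng.random())
        delta = tuple(int(yj <= tj) for yj, tj in zip(y, t))
        data.append((gamma_vector(delta), t))
    return data


def log_likelihood(F, data):
    """Sum over observations of sum_k Gamma_k * log p_k(T; F) for d = 2."""
    total = 0.0
    for gamma, (t1, t2) in data:
        probs = cell_probs_d2(F, t1, t2)
        # Only the observed orthant (Gamma_k = 1) contributes to the sum.
        total += sum(math.log(pk) for g, pk in zip(gamma, probs) if g)
    return total


if __name__ == "__main__":
    uniform_cdf = lambda a, b: a * b  # true F_0 under the design above
    sample = simulate(500, random.Random(0))
    print("gamma_2 =", rate_exponent(2))
    print("log-likelihood at F_0:", log_likelihood(uniform_cdf, sample))
```

The sketch only evaluates the criterion at a given $F$; maximizing it over all bivariate distribution functions, as in (3.1), requires the specialized algorithms cited in this section.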
Since the inequality (2.3) continues to hold in $\mathbb{R}^d$ for $d \ge 2$ (with 1/4 replaced by 1/8 on the right side), we obtain the following corollary:

Corollary 3.2. Under the conditions of Theorem 3.1 it follows that

$$\int (\hat F_n - F_0)^2 \, dG_0 = O_p\big(n^{-2/3}(\log n)^{\beta}\big)$$

for $\beta \equiv \beta_d = 2\gamma_d = (5d - 4)/3$.

## 4. Proofs

Here we give the proof of Theorem 3.1. The main tool is a method developed by van de Geer (2000). We will use the following lemma in combination with Theorem 7.6 of van de Geer (2000) or Theorem 3.4.1 of van der Vaart and Wellner (1996) (Section 3.4.2, pages 330–331). Without loss of generality we can take $M = 1$ where $M$ is the upper bound of the support of $F_0$ (see Theorem 3.1).

Let $\mathcal{P}$ be a collection of probability densities $p$ on a sample space $\mathcal{X}$ with respect to a dominating measure $\mu$. Define

[…] (4.1)

[…] (4.2)

[…] (4.3)

The following general result, relating the bracketing entropies of these classes (among them $\mathcal{G}^{(conv)}$) with respect to $L_2(P_0)$, $L_2(Q_{\sigma(\epsilon)})$, and $L_2(\tilde Q_{\sigma(\epsilon)})$, is due to van de Geer (2000).

Lemma 4.1 (van de Geer, 2000). For every $\epsilon > 0$

[…] (4.4)

[…] (4.5)

[…] (4.6)

where […] and $\tilde Q_\sigma \equiv Q_\sigma / Q_\sigma(\mathcal{X})$.

Proof. We first show that (4.4) holds. Suppose that $\{[g_{L,j}, g_{U,j}],\ j = 1, \ldots, m\}$ are $\epsilon$-brackets with respect to $L_2(P_0)$ for the truncated class, with […]. Then for $g \in \mathcal{G}^{(conv)}$, let $g_\sigma \equiv g 1_{[p_0 > \sigma]}$ be the corresponding element of the truncated class. Suppose that $g_\sigma \in [g_{L,j}, g_{U,j}]$ for some $j \in \{1, \ldots, m\}$. Then […] where, by the triangle inequality, $0 \le g \le 2$ for all $g \in \mathcal{G}^{(conv)}$, and the definition of $\sigma(\epsilon)$, it follows that […]. Thus $\{[\tilde g_{L,j}, \tilde g_{U,j}] : j \in \{1, \ldots, m\}\}$ is a collection of $3\epsilon$-brackets for $\mathcal{G}^{(conv)}$ with respect to $L_2(P_0)$ and hence (4.4) holds.

Now we show that (4.5) holds. Suppose that $\{[p_{L,j}, p_{U,j}] : j = 1, \ldots, m\}$ is a set of $\epsilon/2$-brackets with respect to $L_2(Q_\sigma)$ for […] with […]. Suppose $p \in [p_{L,j}, p_{U,j}]$ for some $j$. Then, since […] where […]. Thus […] and hence $\{[g_{L,j}, g_{U,j}] : j = 1, \ldots, m\}$ is a set of $\epsilon$-brackets with respect to $L_2(P_0)$ for […]. This shows that (4.5) holds.

It remains only to show that (4.6) holds. But this is easy since […].

This lemma is based on van de Geer (2000), pages 101 and 103. Note that our constants differ slightly from those of van de Geer.

Lemma 4.2. Suppose that $F_0$ has density $f_0$ which satisfies, for some $0 < c_1 < \infty$,

[…] (4.7)

Then $p_0$ (which we can identify with the vector $\underline{p}(\cdot, F_0)$) satisfies […].

Proof. This follows immediately from the general $d$ version of (3.2) and the assumption on $f_0$. These inequalities can also be written in the following compact form: for […] with $\delta_j \in \{0, 1\}$, […].

Lemma 4.3. Suppose that the assumption of Lemma 4.2 holds. Suppose, moreover, that $G_0$ has density $g_0$ which satisfies

[…] (4.8)

Then […]. Furthermore, with $\sigma(\delta) \equiv \delta^2 / (2^d (c_1 c_2)^2)$ we have […].

Proof. The first inequality follows easily from Lemma 4.2: note that […]. The second inequality follows from the first inequality of the lemma.

Lemma 4.4. If the hypotheses of Lemmas 4.2 and 4.3 hold, then the measure $Q_\sigma$ defined by $dQ_\sigma \equiv (1/p_0) 1\{p_0 > \sigma\}\, d\mu$ has total mass $Q_\sigma(\mathcal{X})$ given by

[…] (4.9)

[…] (4.10)

Proof. This follows from Lemma 4.2, followed by an explicit calculation. In particular, the equality in (4.10) follows from […] where the second equality follows by induction: it holds easily for $d = 1$ (and $d = 2$); and then an easy calculation shows that it holds for $d$ if it holds for $d - 1$.

Lemma 4.5. If the hypotheses of Lemmas 4.2 and 4.3 hold, and $d \ge 2$, then

$$\log N_{[\,]}\big(\epsilon, \mathcal{G}^{(conv)}, L_2(P_0)\big) \le K \epsilon^{-1} \big(\log(1/\epsilon)\big)^{5d/2 - 2}$$

for all $0 < \epsilon <$ some $\epsilon_0$ and some constant $K < \infty$.
Proof. This follows by combining the results of Lemmas 4.3 and 4.4 with Lemma 4.1, and then using Corollary 6.2 of the bracketing entropy bound of Gao (2012), stated here as Theorem 6.1. Here is the explicit calculation: […] for $\epsilon$ sufficiently small.

Proof. (Theorem 3.1) This follows from Lemma 4.5 and Theorem 7.6 of van de Geer (2000) or Theorem 3.4.1 of van der Vaart and Wellner (1996) together with the arguments given in Section 3.4.2. By Lemma 4.5 the bracketing entropy integrals […] where the bound on the right side behaves asymptotically as a constant times $2\delta^{1/2}(\log(1/\delta))^{3\gamma/2}$ with $3\gamma \equiv 5d/2 - 2$, and hence (using the notation of Theorem 3.4.1 of van der Vaart and Wellner (1996)), we can take $\phi_n(\delta) = K 2 \delta^{1/2} (\log(1/\delta))^{3\gamma/2}$. Thus with $r_n \equiv n^{1/3}/(\log n)^{\beta}$ with $\beta = \gamma$ we find that, for $n$ large enough that $1 < r_n \le n$,

$$r_n^2 \phi_n(1/r_n) = 2K r_n^{3/2} (\log r_n)^{3\gamma/2} \le 2K \frac{n^{1/2}}{(\log n)^{3\gamma/2}} (\log n)^{3\gamma/2} = 2K \sqrt{n},$$

and hence the claimed order of convergence holds.

## 5. Some related models and further problems

There are several related models in which we expect to see the same basic phenomenon as established here, namely a global convergence rate of the form $n^{-1/3}(\log n)^{\gamma}$ in all dimensions $d \ge 2$ with only the power $\gamma$ of the log term depending on $d$. Three such models are:

a. the “in-out model” for interval censoring in $\mathbb{R}^d$;

b. the “case 2” multivariate interval censoring models studied by Deng and Fang (2009); and

c. the scale mixture of uniforms model for decreasing densities in $\mathbb{R}^{+d}$.

Here we briefly sketch why we expect the same phenomenon to hold in these three cases, even though we do not yet know pointwise convergence rates in any of these cases.

### 5.1. The “in-out model” for interval censoring in $\mathbb{R}^d$

The “in-out model” for interval censoring in $\mathbb{R}^d$ was explored in the case $d = 2$ by Song (2001). In this model $\underline{Y} \sim F$ on $\mathbb{R}^2$, and $R$ is a random rectangle in $\mathbb{R}^2$ independent of $\underline{Y}$ (say $[\underline{U}, \underline{V}] = \{\underline{x} = (x_1, x_2) \in \mathbb{R}^2 : U_1 \le x_1 \le V_1,\ U_2 \le x_2 \le V_2\}$ where $\underline{U}$ and $\underline{V}$ are random vectors in $\mathbb{R}^2$ with $\underline{U} \le \underline{V}$ coordinatewise). We observe only $(1_R(\underline{Y}), R)$, and the goal is to estimate the unknown distribution function $F$.

Song (2001) (page 86) produced a local asymptotic minimax lower bound for estimation of $F$ at a fixed $\underline{t}_0 \in \mathbb{R}^2$. Under the assumption that $F$ has a positive density $f$ at $\underline{t}_0$, Song (2001) showed that any estimator of $F(\underline{t}_0)$ can have a local-minimax convergence rate which is at best $n^{-1/3}$. Groeneboom (2012a) has shown that this rate can be achieved by estimators involving smoothing methods. Based on the results for current status data in $\mathbb{R}^d$ obtained in Theorem 3.1 and the entropy results for the class of distribution functions on $\mathbb{R}^d$, we conjecture that the global Hellinger rate of convergence of the MLE $\hat F_n$ will be $n^{-1/3}(\log n)^{\nu}$ for all $d \ge 2$ where $\nu = \nu_d$.

### 5.2. “Case 2” multivariate interval censoring models in $\mathbb{R}^d$

Recall that “case 2” interval censored data on $\mathbb{R}$ is as follows: suppose that $Y \sim F_0$ on $\mathbb{R}^+$, the pair of observation times $(U, V)$ with $U \le V$ determines a random interval $(U, V]$, and we observe $\underline{X} = (\underline{\Delta}, U, V) = (\Delta_1, \Delta_2, \Delta_3, U, V)$ where $\Delta_1 = 1\{Y \le U\}$, $\Delta_2 = 1\{U < Y \le V\}$, and $\Delta_3 = 1\{V < Y\}$. Nonparametric estimation of $F_0$ based on $\underline{X}_1, \ldots, \underline{X}_n$ i.i.d. as $\underline{X}$ has been discussed by a number of authors, including Groeneboom and Wellner (1992), Geskus and Groeneboom (1999), and Groeneboom (1996). Deng and Fang (2009) studied generalizations of this model to $\mathbb{R}^d$, and obtained rates of convergence of the MLE with respect to the Hellinger metric given by $n^{-(1+d)/(2(1+2d))}(\log n)^{d^2/(2(2d+1))}$ in the case most comparable to the multivariate interval censoring model studied here. While this rate reduces when $d = 1$ to the known rate $n^{-1/3}(\log n)^{1/6}$, it is slower than $n^{-1/3}(\log n)^{\nu}$ for some $\nu$ when $d > 1$ due to the use of entropy bounds involving convex hulls (see Deng and Fang (2009), Proposition A.1, page 66) which are not necessarily sharp. We expect that rates of the form $n^{-1/3}(\log n)^{\nu}$ with $\nu > 0$ are possible in these models as well.

### 5.3. Scale mixtures of uniform densities on $\mathbb{R}^{+d}$

Pavlides (2008) and Pavlides and Wellner (2012) studied the family of scale mixtures of uniform densities of the following form:

$$f_G(\underline{x}) = \int_{(0, \infty)^d} \frac{1}{|\underline{y}|}\, 1_{(\underline{0}, \underline{y}]}(\underline{x}) \, dG(\underline{y}) \tag{5.1}$$

for some distribution function $G$ on $(0, \infty)^d$. (Note that we have used the notation $|\underline{y}| \equiv \prod_{j=1}^d y_j$ for $\underline{y} = (y_1, \ldots, y_d) \in \mathbb{R}^{+d}$.) It is not difficult to see that such densities are decreasing in each coordinate and that they also satisfy […] for all $\underline{u}, \underline{v} \in \mathbb{R}^{+d}$ with $\underline{u} \le \underline{v}$; here $\Delta_d$ denotes the $d$-dimensional difference operator. This is the same key property of distribution functions which results in (bracketing) entropies which depend on dimension only through a logarithmic term. The difference here is that the density functions $f_G$ need not be bounded, and even if the true density $f_0$ is in this class and satisfies $f_0(\underline{0}) < \infty$, we do not yet know the behavior of the MLE $\hat f_n$ at zero. In fact we conjecture that:

(a) If $f_0(\underline{0}) < \infty$ and $f_0$ is a scale mixture of uniform densities on rectangles as in (5.1), then $\hat f_n(\underline{0}) = O_p((\log n)^{\beta})$ for some $\beta = \beta_d > 0$.

(b) Under the same hypothesis as in (a) and the hypothesis that $f_0$ has support contained in a compact set, the MLE converges with respect to the Hellinger distance with a rate that is no worse than $n^{-1/3}(\log n)^{\xi}$ where $\xi = \xi_d$.

Again Pavlides (2008) and Pavlides and Wellner (2012) establish asymptotic minimax lower bounds for estimation of $f_0(\underline{x}_0)$ proving that no estimator can have a (local minimax) rate of convergence faster than $n^{-1/3}$ in all dimensions. This is in sharp contrast to the class of block-decreasing densities on $\mathbb{R}^{+d}$ studied by Pavlides (2012) and by Biau and Devroye (2003): Pavlides (2012) shows that the local asymptotic minimax rate for estimation of $f_0(\underline{x}_0)$ is no faster than $n^{-1/(d+2)}$, while Biau and Devroye (2003) show that there exist (histogram type) estimators $\tilde f_n$ which satisfy $E_{f_0}\|\tilde f_n - f_0\|_1 = O(n^{-1/(d+2)})$.

## Acknowledgments

We owe thanks to the referees for a number of helpful suggestions and for pointing out the work of Yu, Yu and Wong (2006) and Deng and Fang (2009).

## Appendix

We begin by summarizing the results of Gao (2012). For a (probability) measure $\mu$ on $[0, 1]^d$, let $F \equiv F_\mu$ denote the corresponding distribution function given by

$$F_\mu(\underline{x}) = \mu\big([0, x_1] \times \cdots \times [0, x_d]\big)$$

for all $\underline{x} = (x_1, \ldots, x_d) \in [0, 1]^d$. Let $\mathcal{F}_d$ denote the collection of all distribution functions on $[0, 1]^d$; i.e.

$$\mathcal{F}_d = \{F_\mu : \mu \text{ is a probability measure on } [0, 1]^d\}.$$

For example, if $\lambda_d$ denotes Lebesgue measure on $[0, 1]^d$, then the corresponding distribution function is $F_{\lambda_d}(\underline{x}) = \prod_{j=1}^d x_j$.

Theorem 6.1 (Gao, 2012). For $d \ge 2$ and $1 \le p < \infty$

$$\log N_{[\,]}\big(\epsilon, \mathcal{F}_d, L_p(\lambda_d)\big) \lesssim \epsilon^{-1}\big(\log(1/\epsilon)\big)^{2(d-1)}$$

for all $0 < \epsilon \le 1$.

Our goal here is to use this result to control bracketing numbers for $\mathcal{F}_d$ with respect to two other measures $C_d$ and $R_{d,\sigma}$ defined as follows. Let $C_d$ denote the finite measure on $[0, 1]^d$ with density with respect to $\lambda_d$ given by […]. For fixed $\sigma > 0$, let $R_{d,\sigma}$ denote the (probability) measure on $(0, 1]^d$ with density with respect to $\lambda_d$ given by […].

Corollary 6.2. (a) For each $d \ge 2$ it follows that […] for $\epsilon \le \epsilon_0(d)$.

(b) For each $d \ge 2$ and $\sigma \le \sigma_0(d)$ it follows that […] for $\epsilon \le \epsilon_0(d)/2$.

Proof. We first prove (a). We set $p \equiv p_d = 2r \equiv 2r_d$ where $r \equiv r_d = 2d - 1$ and $s = (d - 1/2)/(d - 1)$ satisfy $r^{-1} + s^{-1} = 1$.
(Thus for $d = 2$, $r = 3$, $s = 3/2$, and $p = 6$, while for $d = 4$, $r = 7$, $s = (7/2)/3 = 7/6$, and $p = 14$.) Let $\{[g_j, h_j],\ j = 1, \ldots, m\}$ be a collection of $\epsilon$-brackets for $\mathcal{F}_d$ with respect to $L_p(\lambda_d)$. By Theorem 6.1 we know that $\log m \lesssim \epsilon^{-1}(\log(1/\epsilon))^{2(d-1)}$. Now we bound the size of the brackets $[g_j, h_j]$ with respect to $C_d$. Using Hölder’s inequality with $1/r + 1/s = 1$ as chosen above we find that

[…] (6.1)

Here are some details of the computation leading to (6.1): […]

To prove (b) we introduce monotone transformations $t_j(u_j)$ and their inverses $u_j(t_j)$ which relate $c_d$ and $r_{d,\sigma}$: we set […] for $j = 1, \ldots, m$. These all depend on $\sigma > 0$, but this dependence is suppressed in the notation. For the same brackets $[g_j, h_j]$ used in the proof of (a), we define new brackets $[\tilde g_j, \tilde h_j]$ for $j = 1, \ldots, m$ by […]. Then it follows easily by direct calculation using […] that […]. Thus for $\sigma \le \sigma_0(d)$ we have […] by the arguments in (a). Hence the brackets $[\tilde g_j, \tilde h_j]$ yield a collection of $2^{d/2+1}\epsilon$-brackets for $\mathcal{F}_d$ with respect to $L_2(R_{d,\sigma})$, and this implies that (b) holds.

## References

- Ayer M, Brunk HD, Ewing GM, Reid WT, Silverman E. An empirical distribution function for sampling with incomplete information. Ann. Math. Statist. 1955; 26:641–647. MR0073895 (17,504f).
- Balabdaoui F, Wellner JA. Chernoff’s density is log-concave. Technical Report No. 512, Department of Statistics, University of Washington; 2012. Available as arXiv:1207.6614.
- Betensky RA, Finkelstein DM. A non-parametric maximum likelihood estimator for bivariate interval-censored data. Statistics in Medicine. 1999; 18:3089–3100. [PubMed: 10544308]
- Biau G, Devroye L. On the risk of estimates for block decreasing densities. J. Multivariate Anal. 2003; 86:143–165. MR1994726 (2005c:62055).
- Deng D, Fang H-B. On nonparametric maximum likelihood estimations of multivariate distribution function based on interval-censored data. Comm. Statist. Theory Methods. 2009; 38:54–74. MR2489672 (2010j:62139).
- Dunson DB, Dinse GE. Bayesian models for multivariate current status data with informative censoring. Biometrics. 2002; 58:79–88. MR1891046. [PubMed: 11890330]
- Gao F. Bracketing entropy of high dimensional distributions. Technical Report, Department of Mathematics, University of Idaho; 2012. “High Dimensional Probability VI”, to appear.
- Gentleman R, Vandal AC. Nonparametric estimation of the bivariate CDF for arbitrarily censored data. Canad. J. Statist. 2002; 30:557–571. MR1964427 (2004b:62090).
- Geskus R, Groeneboom P. Asymptotically optimal estimation of smooth functionals for interval censoring, case 2. Ann. Statist. 1999; 27:627–674. MR1714713 (2000j:60044).
- Groeneboom P. Asymptotics for interval censored observations. Technical Report No. 87-18, Department of Mathematics, University of Amsterdam; 1987.
- Groeneboom P. Brownian motion with a parabolic drift and Airy functions. Probab. Theory Related Fields. 1989; 81:79–109. MR981568 (90c:60052).
- Groeneboom P. Lectures on inverse problems. In: Lectures on probability theory and statistics (Saint-Flour, 1994). Lecture Notes in Math. Vol. 1648. Berlin: Springer; 1996. p. 67–164. MR1600884 (99c:62092).
- Groeneboom P. The bivariate current status model. Technical Report No. ??, Delft Institute of Applied Mathematics, Delft University of Technology; 2012a. Available as arXiv:1209.0542.
- Groeneboom P. Local minimax lower bounds for the bivariate current status model. Technical Report No. ??, Delft Institute of Applied Mathematics, Delft University of Technology; 2012b. Personal communication.
- Groeneboom P, Maathuis MH, Wellner JA. Current status data with competing risks: consistency and rates of convergence of the MLE. Ann. Statist. 2008a; 36:1031–1063.
- Groeneboom P, Maathuis MH, Wellner JA. Current status data with competing risks: limiting distribution of the MLE. Ann. Statist. 2008b; 36:1064–1089.
- Groeneboom P, Wellner JA. Information bounds and non-parametric maximum likelihood estimation. DMV Seminar. Vol. 19. Basel: Birkhäuser Verlag; 1992. MR1180321 (94k:62056).
- Groeneboom P, Wellner JA. Computing Chernoff’s distribution. J. Comput. Graph. Statist. 2001; 10:388–400. MR1939706.
- Jewell NP. Correspondences between regression models for complex binary outcomes and those for structured multivariate survival analyses. In: Advances in statistical modeling and inference. Ser. Biostat. Vol. 3. Hackensack, NJ: World Sci. Publ.; 2007. p. 45–64. MR2416109 (2009e:62407).
- Lin X, Wang L. Bayesian proportional odds models for analyzing current status data: univariate, clustered, and multivariate. Comm. Statist. Simulation Comput. 2011; 40:1171–1181. MR2818097.
- Maathuis MH. Reduction algorithm for the NPMLE for the distribution function of bivariate interval-censored data. J. Comput. Graph. Statist. 2005; 14:352–362. MR2160818.
- Maathuis MH. Nonparametric estimation for current status data with competing risks. Thesis (Ph.D.), University of Washington. ProQuest LLC, Ann Arbor, MI; 2006. MR2708977.
- Pavlides MG. Nonparametric estimation of multivariate monotone densities. Thesis (Ph.D.), University of Washington. ProQuest LLC, Ann Arbor, MI; 2008. MR2717518.
- Pavlides MG. Local asymptotic minimax theory for block-decreasing densities. J. Statist. Plann. Inference. 2012; 142:2322–2329. MR2911847.
- Pavlides MG, Wellner JA. Nonparametric estimation of multivariate scale mixtures of uniform densities. J. Multivariate Anal. 2012; 107:71–89. MR2890434.
- Schick A, Yu Q. Consistency of the GMLE with mixed case interval-censored data. Scand. J. Statist. 2000; 27:45–55. MR1774042.
- Song S. Estimation with bivariate interval-censored data. PhD thesis, University of Washington, Department of Statistics; 2001.
- Sun J. The Statistical Analysis of Interval-censored Failure Time Data. Statistics for Biology and Health. New York: Springer; 2006. MR2287318 (2007h:62007).
- van de Geer S. Hellinger-consistency of certain nonparametric maximum likelihood estimators. Ann. Statist. 1993; 21:14–44. MR1212164 (94c:62062).
- van de Geer SA. Applications of Empirical Process Theory. Cambridge Series in Statistical and Probabilistic Mathematics. Vol. 6. Cambridge: Cambridge University Press; 2000. MR1739079 (2001h:62002).
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer Series in Statistics. New York: Springer-Verlag; 1996. MR1385671 (97g:60035).
- Wang Y-F. Topics on multivariate two-stage current-status data and missing covariates in survival analysis. Thesis (Ph.D.), University of California, Davis. ProQuest LLC, Ann Arbor, MI; 2009. MR2736679.
- Yu S, Yu Q, Wong GYC. Consistency of the generalized MLE of a joint distribution function with multivariate interval-censored data. J. Multivariate Anal. 2006; 97:720–732. MR2236498 (2007i:62068).


Electronic Journal of Statistics, Jan 1, 2013
15 pages



Publisher: Unpaywall
ISSN: 1935-7524
DOI: 10.1214/13-ejs777

### Abstract

We establish global rates of convergence of the Maximum Likelihood Estimator (MLE) of a multivariate distribution function on ℝ in the case of (one type of) “interval censored” data. The main finding is that the rate of convergence of the MLE in the Hellinger metric is no worse than −1/3 γ n (log n) for γ = (5d − 4)/6. Keywords and phrases Empirical processes; global rate; Hellinger metric; interval censoring; multivariate; multivariate monotone functions 1. Introduction and overview Our main goal in this paper is to study global rates of convergence of the Maximum Likelihood Estimator (MLE) in one simple model for multivariate interval-censored data. In section 3 we will show that under some reasonable conditions the MLE converges in a d −1/3 γ Hellinger metric to the true distribution function on ℝ at a rate no worse than n (log n) for γ = (5d − 4)/6 for all d ≥ 2. Thus the rate of convergence is only worse than the known −1/3 rate of n for the case d = 1 by a factor involving a power of log n growing linearly with the dimension. These new rate results rely heavily on recent bracketing entropy bounds for d–dimensional distribution functions obtained by Gao (2012). We begin in Section 2 with a review of interval censoring problems and known results in the case d = 1. We introduce the multivariate interval censoring model of interest here in Section 3, and obtain a rate of convergence for this model for d ≥ 2 in Theorem 3.1. Most of the proofs are given in Section 4, with the exception being a key corollary of Gao (2012), the statement and proof of which are given in the Appendix (Section 6). Finally, in Section 5 we introduce several related models and further problems. 2. Interval censoring (or current status data) on ℝ + + Let Y ~ F on ℝ , and let T ~ G on ℝ be independent of Y. Suppose that we observe X , 0 0 1 …, X i.i.d. as X = (Δ, T) where Δ = 1 . 
Here Y is often the time until some event of n [Y ≤T] Supported in part by a grant from the Simons Foundation (#246211). Supported in part by NSF Grant DMS-1104832 and NI-AID grant 2R01 AI291968-04. NIH-PA Author Manuscript NIH-PA Author Manuscript NIH-PA Author Manuscript Gao and Wellner Page 2 interest and T is an observation time. The goal is to estimate F nonparametrically based on observation of the X ’s. To calculate the likelihood, we first calculate the distribution of X for a general distribution function F: note that the conditional distribution of Δ conditional on T is Bernoulli: where p(T) = F(T). If G has density g with respect to some measure μ on ℝ , then X = (Δ, 0 0 T) has density with respect to the dominating measure (counting measure on {0, 1}) × μ. The nonparametric Maximum Likelihood Estimator (MLE) F of F in this interval n 0 censoring model was first obtained by Ayer et al. (1955). It is simply described as follows: let T ≤ ⋯ ≤ T denote the order statistics corresponding to T , …, T and let Δ , …, (1) (n) 1 n (1) Δ denote the corresponding Δ’s. Then the part of the log-likelihood of X , …, X (n) 1 n depending on F is given by (2.1) where (2.2) It turns out that the maximizer F of (2.1) subject to (2.2) can be described as follows: let H* be the (greatest) convex minorant of the points {(i, ∑ Δ ) : i ∈ {1, …, n}}: j≤i (j) ̂ ̂ ̂ Let F denote the left-derivative of H* at T . Then (F , …, F ) is the unique vector i (i) 1 n maximizing (2.1) subject to (2.2), and we therefore take the MLE F of F to be with the conventions T ≡ 0 and T ≡ ∞. See Ayer et al. (1955) or Groeneboom and (0) (n+1) Wellner (1992), pages 38–43, for details. Groeneboom (1987) initiated the study of F and proved the following limiting distribution result at a fixed point t . Theorem 2.1 (Groeneboom, 1987). Consider the current status model on ℝ . 
Suppose that $0 < F_0(t_0), G_0(t_0) < 1$ and suppose that $F_0$ and $G_0$ are differentiable at $t_0$ with strictly positive derivatives $f_0(t_0)$ and $g_0(t_0)$ respectively. Then

$$n^{1/3}\big(\hat{F}_n(t_0) - F_0(t_0)\big) \to_d \left\{ \frac{F_0(t_0)(1 - F_0(t_0))\, f_0(t_0)}{2\, g_0(t_0)} \right\}^{1/3} 2\mathbb{Z}$$

where

$$\mathbb{Z} = \operatorname{argmin}\{W(t) + t^2\}$$

and where $W$ is a standard two-sided Brownian motion starting from 0.

The distribution of $\mathbb{Z}$ has been studied in detail by Groeneboom (1989) and computed by Groeneboom and Wellner (2001). Balabdaoui and Wellner (2012) show that the density $f_{\mathbb{Z}}$ of $\mathbb{Z}$ is log-concave.

van de Geer (1993) (see also van de Geer (2000)) obtained the following global rate result for $p_{\hat{F}_n}$. Recall that the Hellinger distance $h(p, q)$ between two densities with respect to a dominating measure $\mu$ is given by

$$h^2(p, q) = \frac{1}{2} \int \left(\sqrt{p} - \sqrt{q}\right)^2 d\mu.$$

Proposition 2.2 (van de Geer, 1993). $h(p_{\hat{F}_n}, p_{F_0}) = O_p(n^{-1/3})$.

Now for any distribution functions $F$ and $F_0$ the (squared) Hellinger distance $h^2(p_F, p_{F_0})$ for the current status model is given by

$$h^2(p_F, p_{F_0}) = \frac{1}{2} \int \left\{ \big(\sqrt{F} - \sqrt{F_0}\big)^2 + \big(\sqrt{1 - F} - \sqrt{1 - F_0}\big)^2 \right\} dG_0 \ \ge\ \frac{1}{4} \int (F - F_0)^2\, dG_0, \tag{2.3}$$

and hence Proposition 2.2 yields

$$\int (\hat{F}_n - F_0)^2\, dG_0 = O_p(n^{-2/3}), \tag{2.4}$$

or $\|\hat{F}_n - F_0\|_{L_2(G_0)} = O_p(n^{-1/3})$.

For generalizations of these and other asymptotic results for the current status model to more complicated interval censoring schemes for real-valued random variables $Y$, see e.g. Groeneboom and Wellner (1992), van de Geer (1993), Groeneboom (1996), van de Geer (2000), Schick and Yu (2000), and Groeneboom, Maathuis and Wellner (2008a,b).

Our main focus in this paper, however, concerns one simple generalization of the interval censoring model for $\mathbb{R}^+$ introduced above to interval censoring in $\mathbb{R}^{+d}$. We now turn to this generalization.
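
The convex-minorant description of the univariate MLE in Section 2 can be illustrated numerically. The sketch below is not taken from the paper: it uses the pool-adjacent-violators algorithm, a standard way to obtain the left-derivatives of the greatest convex minorant of the cumulative sums of the ordered $\Delta$'s. The function name `current_status_mle` is hypothetical, and the observation times are assumed to be distinct.

```python
def current_status_mle(times, deltas):
    """Sketch of the univariate current status MLE via pool-adjacent-violators.

    Assumes distinct observation times. Returns the sorted times T_(1..n)
    and the fitted values F_hat_1 <= ... <= F_hat_n at those times.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    t_sorted = [times[i] for i in order]
    d_sorted = [deltas[i] for i in order]

    # Each block holds [sum of deltas, number of points]; its mean is the
    # slope of the convex minorant over that stretch of indices.
    blocks = []
    for d in d_sorted:
        blocks.append([float(d), 1])
        # Merge backwards while the previous block mean exceeds the current one
        # (compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return t_sorted, fitted


# Toy data (made up for illustration): four observation times with indicators.
t, f_hat = current_status_mle([3, 1, 2, 4], [1, 0, 1, 0])
```

The returned vector plays the role of $(\hat{F}_1, \ldots, \hat{F}_n)$; extending it to a right-continuous step function between consecutive order statistics gives the estimator described in the text.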
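
## 3. Multivariate interval censoring: multivariate current status data

Let $\underline{Y} = (Y_1, \ldots, Y_d) \sim F_0$ on $\mathbb{R}^{+d} \equiv [0, \infty)^d$, and let $\underline{T} = (T_1, \ldots, T_d) \sim G_0$ on $\mathbb{R}^{+d}$ be independent of $\underline{Y}$. We assume that $G_0$ has density $g_0$ with respect to some dominating measure $\mu$ on $\mathbb{R}^{+d}$. Suppose we observe $\underline{X}_1, \ldots, \underline{X}_n$ i.i.d. as $\underline{X} = (\underline{\Delta}, \underline{T})$ where $\underline{\Delta} = (\Delta_1, \ldots, \Delta_d)$ is given by $\Delta_j = 1_{[Y_j \le T_j]}$, $j = 1, \ldots, d$. Equivalently, with a slight abuse of notation, $\underline{X} = (\underline{\Gamma}, \underline{T})$ where $\underline{\Gamma} = (\Gamma_1, \ldots, \Gamma_{2^d})$ is a vector of length $2^d$ consisting of 0's and 1's and with at most one 1, which indicates to which of the $2^d$ orthants of $\mathbb{R}^{+d}$ determined by $\underline{T}$ the random vector $\underline{Y}$ belongs. More explicitly, define $K \equiv 1 + \sum_{j=1}^{d} (1 - \Delta_j) 2^{j-1}$. Then set $\Gamma_k \equiv 1\{k = K\}$ for $k = 1, \ldots, 2^d$, so that $\Gamma_K = 1$ and $\Gamma_l = 0$ for $l \in \{1, \ldots, 2^d\} \setminus \{K\}$.

Much as for univariate current status data, $\underline{Y}$ represents a vector of times to events, $\underline{T}$ is a vector of observation times, and the goal is nonparametric estimation of the joint distribution function $F$ of $\underline{Y}$ based on observation of the $\underline{X}_i$'s. See Dunson and Dinse (2002), Jewell (2007), Wang (2009), and Lin and Wang (2011) for examples of settings in which data of this type arise.

To calculate the likelihood, we first calculate the distribution of $\underline{X}$ for a general distribution function $F$: note that the conditional distribution of $\underline{\Gamma}$ conditional on $\underline{T}$ is Multinomial:

$$(\underline{\Gamma} \mid \underline{T}) \sim \text{Multinomial}_{2^d}\big(1, \underline{p}(\underline{T}; F)\big)$$

where $\underline{p}(\underline{T}; F) = (p_1(\underline{T}; F), \ldots, p_{2^d}(\underline{T}; F))$ and the probabilities $p_j(\underline{t}; F)$, $j = 1, \ldots, 2^d$, $\underline{t} \in \mathbb{R}^{+d}$, are determined by the $F$ measures of the corresponding sets. Then our model $\mathcal{P}$ for multivariate current status data is the collection of all densities with respect to the dominating measure (counting measure on $\{0, 1\}^{2^d}$) $\times\, \mu$ given by

$$p(\underline{\gamma}, \underline{t}; F) = \prod_{j=1}^{2^d} p_j(\underline{t}; F)^{\gamma_j}\, g_0(\underline{t})$$

for some distribution function $F$ on $\mathbb{R}^{+d}$ where $\underline{t} \in \mathbb{R}^{+d}$ and $\gamma_j \in \{0, 1\}$ with $\sum_{j=1}^{2^d} \gamma_j = 1$. Now the part of the log-likelihood that depends on $F$ is given by

$$l_n(F) = \sum_{i=1}^{n} \sum_{j=1}^{2^d} \Gamma_{i,j} \log p_j(\underline{T}_i; F),$$

and again the MLE $\hat{F}_n$ of the true distribution function $F_0$ is given by

$$\hat{F}_n = \operatorname{argmax}_F\, l_n(F). \tag{3.1}$$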
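
As a concrete illustration of the orthant coding above, the following sketch maps an indicator vector $\underline{\Delta}$ to the index $K$ and the vector $\underline{\Gamma}$. The encoding $K = 1 + \sum_j (1 - \Delta_j) 2^{j-1}$ used here is an assumption: it is chosen because it agrees with the $d = 2$ listing $\Gamma_1 = \Delta_1\Delta_2$, $\Gamma_2 = (1-\Delta_1)\Delta_2$, $\Gamma_3 = \Delta_1(1-\Delta_2)$, $\Gamma_4 = (1-\Delta_1)(1-\Delta_2)$ given in the text. The function names are hypothetical.

```python
from itertools import product


def orthant_index(delta):
    """Return K in {1, ..., 2^d} for a 0/1 tuple delta = (Delta_1, ..., Delta_d).

    Assumed encoding: K = 1 + sum_j (1 - Delta_j) * 2^(j-1), with j starting at 1.
    """
    return 1 + sum((1 - d) << j for j, d in enumerate(delta))


def gamma_vector(delta):
    """Return Gamma = (Gamma_1, ..., Gamma_{2^d}) with Gamma_k = 1{k = K}."""
    k = orthant_index(delta)
    return [1 if i == k else 0 for i in range(1, 2 ** len(delta) + 1)]


# Every Delta in {0,1}^d should land in exactly one of the 2^d cells.
for delta in product((0, 1), repeat=2):
    print(delta, orthant_index(delta), gamma_vector(delta))
```

Because the map from $\underline{\Delta}$ to $K$ is a bijection onto $\{1, \ldots, 2^d\}$, observing $\underline{\Gamma}$ carries exactly the same information as observing $\underline{\Delta}$.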

For example, when $d = 2$, we can write $\Gamma_1 = \Delta_1 \Delta_2$, $\Gamma_2 = (1 - \Delta_1)\Delta_2$, $\Gamma_3 = \Delta_1 (1 - \Delta_2)$, and $\Gamma_4 = (1 - \Delta_1)(1 - \Delta_2)$, and then

$$p_1(\underline{t}; F) = F(t_1, t_2), \quad p_2(\underline{t}; F) = F(\infty, t_2) - F(t_1, t_2), \quad p_3(\underline{t}; F) = F(t_1, \infty) - F(t_1, t_2),$$

$$p_4(\underline{t}; F) = 1 - F(t_1, \infty) - F(\infty, t_2) + F(t_1, t_2).$$

Thus

[…]

Note that

[…] (3.2)

where

[…]

Characterizations and computation of the MLE (3.1), mostly for the case $d = 2$, have been treated in Song (2001), Gentleman and Vandal (2002), and Maathuis (2005, 2006). Consistency of the MLE for more general interval censoring models has been established by Yu, Yu and Wong (2006). For an interesting application see Betensky and Finkelstein (1999). This example and other examples of multivariate interval censored data are treated in Sun (2006) and Deng and Fang (2009). For a comparison of the MLE with alternative estimators in the case $d = 2$, see Groeneboom (2012a).

An analogue of Groeneboom's Theorem 2.1 has not been established in the multivariate case. Song (2001) established an asymptotic minimax lower bound for pointwise convergence when $d = 2$: if $F_0$ and $G_0$ have positive continuous densities at $\underline{t}_0$, then no estimator has a local minimax rate for estimation of $F_0(\underline{t}_0)$ faster than $n^{-1/3}$. By making use of additional smoothness hypotheses, Groeneboom (2012a) has constructed estimators which achieve the pointwise $n^{-1/3}$ rate, but it is not yet known if the MLE achieves this.

Our main goal here is to prove the following theorem concerning the global rate of convergence of the MLE $\hat{F}_n$.

Theorem 3.1. Consider the multivariate current status model. Suppose that $F_0$ has $\operatorname{supp}(F_0) \subset [0, M]^d$ and that $F_0$ has density $f_0$ which satisfies

[…] (3.3)

where $0 < c_1 < \infty$. Suppose that $G_0$ has density $g_0$ which satisfies

[…] (3.4)

Then the MLE $\hat{p}_n \equiv p_{\hat{F}_n}$ of $p_0 \equiv p_{F_0}$ satisfies

$$h(\hat{p}_n, p_0) = O_p\!\left( n^{-1/3} (\log n)^{\gamma} \right)$$

for $\gamma \equiv \gamma_d \equiv (5d - 4)/6$.
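
To get a feel for how the logarithmic factor in Theorem 3.1 grows with the dimension, the short sketch below evaluates the exponent $\gamma_d = (5d - 4)/6$ and the quantity $n^{-1/3}(\log n)^{\gamma_d}$. It is only an illustration: the theorem gives an upper bound up to an unspecified constant, so the numbers are not error estimates, and the function names are hypothetical.

```python
import math


def gamma_exponent(d):
    """Exponent of log n in the Hellinger rate bound: gamma_d = (5d - 4) / 6."""
    return (5 * d - 4) / 6


def hellinger_rate_bound(n, d):
    """Evaluate n^(-1/3) * (log n)^gamma_d, ignoring the unspecified constant."""
    return n ** (-1 / 3) * math.log(n) ** gamma_exponent(d)


# The theorem covers d >= 2; print the exponent and the bound for a few cases.
for d in (2, 3, 4):
    print(d, gamma_exponent(d), hellinger_rate_bound(10 ** 6, d))
```

For fixed $d$ the polynomial factor $n^{-1/3}$ eventually dominates, but for moderate $n$ the logarithmic factor can be large when $d$ is large, which is consistent with the remark that the power of $\log n$ grows linearly with the dimension.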

Since the inequality (2.3) continues to hold in $\mathbb{R}^{+d}$ for $d \ge 2$ (with 1/4 replaced by 1/8 on the right side), we obtain the following corollary:

Corollary 3.2. Under the conditions of Theorem 3.1 it follows that

$$\int (\hat{F}_n - F_0)^2\, dG_0 = O_p\!\left( n^{-2/3} (\log n)^{\beta} \right)$$

for $\beta \equiv \beta_d = 2\gamma_d = (5d - 4)/3$.

## 4. Proofs

Here we give the proof of Theorem 3.1. The main tool is a method developed by van de Geer (2000). We will use the following lemma in combination with Theorem 7.6 of van de Geer (2000) or Theorem 3.4.1 of van der Vaart and Wellner (1996) (Section 3.4.2, pages 330–331). Without loss of generality we can take $M = 1$ where $M$ is the upper bound of the support of $F_0$ (see Theorem 3.1).

Let $\mathcal{P}$ be a collection of probability densities $p$ on a sample space $\mathcal{X}$ with respect to a dominating measure $\mu$. Define

[…] (4.1)

[…] (4.2)

[…] (4.3)

The following general result relating the bracketing entropies $\log N_{[\,]}(\cdot, \mathcal{G}^{(conv)}, L_2(P_0))$, $\log N_{[\,]}(\cdot, \mathcal{P}, L_2(Q_{\sigma(\varepsilon)}))$, and $\log N_{[\,]}(\cdot, \mathcal{P}, L_2(\tilde{Q}_{\sigma(\varepsilon)}))$ is due to van de Geer (2000).

Lemma 4.1 (van de Geer, 2000). For every $\varepsilon > 0$

[…] (4.4)

[…] (4.5)

[…] (4.6)

where $dQ_\sigma \equiv (1/p_0) 1\{p_0 > \sigma\}\, d\mu$ and $\tilde{Q}_\sigma \equiv Q_\sigma / Q_\sigma(\mathcal{X})$.

Proof. We first show that (4.4) holds. Suppose that $\{[g_{L,j}, g_{U,j}],\ j = 1, \ldots, m\}$ are $\varepsilon$-brackets with respect to $L_2(P_0)$ for $\mathcal{G}^{(conv)}_{\sigma(\varepsilon)}$ with

[…]

Then for $g \in \mathcal{G}^{(conv)}$, let $g_\sigma \equiv g 1_{[p_0 > \sigma]}$ be the corresponding element of $\mathcal{G}^{(conv)}_\sigma$. Suppose that $g_\sigma \in [g_{L,j}, g_{U,j}]$ for some $j \in \{1, \ldots, m\}$. Then

[…]

where, by the triangle inequality, $0 \le g \le 2$ for all $g \in \mathcal{G}^{(conv)}$, and the definition of $\sigma(\varepsilon)$, it follows that

[…]

Thus $\{[\tilde{g}_{L,j}, \tilde{g}_{U,j}] : j \in \{1, \ldots, m\}\}$ is a collection of $3\varepsilon$-brackets for $\mathcal{G}^{(conv)}$ with respect to $L_2(P_0)$ and hence (4.4) holds.

Now we show that (4.5) holds. Suppose that $\{[p_{L,j}, p_{U,j}] : j = 1, \ldots, m\}$ is a set of $\varepsilon/2$-brackets with respect to $L_2(Q_\sigma)$ for $\mathcal{P}$ with

[…]

Suppose $p \in [p_{L,j}, p_{U,j}]$ for some $j$. Then, since

[…]

where
[…]

Thus

[…]

and hence $\{[g_{L,j}, g_{U,j}] : j = 1, \ldots, m\}$ is a set of $\varepsilon$-brackets with respect to $L_2(P_0)$ for $\mathcal{G}^{(conv)}_\sigma$. This shows that (4.5) holds.

It remains only to show that (4.6) holds. But this is easy since […].

This lemma is based on van de Geer (2000), pages 101 and 103. Note that our constants differ slightly from those of van de Geer.

Lemma 4.2. Suppose that $F_0$ has density $f_0$ which satisfies, for some $0 < c_1 < \infty$,

[…] (4.7)

Then $p_0$ (which we can identify with the vector $\underline{p}(\cdot, F_0)$) satisfies

[…]

Proof. This follows immediately from the general $d$ version of (3.2) and the assumption on $f_0$.

These inequalities can also be written in the following compact form: For […] with $\delta \in \{0, 1\}$, […]

Lemma 4.3. Suppose that the assumption of Lemma 4.2 holds. Suppose, moreover, that $G_0$ has density $g_0$ which satisfies

[…] (4.8)

Then

[…]

Furthermore, with $\sigma(\delta) \equiv \delta^2 / (2^d (c_1 c_2)^2)$ we have

[…]

Proof. The first inequality follows easily from Lemma 4.2: note that […] The second inequality follows from the first inequality of the lemma.

Lemma 4.4. If the hypotheses of Lemmas 4.2 and 4.3 hold, then the measure $Q_\sigma$ defined by $dQ_\sigma \equiv (1/p_0) 1\{p_0 > \sigma\}\, d\mu$ has total mass $Q_\sigma(\mathcal{X})$ given by

[…] (4.9)

[…] (4.10)

Proof. This follows from Lemma 4.2, followed by an explicit calculation. In particular, the equality in (4.10) follows from

[…]

where the second equality follows by induction: it holds easily for $d = 1$ (and $d = 2$); and then an easy calculation shows that it holds for $d$ if it holds for $d - 1$.

Lemma 4.5. If the hypotheses of Lemmas 4.2 and 4.3 hold, and $d \ge 2$, then

$$\log N_{[\,]}(\varepsilon, \mathcal{G}^{(conv)}, L_2(P_0)) \le K \varepsilon^{-1} (\log(1/\varepsilon))^{5d/2 - 2}$$

for all $0 < \varepsilon <$ some $\varepsilon_0$ and some constant $K < \infty$.

Proof. This follows by combining the results of Lemmas 4.3 and 4.4 with Lemma 4.1, and then using Corollary 6.2 of the bracketing entropy bound of Gao (2012), stated here as Theorem 6.1. Here is the explicit calculation:

[…]

for $\varepsilon$ sufficiently small.

Proof. (Theorem 3.1) This follows from Lemma 4.5 and Theorem 7.6 of van de Geer (2000) or Theorem 3.4.1 of van der Vaart and Wellner (1996) together with the arguments given in Section 3.4.2. By Lemma 4.5 the bracketing entropy integrals

[…]

where the bound on the right side behaves asymptotically as a constant times $2\delta^{1/2} (\log(1/\delta))^{3\gamma/2}$ with $3\gamma \equiv 5d/2 - 2$, and hence (using the notation of Theorem 3.4.1 of van der Vaart and Wellner (1996)), we can take $\phi_n(\delta) = K 2 \delta^{1/2} (\log(1/\delta))^{3\gamma/2}$. Thus with $r_n \equiv n^{1/3} / (\log n)^{\beta}$ with $\beta = \gamma$ we find that

$$r_n^2\, \phi_n(1/r_n) \lesssim \sqrt{n},$$

and hence the claimed order of convergence holds.

## 5. Some related models and further problems

There are several related models in which we expect to see the same basic phenomenon as established here, namely a global convergence rate of the form $n^{-1/3} (\log n)^{\gamma}$ in all dimensions $d \ge 2$ with only the power $\gamma$ of the log term depending on $d$. Three such models are:

a. the "in-out model" for interval censoring in $\mathbb{R}^d$;

b. the "case 2" multivariate interval censoring models studied by Deng and Fang (2009); and

c. the scale mixture of uniforms model for decreasing densities in $\mathbb{R}^{+d}$.

Here we briefly sketch why we expect the same phenomenon to hold in these three cases, even though we do not yet know pointwise convergence rates in any of these cases.

### 5.1. The "in-out model" for interval censoring in $\mathbb{R}^d$

The "in-out model" for interval censoring in $\mathbb{R}^d$ was explored in the case $d = 2$ by Song (2001).
In this model $\underline{Y} \sim F$ on $\mathbb{R}^2$, $R$ is a random rectangle in $\mathbb{R}^2$ independent of $\underline{Y}$ (say $[\underline{U}, \underline{V}] = \{\underline{x} = (x_1, x_2) \in \mathbb{R}^2 : U_1 \le x_1 \le V_1,\ U_2 \le x_2 \le V_2\}$ where $\underline{U}$ and $\underline{V}$ are random vectors in $\mathbb{R}^2$ with $\underline{U} \le \underline{V}$ coordinatewise). We observe only $(1_R(\underline{Y}), R)$, and the goal is to estimate the unknown distribution function $F$.

Song (2001) (page 86) produced a local asymptotic minimax lower bound for estimation of $F$ at a fixed $\underline{t}_0 \in \mathbb{R}^2$. Under the assumption that $F$ has a positive density $f$ at $\underline{t}_0$, Song (2001) showed that any estimator of $F(\underline{t}_0)$ can have a local-minimax convergence rate which is at best $n^{-1/3}$. Groeneboom (2012a) has shown that this rate can be achieved by estimators involving smoothing methods. Based on the results for current status data in $\mathbb{R}^d$ obtained in Theorem 3.1 and the entropy results for the class of distribution functions on $\mathbb{R}^d$, we conjecture that the global Hellinger rate of convergence of the MLE $\hat{F}_n(\underline{t}_0)$ will be $n^{-1/3} (\log n)^{\nu}$ for all $d \ge 2$ where $\nu = \nu_d$.

### 5.2. "Case 2" multivariate interval censoring models in $\mathbb{R}^d$

Recall that "case 2" interval censored data on $\mathbb{R}^+$ is as follows: suppose that $Y \sim F$ on $\mathbb{R}^+$, the pair of observation times $(U, V)$ with $U \le V$ determines a random interval $(U, V]$, and we observe $\underline{X} = (\underline{\Delta}, U, V) = (\Delta_1, \Delta_2, \Delta_3, U, V)$ where $\Delta_1 = 1\{Y \le U\}$, $\Delta_2 = 1\{U < Y \le V\}$, and $\Delta_3 = 1\{V < Y\}$. Nonparametric estimation of $F_0$ based on $\underline{X}_1, \ldots, \underline{X}_n$ i.i.d. as $\underline{X}$ has been discussed by a number of authors, including Groeneboom and Wellner (1992), Geskus and Groeneboom (1999), and Groeneboom (1996). Deng and Fang (2009) studied generalizations of this model to $\mathbb{R}^d$, and obtained rates of convergence of the MLE with respect to the Hellinger metric given by $n^{-(1+d)/(2(1+2d))} (\log n)^{d^2/(2(2d+1))}$ in the case most comparable to the multivariate interval censoring model studied here.
While this rate reduces when $d = 1$ to the known rate $n^{-1/3} (\log n)^{1/6}$, it is slower than $n^{-1/3} (\log n)^{\nu}$ for some $\nu$ when $d > 1$ due to the use of entropy bounds involving convex hulls (see Deng and Fang (2009), Proposition A.1, page 66) which are not necessarily sharp. We expect that rates of the form $n^{-1/3} (\log n)^{\nu}$ with $\nu > 0$ are possible in these models as well.

### 5.3. Scale mixtures of uniform densities on $\mathbb{R}^{+d}$

Pavlides (2008) and Pavlides and Wellner (2012) studied the family of scale mixtures of uniform densities of the following form:

$$f_G(\underline{x}) = \int_{(0, \infty)^d} \frac{1}{|\underline{y}|}\, 1_{(\underline{0}, \underline{y}]}(\underline{x})\, dG(\underline{y}) \tag{5.1}$$

for some distribution function $G$ on $(0, \infty)^d$. (Note that we have used the notation $|\underline{y}| \equiv \prod_{j=1}^{d} y_j$ for $\underline{y} = (y_1, \ldots, y_d) \in \mathbb{R}^{+d}$.) It is not difficult to see that such densities are decreasing in each coordinate and that they also satisfy

$$(-1)^d (\Delta_d f_G)(\underline{u}, \underline{v}] \ge 0$$

for all $\underline{u}, \underline{v} \in \mathbb{R}^{+d}$ with $\underline{u} \le \underline{v}$; here $\Delta_d$ denotes the $d$-dimensional difference operator. This is the same key property of distribution functions which results in (bracketing) entropies which depend on dimension only through a logarithmic term. The difference here is that the density functions $f_G$ need not be bounded, and even if the true density $f_0$ is in this class and satisfies $f_0(\underline{0}) < \infty$, then we do not yet know the behavior of the MLE $\hat{F}_n$ at zero. In fact we conjecture that:

(a) If $f_0(\underline{0}) < \infty$ and $f_0$ is a scale mixture of uniform densities on rectangles as in (5.1), then $\hat{F}_n(\underline{0}) = O_p((\log n)^{\beta})$ for some $\beta = \beta_d > 0$.

(b) Under the same hypothesis as in (a) and the hypothesis that $f_0$ has support contained in a compact set, the MLE converges with respect to the Hellinger distance with a rate that is no worse than $n^{-1/3} (\log n)^{\xi}$ where $\xi = \xi_d$.

Again Pavlides (2008) and Pavlides and Wellner (2012) establish asymptotic minimax lower bounds for estimation of $f_0(\underline{x}_0)$ proving that no estimator can have a (local minimax) rate of convergence faster than $n^{-1/3}$ in all dimensions. This is in sharp contrast to the class
This is in sharp contrast to the class +d of block-decreasing densities on ℝ studied by Pavlides (2012) and by Biau and Devroye (2003): Pavlides (2012) shows that the local asymptotic minimax rate for estimation of Electron J Stat. Author manuscript; available in PMC 2013 August 27. NIH-PA Author Manuscript NIH-PA Author Manuscript NIH-PA Author Manuscript Gao and Wellner Page 12 −1/(d+2) f (x ) is no faster than n , while Biau and Devroye (2003) show that there exist 0 0 ̃ ̃ −1/(d+2) (histogram type) estimators f which satisfy E ‖f − f ‖ = O(n ). n f n 0 1 Acknowledgments We owe thanks to the referees for a number of helpful suggestions and for pointing out the work of Yu, Yu and Wong (2006) and Deng and Fang (2009). Appendix We begin by summarizing the results of Gao (2012). For a (probability) measure μ on [0, 1] , let F ≡ F denote the corresponding distribution function given by for all x̲ = (x , …, x ) ∈ [0, 1] . Let ℱ denote the collection of all distribution functions on 1 d d [0, 1] ; i.e. For example, if λ denotes Lebesgue measure on [0, 1] , then the corresponding distribution function is . Theorem 6.1 (Gao, 2012). For d ≥ 2 and 1 ≤ p < ∞ for all 0 < ε ≤ 1. Our goal here is to use this result to control bracketing numbers for ℱ with respect to two other measures C and R defined as follows. Let C denote the finite measure on [0, 1] d d,σ d with density with respect to λ given by For fixed σ > 0, let R denote the (probability) measure on (0, 1] with density with d,σ respect to λ given by Corollary 6.2. (a) For each d ≥ 2 it follows that for ε ≤ ε (d) Electron J Stat. Author manuscript; available in PMC 2013 August 27. NIH-PA Author Manuscript NIH-PA Author Manuscript NIH-PA Author Manuscript Gao and Wellner Page 13 (b) For each d ≥ 2 and σ ≤ σ (d) it follows that for ε ≤ ε (d)/2 0 0 Proof. We first prove (a). We set p ≡ p = 2r ≡ 2r where r ≡ r = 2d − 1 and s = (d − 1/2)/(d d d d −1 −1 − 1) satisfy r + s = 1. 
Let $\{[g_j, h_j],\ j = 1, \ldots, m\}$ be a collection of $\varepsilon$-brackets for $\mathcal{F}_d$ with respect to $L_p(\lambda_d)$. (Thus for $d = 2$, $r = 3$, $s = 3/2$, and $p = 6$, while for $d = 4$, $r = 7$, $s = (7/2)/3 = 7/6$, and $p = 14$.) By Theorem 6.1 we know that $\log m \lesssim \varepsilon^{-1} (\log(1/\varepsilon))^{2(d-1)}$. Now we bound the size of the brackets $[g_j, h_j]$ with respect to $C_d$. Using Hölder's inequality with $1/r + 1/s = 1$ as chosen above we find that

[…] (6.1)

Here are some details of the computation leading to (6.1):

[…]

To prove (b) we introduce monotone transformations $t_j(u_j)$ and their inverses $u_j(t_j)$ which relate $c_d$ and $r_{d,\sigma}$: we set

[…]

for $j = 1, \ldots, m$. These all depend on $\sigma > 0$, but this dependence is suppressed in the notation. For the same brackets $[g_j, h_j]$ used in the proof of (a), we define new brackets $[\tilde{g}_j, \tilde{h}_j]$ for $j = 1, \ldots, m$ by

[…]

Then it follows easily by direct calculation using […] that

[…]

Thus for $\sigma \le \sigma_0(d)$ we have

[…]

by the arguments in (a). Hence the brackets $[\tilde{g}_j, \tilde{h}_j]$ yield a collection of $2^{d/2+1}\varepsilon$-brackets for $\mathcal{F}_d$ with respect to $L_2(R_{d,\sigma})$, and this implies that (b) holds.

## References

Ayer M, Brunk HD, Ewing GM, Reid WT, Silverman E. An empirical distribution function for sampling with incomplete information. Ann. Math. Statist. 1955; 26:641–647. MR0073895 (17,504f).

Balabdaoui F, Wellner JA. Technical Report No. 512. Department of Statistics, University of Washington; 2012. Chernoff's density is log-concave. Available as arXiv:1207.6614.

Betensky RA, Finkelstein DM. A non-parametric maximum likelihood estimator for bivariate interval-censored data. Statistics in Medicine. 1999; 18:3089–3100. [PubMed: 10544308]

Biau G, Devroye L. On the risk of estimates for block decreasing densities. J. Multivariate Anal. 2003; 86:143–165. MR1994726 (2005c:62055).

Deng D, Fang H-B.
On nonparametric maximum likelihood estimations of multivariate distribution function based on interval-censored data. Comm. Statist. Theory Methods. 2009; 38:54–74. MR2489672 (2010j:62139).

Dunson DB, Dinse GE. Bayesian models for multivariate current status data with informative censoring. Biometrics. 2002; 58:79–88. MR1891046. [PubMed: 11890330]

Gao F. Technical Report. Department of Mathematics, University of Idaho; 2012. Bracketing entropy of high dimensional distributions. "High Dimensional Probability VI", to appear.

Gentleman R, Vandal AC. Nonparametric estimation of the bivariate CDF for arbitrarily censored data. Canad. J. Statist. 2002; 30:557–571. MR1964427 (2004b:62090).

Geskus R, Groeneboom P. Asymptotically optimal estimation of smooth functionals for interval censoring, case 2. Ann. Statist. 1999; 27:627–674. MR1714713 (2000j:60044).

Groeneboom P. Technical Report No. 87-18. Department of Mathematics, University of Amsterdam; 1987. Asymptotics for interval censored observations.

Groeneboom P. Brownian motion with a parabolic drift and Airy functions. Probab. Theory Related Fields. 1989; 81:79–109. MR981568 (90c:60052).

Groeneboom P. Lectures on probability theory and statistics (Saint-Flour, 1994). Lecture Notes in Math. Vol. 1648. Berlin: Springer; 1996. Lectures on inverse problems; p. 67–164. MR1600884 (99c:62092).

Groeneboom P. Technical Report No. ??. Delft Institute of Applied Mathematics, Delft University of Technology; 2012a. The bivariate current status model. Available as arXiv:1209.0542.

Groeneboom P. Technical Report No. ??. Delft Institute of Applied Mathematics, Delft University of Technology; 2012b. Local minimax lower bounds for the bivariate current status model. Personal communication.

Groeneboom P, Maathuis MH, Wellner JA. Current status data with competing risks: consistency and rates of convergence of the MLE. Ann. Statist. 2008a; 36:1031–1063.

Groeneboom P, Maathuis MH, Wellner JA.
Current status data with competing risks: limiting distribution of the MLE. Ann. Statist. 2008b; 36:1064–1089.

Groeneboom P, Wellner JA. Information bounds and non-parametric maximum likelihood estimation. DMV Seminar. Vol. 19. Basel: Birkhäuser Verlag; 1992. MR1180321 (94k:62056).

Groeneboom P, Wellner JA. Computing Chernoff's distribution. J. Comput. Graph. Statist. 2001; 10:388–400. MR1939706.

Jewell NP. Advances in statistical modeling and inference. Ser. Biostat. Vol. 3. Hackensack, NJ: World Sci. Publ.; 2007. Correspondences between regression models for complex binary outcomes and those for structured multivariate survival analyses; p. 45–64. MR2416109 (2009e:62407).

Lin X, Wang L. Bayesian proportional odds models for analyzing current status data: univariate, clustered, and multivariate. Comm. Statist. Simulation Comput. 2011; 40:1171–1181. MR2818097.

Maathuis MH. Reduction algorithm for the NPMLE for the distribution function of bivariate interval-censored data. J. Comput. Graph. Statist. 2005; 14:352–362. MR2160818.

Maathuis MH. ProQuest LLC, Ann Arbor, MI. Thesis (Ph.D.). University of Washington; 2006. Nonparametric estimation for current status data with competing risks. MR2708977.

Pavlides MG. ProQuest LLC, Ann Arbor, MI. Thesis (Ph.D.). University of Washington; 2008. Nonparametric estimation of multivariate monotone densities. MR2717518.

Pavlides MG. Local asymptotic minimax theory for block-decreasing densities. J. Statist. Plann. Inference. 2012; 142:2322–2329. MR2911847.

Pavlides MG, Wellner JA. Nonparametric estimation of multivariate scale mixtures of uniform densities. J. Multivariate Anal. 2012; 107:71–89. MR2890434.

Schick A, Yu Q. Consistency of the GMLE with mixed case interval-censored data. Scand. J. Statist. 2000; 27:45–55. MR1774042.

Song S. PhD thesis.
University of Washington, Department of Statistics; 2001. Estimation with bivariate interval-censored data.

Sun J. The Statistical Analysis of Interval-censored Failure Time Data. Statistics for Biology and Health. New York: Springer; 2006. MR2287318 (2007h:62007).

van de Geer S. Hellinger-consistency of certain nonparametric maximum likelihood estimators. Ann. Statist. 1993; 21:14–44. MR1212164 (94c:62062).

van de Geer SA. Applications of Empirical Process Theory. Cambridge Series in Statistical and Probabilistic Mathematics. Vol. 6. Cambridge: Cambridge University Press; 2000. MR1739079 (2001h:62002).

van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer Series in Statistics. New York: Springer-Verlag; 1996. MR1385671 (97g:60035).

Wang Y-F. ProQuest LLC, Ann Arbor, MI. Thesis (Ph.D.). Davis: University of California; 2009. Topics on multivariate two-stage current-status data and missing covariates in survival analysis. MR2736679.

Yu S, Yu Q, Wong GYC. Consistency of the generalized MLE of a joint distribution function with multivariate interval-censored data. J. Multivariate Anal. 2006; 97:720–732. MR2236498 (2007i:62068).

### Journal

Electronic Journal of Statistics

Published: Jan 1, 2013