
Article

Local Normal Approximations and Probability Metric Bounds for the Matrix-Variate T Distribution and Its Application to Hotelling's T Statistic

Frédéric Ouimet 1,2

1 Department of Mathematics and Statistics, McGill University, Montreal, QC H3A 0B9, Canada; frederic.ouimet2@mcgill.ca
2 Division of Physics, Mathematics and Astronomy, California Institute of Technology, Pasadena, CA 91125, USA

Abstract: In this paper, we develop local expansions for the ratio of the centered matrix-variate T density to the centered matrix-variate normal density with the same covariances. The approximations are used to derive upper bounds on several probability metrics (such as the total variation and Hellinger distance) between the corresponding induced measures. This work extends some previous results for the univariate Student distribution to the matrix-variate setting.

Keywords: asymptotic statistics; expansion; Hotelling's T-squared statistic; Hotelling's T statistic; matrix-variate normal distribution; local approximation; matrix-variate T distribution; normal approximation; Student distribution; T distribution; total variation

MSC: Primary: 62E20; Secondary: 60F99

Citation: Ouimet, F. Local Normal Approximations and Probability Metric Bounds for the Matrix-Variate T Distribution and Its Application to Hotelling's T Statistic. AppliedMath 2022, 2, 446–456. https://doi.org/10.3390/appliedmath2030025

Academic Editor: Tommi Sottinen. Received: 22 June 2022; Accepted: 21 July 2022; Published: 1 August 2022.

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

For any $n \in \mathbb{N}$, define the space of (real symmetric) positive definite matrices of size $n \times n$ as follows:
$$\mathcal{S}_{++}^n := \{ M \in \mathbb{R}^{n \times n} : M \text{ is symmetric and positive definite} \}.$$

For $d, m \in \mathbb{N}$, $\nu > 0$, $X \in \mathbb{R}^{d \times m}$, $\Sigma \in \mathcal{S}_{++}^d$ and $\Omega \in \mathcal{S}_{++}^m$, the density function of the centered (and normalized) matrix-variate T distribution, hereafter denoted by $T_{d,m}(\nu, \Sigma, \Omega)$, is defined, for all $X \in \mathbb{R}^{d \times m}$, by
$$K_{\nu,\Sigma,\Omega}(X) = \frac{\Gamma_d\big(\tfrac12(\nu+m+d-1)\big)\, \big| I_d + \nu^{-1} \Sigma^{-1} X \Omega^{-1} X^\top \big|^{-(\nu+m+d-1)/2}}{(\nu\pi)^{md/2}\, |\Sigma|^{m/2} |\Omega|^{d/2}\, \Gamma_d\big(\tfrac12(\nu+d-1)\big)}, \tag{1}$$
(see, e.g., Definition 4.2.1 in [1]) where $\nu$ is the number of degrees of freedom,
$$\Gamma_d(z) = \int_{S \in \mathcal{S}_{++}^d} |S|^{z-(d+1)/2} \exp(-\mathrm{tr}(S))\, \mathrm{d}S = \pi^{d(d-1)/4} \prod_{j=1}^d \Gamma\Big(z - \frac{j-1}{2}\Big), \quad \Re(z) > \frac{d-1}{2},$$
denotes the multivariate gamma function (see, e.g., Section 35.3 in [2] and [3]), and
$$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, \mathrm{d}t, \quad \Re(z) > 0,$$
is the classical gamma function. The mean and covariance matrix for the vectorization of $T \sim T_{d,m}(\nu, \Sigma, \Omega)$, namely
$$\mathrm{vec}(T) := (T_{11}, T_{21}, \dots, T_{d1}, T_{12}, T_{22}, \dots, T_{d2}, \dots, T_{1m}, T_{2m}, \dots, T_{dm})^\top,$$
($\mathrm{vec}(\cdot)$ is the operator that stacks the columns of a matrix on top of each other) are known to be (see, e.g., Theorem 4.3.1 in [1], but be careful of the normalization):
$$\mathbb{E}[\mathrm{vec}(T)] = 0_{dm} \quad (\text{i.e., } \mathbb{E}[T] = 0_{d \times m}),$$
and
$$\mathrm{Var}(\mathrm{vec}(T)) = \frac{\nu}{\nu-2}\, \Sigma \otimes \Omega, \quad \nu > 2.$$
The first goal of our paper (Theorem 1) is to establish an asymptotic expansion for the ratio of the centered matrix-variate T density (1) to the centered matrix-variate normal (MN) density with the same covariances. According to Gupta and Nagar ([1], Theorem 2.2.1), the density of the $\mathrm{MN}_{d,m}(0_{d\times m}, \Sigma \otimes \Omega)$ distribution is
$$g_{\Sigma,\Omega}(X) = \frac{\exp\big( -\tfrac12\, \mathrm{tr}\big( \Sigma^{-1} X \Omega^{-1} X^\top \big) \big)}{(2\pi)^{md/2}\, |\Sigma|^{m/2} |\Omega|^{d/2}}, \qquad X \in \mathbb{R}^{d\times m}. \tag{2}$$

The second goal of our paper (Theorem 2) is to apply the log-ratio expansion from Theorem 1 to derive upper bounds on multiple probability metrics between the measures induced by the centered matrix-variate T distribution and the corresponding centered matrix-variate normal distribution. In the special case $m = 1$, this gives us probability metric upper bounds between the measure induced by Hotelling's T statistic and the associated matrix-normal measure.

To give some practical motivations for the MN distribution (2), note that noise in the estimate of individual voxels of diffusion tensor magnetic resonance imaging (DT-MRI) data has been shown to be well modeled by a symmetric form of the MN distribution on $3\times 3$ matrices in [4–6]. The symmetric MN voxel distributions were combined into a tensor-variate normal distribution in [7,8], which could help to predict how the whole image (not just individual voxels) changes when shearing and dilation operations are applied in image warping and registration problems; see Alexander et al. [9]. In [10], maximum likelihood estimators and likelihood ratio tests are developed for the eigenvalues and eigenvectors of a form of the symmetric MN distribution with an orthogonally invariant covariance structure, both in one-sample problems (for example, in image interpolation) and two-sample problems (when comparing images) and under a broad variety of assumptions. This work extended significantly the previous results of Mallows [11]. In [10], it is also mentioned that the polarization pattern of cosmic microwave background (CMB) radiation measurements can be represented by $2\times 2$ positive definite matrices; see the primer by Hu and White [12]. In a very recent and interesting paper, Vafaei Sadr and Movahed [13] presented evidence for the Gaussianity of the local extrema of CMB maps. We can also mention [14], where finite mixtures of skewed MN distributions were applied to an image recognition problem.

In general, we know that the Gaussian distribution is an attractor for sums of i.i.d. random variables with finite variance, which makes many estimators in statistics asymptotically normal. Similarly, we expect the MN distribution (2) to be an attractor for sums of i.i.d. random matrices with finite variances (Hotelling's T-squared statistic is the most natural example), thus including many estimators, such as sample covariance matrices and score statistics for matrix parameters. In particular, if a given statistic or estimator is a function of the components of a sample covariance matrix for i.i.d. observations coming from a multivariate Gaussian population, then we could study its large sample properties (such as its moments) using Theorem 1 (for example, by turning a Student-moments estimation problem into a Gaussian-moments estimation problem).
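To make the two densities concrete, here is a minimal R sketch (our own illustration; it is not the paper's supplementary code, and the helper names `lmvgamma`, `ldT` and `ldMN` are ours) that evaluates the log-densities in (1) and (2). Working on the log scale avoids overflow in the gamma factors when $\nu$ is large.

```r
# Minimal R sketch (ours, not the paper's supplement): evaluate the
# log-densities in (1) and (2).
lmvgamma <- function(a, d) {
  # log of the multivariate gamma function Gamma_d(a)
  d * (d - 1) / 4 * log(pi) + sum(lgamma(a - (seq_len(d) - 1) / 2))
}
ldT <- function(X, nu, Sigma, Omega) {
  # log K_{nu,Sigma,Omega}(X), the matrix-variate T log-density in (1)
  d <- nrow(X); m <- ncol(X)
  A <- diag(d) + solve(Sigma, X %*% solve(Omega, t(X))) / nu
  lmvgamma((nu + m + d - 1) / 2, d) - lmvgamma((nu + d - 1) / 2, d) -
    (m * d / 2) * log(nu * pi) -
    (m / 2) * c(determinant(Sigma)$modulus) -
    (d / 2) * c(determinant(Omega)$modulus) -
    ((nu + m + d - 1) / 2) * c(determinant(A)$modulus)
}
ldMN <- function(X, Sigma, Omega) {
  # log g_{Sigma,Omega}(X), the matrix-variate normal log-density in (2)
  d <- nrow(X); m <- ncol(X)
  -0.5 * sum(diag(solve(Sigma, X %*% solve(Omega, t(X))))) -
    (m * d / 2) * log(2 * pi) -
    (m / 2) * c(determinant(Sigma)$modulus) -
    (d / 2) * c(determinant(Omega)$modulus)
}
```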
The following is a brief outline of the paper. Our main results are stated in Section 2 and proven in Section 3. Technical moment calculations are gathered in Appendix A.

Notation 1. Throughout the paper, $a = \mathcal{O}(b)$ means that $\limsup |a/b| < C$ as $\nu \to \infty$, where $C > 0$ is a universal constant. Whenever $C$ might depend on some parameter, we add a subscript (for example, $a = \mathcal{O}_d(b)$). Similarly, $a = o(b)$ means that $\lim |a/b| = 0$, and subscripts indicate which parameters the convergence rate can depend on. If $a = (1 + o(1))\, b$, then we write $a \sim b$. The notation $\mathrm{tr}(\cdot)$ will denote the trace operator for matrices and $|\cdot|$ their determinant. For a matrix $M \in \mathbb{R}^{d\times d}$ that is diagonalizable, $\lambda_1(M) \ge \dots \ge \lambda_d(M)$ will denote its eigenvalues, and we let $\lambda(M) := (\lambda_1(M), \dots, \lambda_d(M))^\top$.

2. Main Results

In Theorem 1 below, we prove an asymptotic expansion for the ratio of the centered matrix-variate T density to the centered matrix-variate normal (MN) density with the same covariances. The case $d = m = 1$ was proven recently in [15] (see also [16] for an earlier rougher version). The result extends significantly the convergence in distribution result from Theorem 4.3.4 in [1].

Theorem 1. Let $d, m \in \mathbb{N}$, $\Sigma \in \mathcal{S}_{++}^d$ and $\Omega \in \mathcal{S}_{++}^m$ be given. Pick any $\eta \in (0,1)$ and let
$$B_{\nu,\Sigma,\Omega}(\eta) := \Big\{ X \in \mathbb{R}^{d\times m} : \max_{1\le j\le d} \frac{\delta_{\lambda_j}}{\sqrt{\nu-2}} \le \eta\, \nu^{-1/4} \Big\}$$
denote the bulk of the centered matrix-variate T distribution, where
$$D_X := \Sigma^{-1/2} X\, \Omega^{-1/2} \quad \text{and} \quad \delta_{\lambda_j} := \lambda_j^{1/2}(D_X D_X^\top), \quad 1 \le j \le d.$$
Then, as $\nu \to \infty$ and uniformly for $X \in B_{\nu,\Sigma,\Omega}(\eta)$, we have
$$\log\Bigg( \frac{[\nu/(\nu-2)]^{md/2}\, K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)} \Bigg) = \nu^{-1}\Big\{ \tfrac14\, \mathrm{tr}\big[(D_X D_X^\top)^2\big] - \tfrac{(m+d+1)}{2}\, \mathrm{tr}\big[D_X D_X^\top\big] + \tfrac{md(m+d+1)}{4} \Big\}$$
$$\quad + \nu^{-2}\Big\{ -\tfrac16\, \mathrm{tr}\big[(D_X D_X^\top)^3\big] + \tfrac{(m+d-1)}{4}\, \mathrm{tr}\big[(D_X D_X^\top)^2\big] + \tfrac{md}{24}\big( 13 - 2d^2 + 3d(3-m) + 9m - 2m^2 \big) \Big\} \tag{3}$$
$$\quad + \nu^{-3}\Big\{ \tfrac18\, \mathrm{tr}\big[(D_X D_X^\top)^4\big] - \tfrac{(m+d-1)}{6}\, \mathrm{tr}\big[(D_X D_X^\top)^3\big] + \tfrac{md}{24}\big( 26 + d^3 + 2d^2(m-3) + 11m - 6m^2 + m^3 + d(11 - 9m + 2m^2) \big) \Big\}$$
$$\quad + \mathcal{O}_{d,m,\eta}\Bigg( \frac{1 + \mathrm{tr}\big[(D_X D_X^\top)^5\big]}{\nu^4} \Bigg).$$

Local approximations such as the one in Theorem 1 can be found for the Poisson, binomial and negative binomial distributions in [17] (based on Fourier analysis results from [18]), and in [19] for the binomial distribution. Another approach, using Stein's method, is used to study the variance-gamma distribution in [20]. Moreover, Kolmogorov and Wasserstein distance bounds are derived in [21,22] for the Laplace and variance-gamma distributions.

Below, we provide numerical evidence (displayed graphically) for the validity of the expansion in Theorem 1 when $d = m = 2$. We compare three levels of approximation for various choices of $\Sigma$. For any given $\Sigma \in \mathcal{S}_{++}^2$, define
$$E_0 := \sup_{X \in B_{\nu,\Sigma,\Omega}(\nu^{-1/4})} \Bigg| \log\Bigg( \frac{[\nu/(\nu-2)]^{md/2}\, K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)} \Bigg) \Bigg|, \tag{4}$$
$$E_1 := \sup_{X \in B_{\nu,\Sigma,\Omega}(\nu^{-1/4})} \Bigg| \log\Bigg( \frac{[\nu/(\nu-2)]^{md/2}\, K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)} \Bigg) - \nu^{-1}\Big\{ \tfrac14\, \mathrm{tr}\big[(D_X D_X^\top)^2\big] - \tfrac{(m+d+1)}{2}\, \mathrm{tr}\big[D_X D_X^\top\big] + \tfrac{md(m+d+1)}{4} \Big\} \Bigg|,$$
$$E_2 := \sup_{X \in B_{\nu,\Sigma,\Omega}(\nu^{-1/4})} \Bigg| \log(\cdots) - \nu^{-1}\{\cdots\} - \nu^{-2}\Big\{ -\tfrac16\, \mathrm{tr}\big[(D_X D_X^\top)^3\big] + \tfrac{(m+d-1)}{4}\, \mathrm{tr}\big[(D_X D_X^\top)^2\big] + \tfrac{md}{24}\big( 13 - 2d^2 + 3d(3-m) + 9m - 2m^2 \big) \Big\} \Bigg|,$$
where the $\log(\cdots)$ and $\nu^{-1}\{\cdots\}$ terms in $E_2$ are the same as in $E_1$. In the R software [23], we use Equation (7) to evaluate the log-ratios inside $E_0$, $E_1$ and $E_2$.

Note that $X \in B_{\nu,\Sigma,\Omega}(\nu^{-1/4})$ implies $\big|\mathrm{tr}\big[(D_X D_X^\top)^k\big]\big| \le d$ for all $k \in \mathbb{N}$, so we expect from Theorem 1 that the maximum errors above ($E_0$, $E_1$ and $E_2$) will have the asymptotic behavior
$$E_i = \mathcal{O}\big(\nu^{-(1+i)}\big), \quad \text{for all } i \in \{0, 1, 2\},$$
or, equivalently,
$$\liminf_{\nu\to\infty} \frac{\log E_i}{\log(\nu^{-1})} \ge 1 + i, \quad \text{for all } i \in \{0, 1, 2\}. \tag{5}$$

The property (5) is verified in Figure 1 below, for $\Omega = I_2$ and various choices of $\Sigma \in \mathcal{S}_{++}^2$. Similarly, the corresponding log-log plots of the errors as a function of $\nu$ are displayed in Figure 2. The simulations are limited to the range $5 \le \nu \le 1005$. The R code that generated Figures 1 and 2 can be found in the Supplementary Material.
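As a complement to the figures, the following R sketch (our own check, reusing `ldT` and `ldMN` from the sketch in Section 1) evaluates the log-ratio at one fixed $X$ and subtracts the first-order term of Theorem 1; the printed residuals decay at the rates $\nu^{-1}$ and $\nu^{-2}$, respectively.

```r
# Our own check of the first-order term in Theorem 1 at a fixed X, d = m = 2.
set.seed(1)
d <- 2; m <- 2
Sigma <- matrix(c(2, 1, 1, 2), 2, 2); Omega <- diag(2)
X <- matrix(rnorm(d * m), d, m)
A1 <- solve(Sigma, X %*% solve(Omega, t(X)))     # similar to D_X D_X^T
T1 <- sum(diag(A1)); T2 <- sum(diag(A1 %*% A1))  # tr(D D^T), tr((D D^T)^2)
for (nu in c(10, 100, 1000)) {
  lr <- (m * d / 2) * log(nu / (nu - 2)) + ldT(X, nu, Sigma, Omega) -
        ldMN(X / sqrt(nu / (nu - 2)), Sigma, Omega)
  t1 <- (T2 / 4 - (m + d + 1) / 2 * T1 + m * d * (m + d + 1) / 4) / nu
  cat(nu, lr, lr - t1, "\n")  # lr = O(1/nu); lr - t1 = O(1/nu^2)
}
```

Since $\Sigma^{-1} X \Omega^{-1} X^\top$ is similar to $D_X D_X^\top$, the traces of its powers coincide with those appearing in (3).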
As a consequence of the previous theorem, we can derive asymptotic upper bounds on several probability metrics between the probability measures induced by the centered matrix-variate T distribution (1) and the corresponding centered matrix-variate normal distribution (2). The distance between Hotelling's T statistic [24] and the corresponding matrix-variate normal distribution is obtained in the special case $m = 1$.

Theorem 2 (Probability metric upper bounds). Let $d, m \in \mathbb{N}$, $\Sigma \in \mathcal{S}_{++}^d$ and $\Omega \in \mathcal{S}_{++}^m$ be given. Assume that $X \sim T_{d,m}(\nu, \Sigma, \Omega)$, $Y \sim \mathrm{MN}_{d,m}(0_{d\times m}, \Sigma \otimes \Omega)$, and let $\mathbb{P}_{\nu,\Sigma,\Omega}$ and $\mathbb{Q}_{\Sigma,\Omega}$ be the laws of $X$ and $Y\sqrt{\nu/(\nu-2)}$, respectively. Then, as $\nu \to \infty$,
$$\mathrm{dist}(\mathbb{P}_{\nu,\Sigma,\Omega}, \mathbb{Q}_{\Sigma,\Omega}) \le \frac{C\,(md)^{3/2}}{\nu} \quad \text{and} \quad \mathrm{H}(\mathbb{P}_{\nu,\Sigma,\Omega}, \mathbb{Q}_{\Sigma,\Omega}) \le \frac{2C\,(md)^{3/2}}{\nu},$$
where $C > 0$ is a universal constant, $\mathrm{H}(\cdot,\cdot)$ denotes the Hellinger distance, and $\mathrm{dist}(\cdot,\cdot)$ can be replaced by any of the following probability metrics: total variation, Kolmogorov (or uniform) metric, Lévy metric, discrepancy metric, Prokhorov metric.

[Figure 1 appears here in the original: twelve panels plotting $\log E_i/\log(\nu^{-1})$ versus $\nu$ for $i \in \{0,1,2\}$, with $\Omega = I_2$ and $\Sigma = \left(\begin{smallmatrix} a & 1 \\ 1 & b \end{smallmatrix}\right)$ for $a \in \{2,3,4\}$ and $b \in \{2,3,4,5\}$.] Figure 1. Plots of $\log E_i / \log(\nu^{-1})$ as a function of $\nu$, for various choices of $\Sigma$. The plots confirm (5) for our choices of $\Sigma$ and bring strong evidence for the validity of Theorem 1.
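To see the $\mathcal{O}(\nu^{-1})$ total variation rate of Theorem 2 concretely, one can compute the distance numerically in the simplest case $d = m = 1$, $\Sigma = \Omega = 1$; the sketch below is our own illustration, not part of the paper.

```r
# Our own numerical illustration of Theorem 2 for d = m = 1: P is the Student
# t_nu law, Q the law of a standard normal rescaled to the same variance.
tv <- function(nu) {
  c0 <- sqrt(nu / (nu - 2))
  f <- function(x) abs(dt(x, df = nu) - dnorm(x / c0) / c0)
  0.5 * integrate(f, -Inf, Inf)$value
}
sapply(c(10, 20, 40, 80), function(nu) nu * tv(nu))
# nu * TV stays roughly constant, i.e., TV = O(1/nu)
```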
[Figure 2 appears here in the original: twelve log-log panels plotting $1/E_i$ versus $\nu$, over the same grid of $\Sigma$ and $\Omega$ as in Figure 1.] Figure 2. Plots of $1/E_i$ as a function of $\nu$, for various choices of $\Sigma$. Both the horizontal and vertical axes are on a logarithmic scale. The plots clearly illustrate how the addition of correction terms from Theorem 1 to the base approximation (4) improves it.

3. Proofs

Proof of Theorem 1. First, we take the expression in (1) over the one in (2):
$$\frac{[\nu/(\nu-2)]^{md/2}\, K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)} = \Big( \frac{2}{\nu-2} \Big)^{md/2} \prod_{j=1}^d \frac{\Gamma\big(\tfrac12(\nu+m+d-j)\big)}{\Gamma\big(\tfrac12(\nu+d-j)\big)}\, \exp\Big( \frac{\nu-2}{2\nu}\, \mathrm{tr}\big(D_X D_X^\top\big) \Big)\, \big| I_d + \nu^{-1} D_X D_X^\top \big|^{-(\nu+m+d-1)/2}. \tag{6}$$

The last determinant was obtained using the fact that the eigenvalues of a product of rectangular matrices are invariant under cyclic permutations (as long as the products remain well defined). Indeed, for all $j \in \{1, 2, \dots, d\}$, we have
$$\lambda_j\big( I_d + \nu^{-1} \Sigma^{-1} X \Omega^{-1} X^\top \big) = 1 + \nu^{-1} \lambda_j\big( \Sigma^{-1} X \Omega^{-1} X^\top \big) = 1 + \nu^{-1} \lambda_j\big( D_X D_X^\top \big) = \lambda_j\big( I_d + \nu^{-1} D_X D_X^\top \big).$$

By taking the logarithm on both sides of (6), we get
$$\log\Bigg( \frac{[\nu/(\nu-2)]^{md/2}\, K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)} \Bigg) = -\frac{md}{2}\log\Big(\frac{\nu-2}{2}\Big) + \sum_{j=1}^d \Big[ \log\Gamma\big(\tfrac12(\nu+m+d-j)\big) - \log\Gamma\big(\tfrac12(\nu+d-j)\big) \Big] + \frac12 \sum_{j=1}^d \tilde\delta_{\lambda_j}^2 - \frac{(\nu+m+d-1)}{2} \sum_{j=1}^d \log\Big( 1 + \frac{\tilde\delta_{\lambda_j}^2}{\nu-2} \Big), \tag{7}$$
where $\tilde\delta_{\lambda_j}^2 := \frac{\nu-2}{\nu}\, \delta_{\lambda_j}^2$, so that $1 + \tilde\delta_{\lambda_j}^2/(\nu-2) = 1 + \nu^{-1}\lambda_j(D_X D_X^\top)$.

We now apply the following Taylor expansions in (7). By the Stirling series (see, e.g., Ref. [25], p. 257),
$$\log\Gamma\big(\tfrac12(\nu+m+d-j)\big) - \log\Gamma\big(\tfrac12(\nu+d-j)\big) = \frac{m}{2}\log\Big(\frac{\nu}{2}\Big) + \frac{m(m+2d-2j-2)}{4\nu} - \frac{m\, Q_j}{12\nu^2} + \frac{m\, R_j}{24\nu^3} + \mathcal{O}_{m,d}(\nu^{-4}),$$
with
$$Q_j := 2 + 3d^2 + 3j^2 - 3j(m-2) - 3m + m^2 + d(3m - 6j - 6),$$
$$R_j := 4d^3 - 4j^3 - 6d^2(2 + 2j - m) + 6j^2(m-2) + m(m-2)^2 - 4j(2 - 3m + m^2) + 4d\big( 2 + 3j^2 - 3j(m-2) - 3m + m^2 \big).$$
Moreover,
$$-\frac{md}{2}\log\Big(\frac{\nu-2}{2}\Big) + \frac{md}{2}\log\Big(\frac{\nu}{2}\Big) = \frac{md}{2}\log\Big(\frac{\nu}{\nu-2}\Big) = \frac{md}{\nu} + \frac{md}{\nu^2} + \frac{4md}{3\nu^3} + \mathcal{O}_{m,d}(\nu^{-4}),$$
and
$$\log(1+y) = y - \frac{y^2}{2} + \frac{y^3}{3} - \frac{y^4}{4} + \mathcal{O}_h(y^5), \quad |y| \le h < 1.$$
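The gamma-ratio expansion above is easy to sanity-check numerically; the following R snippet (our own check, not from the paper) confirms that the error after the first two terms is $\mathcal{O}(\nu^{-2})$ for one choice of $(m, d, j)$.

```r
# Quick numeric check (ours) of the gamma-ratio expansion: the scaled error
# stabilizes as nu grows, i.e., the error after two terms is O(1/nu^2).
m <- 2; d <- 3; j <- 1
for (nu in c(50, 200, 800)) {
  exact  <- lgamma((nu + m + d - j) / 2) - lgamma((nu + d - j) / 2)
  approx <- (m / 2) * log(nu / 2) + m * (m + 2 * d - 2 * j - 2) / (4 * nu)
  cat(nu, nu^2 * (exact - approx), "\n")
}
```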
Applying these expansions in (7), we obtain
$$\log\Bigg( \frac{[\nu/(\nu-2)]^{md/2}\, K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)} \Bigg) = \frac{md}{\nu} + \frac{md}{\nu^2} + \frac{4md}{3\nu^3} + \sum_{j=1}^d \Big\{ \frac{m(m+2d-2j-2)}{4\nu} - \frac{m\, Q_j}{12\nu^2} + \frac{m\, R_j}{24\nu^3} \Big\}$$
$$\quad + \frac12 \sum_{j=1}^d \tilde\delta_{\lambda_j}^2 - \frac{(\nu+m+d-1)}{2(\nu-2)} \sum_{j=1}^d \tilde\delta_{\lambda_j}^2 + \frac{(\nu+m+d-1)}{4(\nu-2)^2} \sum_{j=1}^d \tilde\delta_{\lambda_j}^4 - \frac{(\nu+m+d-1)}{6(\nu-2)^3} \sum_{j=1}^d \tilde\delta_{\lambda_j}^6 + \frac{(\nu+m+d-1)}{8(\nu-2)^4} \sum_{j=1}^d \tilde\delta_{\lambda_j}^8 + \mathcal{O}_{d,m,\eta}\Bigg( \frac{1 + \max_{1\le j\le d} \tilde\delta_{\lambda_j}^{10}}{\nu^4} \Bigg). \tag{8}$$

Now,
$$\frac{\nu+m+d-1}{2(\nu-2)} = \frac12 + \frac{(m+d+1)}{2\nu} + \frac{(m+d+1)}{\nu^2} + \frac{2(m+d+1)}{\nu^3} + \mathcal{O}_{m,d}(\nu^{-4}),$$
$$\frac{\nu+m+d-1}{4(\nu-2)^2} = \frac{1}{4\nu} + \frac{(m+d+3)}{4\nu^2} + \frac{(m+d+2)}{\nu^3} + \mathcal{O}_{m,d}(\nu^{-4}),$$
$$\frac{\nu+m+d-1}{6(\nu-2)^3} = \frac{1}{6\nu^2} + \frac{(m+d+5)}{6\nu^3} + \mathcal{O}_{m,d}(\nu^{-4}),$$
$$\frac{\nu+m+d-1}{8(\nu-2)^4} = \frac{1}{8\nu^3} + \mathcal{O}_{m,d}(\nu^{-4}),$$
so we can rewrite (8) as
$$\log\Bigg( \frac{[\nu/(\nu-2)]^{md/2}\, K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)} \Bigg) = \nu^{-1} \sum_{j=1}^d \Big\{ \frac{\tilde\delta_{\lambda_j}^4}{4} - \frac{(m+d+1)}{2}\,\tilde\delta_{\lambda_j}^2 + \frac{m(2+2d-2j+m)}{4} \Big\}$$
$$\quad + \nu^{-2} \sum_{j=1}^d \Big\{ -\frac{\tilde\delta_{\lambda_j}^6}{6} + \frac{(m+d+3)}{4}\,\tilde\delta_{\lambda_j}^4 - (m+d+1)\,\tilde\delta_{\lambda_j}^2 - \frac{m(Q_j - 12)}{12} \Big\}$$
$$\quad + \nu^{-3} \sum_{j=1}^d \Big\{ \frac{\tilde\delta_{\lambda_j}^8}{8} - \frac{(m+d+5)}{6}\,\tilde\delta_{\lambda_j}^6 + (m+d+2)\,\tilde\delta_{\lambda_j}^4 - 2(m+d+1)\,\tilde\delta_{\lambda_j}^2 + \frac{m(R_j + 32)}{24} \Big\} + \mathcal{O}_{d,m,\eta}\Bigg( \frac{1 + \max_{1\le j\le d} \tilde\delta_{\lambda_j}^{10}}{\nu^4} \Bigg),$$
which proves (3) after some simplifications with Mathematica (expanding $\tilde\delta_{\lambda_j}^2 = (1 - 2\nu^{-1})\, \lambda_j(D_X D_X^\top)$ and collecting powers of $\nu^{-1}$ into the traces $\mathrm{tr}[(D_X D_X^\top)^k]$). □
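As a quick sanity check (ours, not part of the paper), specializing (3) to $d = m = 1$ and $\Sigma = \Omega = 1$, so that $\mathrm{tr}[(D_X D_X^\top)^k] = x^{2k}$, gives
$$\log\Bigg( \frac{\sqrt{\nu/(\nu-2)}\; K_{\nu,1,1}(x)}{\phi\big( x \sqrt{(\nu-2)/\nu} \big)} \Bigg) = \frac{1}{\nu}\Big( \frac{x^4}{4} - \frac{3x^2}{2} + \frac34 \Big) + \frac{1}{\nu^2}\Big( -\frac{x^6}{6} + \frac{x^4}{4} + 1 \Big) + \mathcal{O}_{\eta}\Big( \frac{1 + x^8}{\nu^3} \Big),$$
where $\phi$ denotes the standard normal density. This is consistent with the refined normal approximation for the univariate Student distribution in [15], which is how the coefficients of (3) can be cross-checked by hand.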
Proof of Theorem 2. By the comparison of the total variation norm $\|\cdot\|$ with the Hellinger distance on page 726 of Carter [26], we already know that
$$\|\mathbb{P}_{\nu,\Sigma,\Omega} - \mathbb{Q}_{\Sigma,\Omega}\| \le \sqrt{ 2\, \mathbb{P}\big( X \in B_{\nu,\Sigma,\Omega}^c(1/2) \big) + \mathbb{E}\Bigg[ \log\Bigg( \frac{\mathrm{d}\mathbb{P}_{\nu,\Sigma,\Omega}}{\mathrm{d}\mathbb{Q}_{\Sigma,\Omega}}(X) \Bigg) \mathbb{1}_{\{X \in B_{\nu,\Sigma,\Omega}(1/2)\}} \Bigg] }. \tag{9}$$

Given that $D_X = \Sigma^{-1/2} X \Omega^{-1/2} \sim T_{d,m}(\nu, I_d, I_m)$ by Theorem 4.3.5 in [1], we know, by Theorem 4.2.1 in [1], that
$$D_X \stackrel{\mathrm{law}}{=} (\nu^{-1} S)^{-1/2} Z,$$
for $S \sim \mathrm{Wishart}_{d\times d}(\nu + d - 1, I_d)$ and $Z \sim \mathrm{MN}_{d,m}(0_{d\times m}, I_d \otimes I_m)$ that are independent, so that, by Theorems 3.3.1 and 3.3.3 in [1], we have
$$D_X D_X^\top \,\big|\, S \sim \mathrm{Wishart}_{d\times d}\big(m, \nu S^{-1}\big). \tag{10}$$

Therefore, by conditioning on $S$, and then by applying the sub-multiplicativity of the largest eigenvalue for nonnegative definite matrices, and a large deviation bound on the maximum eigenvalue of a Wishart matrix (which is sub-exponential), we get, for $\nu$ large enough,
$$\mathbb{P}\big( X \in B_{\nu,\Sigma,\Omega}^c(1/2) \big) \le \mathbb{E}\Big[ \mathbb{P}\Big( \lambda_1^{1/2}(D_X D_X^\top) > \tfrac12\, \nu^{-1/4} \sqrt{\nu-2} \,\Big|\, S \Big) \Big] \le \mathbb{E}\Big[ \mathbb{P}\Big( \lambda_1^{1/2}\big( (\nu^{-1} S)^{-1} \big)\, \lambda_1^{1/2}(Z Z^\top) > \tfrac12\, \nu^{-1/4} \sqrt{\nu-2} \,\Big|\, S \Big) \Big]$$
$$= \mathbb{E}\Big[ \mathbb{P}\Big( \lambda_1(Z Z^\top) > \frac{\lambda_d(S)\, (\nu-2)}{4\, \nu^{3/2}} \,\Big|\, S \Big) \Big] \le C_{m,d} \exp\Big( -\frac{\nu^{1/2}}{10\, md} \Big), \tag{11}$$
for some positive constant $C_{m,d}$ that depends only on $m$ and $d$. By Theorem 1, we also have
$$\mathbb{E}\Bigg[ \log\Bigg( \frac{\mathrm{d}\mathbb{P}_{\nu,\Sigma,\Omega}}{\mathrm{d}\mathbb{Q}_{\Sigma,\Omega}}(X) \Bigg) \mathbb{1}_{\{X \in B_{\nu,\Sigma,\Omega}(1/2)\}} \Bigg] = \nu^{-1}\Big\{ \tfrac14\, \mathbb{E}\,\mathrm{tr}\big[(D_X D_X^\top)^2\big] - \tfrac{(m+d+1)}{2}\, \mathbb{E}\,\mathrm{tr}\big[D_X D_X^\top\big] + \tfrac{md(m+d+1)}{4} \Big\}$$
$$\quad + \nu^{-1}\Big\{ \mathcal{O}\big( \mathbb{E}\big[ \mathrm{tr}\big[(D_X D_X^\top)^2\big] \mathbb{1}_{\{X \in B_{\nu,\Sigma,\Omega}^c(1/2)\}} \big] \big) + (m+d)\, \mathcal{O}\big( \mathbb{E}\big[ \mathrm{tr}\big[D_X D_X^\top\big] \mathbb{1}_{\{X \in B_{\nu,\Sigma,\Omega}^c(1/2)\}} \big] \big) + \mathcal{O}\big( md(m+d) \big)\, \mathbb{P}\big(X \in B_{\nu,\Sigma,\Omega}^c(1/2)\big) \Big\} \tag{12}$$
$$\quad + \nu^{-2}\Big\{ \mathcal{O}\big( \mathbb{E}\,\mathrm{tr}\big[(D_X D_X^\top)^3\big] \big) + (m+d)\, \mathcal{O}\big( \mathbb{E}\,\mathrm{tr}\big[(D_X D_X^\top)^2\big] \big) + (m+d)^2\, \mathcal{O}\big( \mathbb{E}\,\mathrm{tr}\big[D_X D_X^\top\big] \big) + \mathcal{O}\big( md(m+d)^2 \big) \Big\}.$$
On the right-hand side, the first line is estimated using Lemma A1, and the second line is bounded using Lemma A2. We find
$$\mathbb{E}\Bigg[ \log\Bigg( \frac{\mathrm{d}\mathbb{P}_{\nu,\Sigma,\Omega}}{\mathrm{d}\mathbb{Q}_{\Sigma,\Omega}}(X) \Bigg) \mathbb{1}_{\{X \in B_{\nu,\Sigma,\Omega}(1/2)\}} \Bigg] = \mathcal{O}\big( m^3 d^3 \nu^{-2} \big).$$
Putting (11) and (12) together in (9) gives the conclusion. □

Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/appliedmath2030025/s1.

Funding: F.O. is supported by postdoctoral fellowships from the NSERC (PDF) and the FRQNT (B3X supplement and B3XR).

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: The R code for the simulations in Section 2 is in the Supplementary Material.

Acknowledgments: We thank the three referees for their comments.

Conflicts of Interest: The author declares no conflicts of interest.

Appendix A. Technical Computations

Below, we compute the expectations for some traces of powers of the matrix-variate Student distribution. The lemma is used to estimate some trace moments and the $\nu^{-1}$-order terms in (12) of the proof of Theorem 2, and also as a preliminary result for the proof of Lemma A2.

Lemma A1. Let $d, m \in \mathbb{N}$, $\Sigma \in \mathcal{S}_{++}^d$ and $\Omega \in \mathcal{S}_{++}^m$ be given. If $X \sim T_{d,m}(\nu, \Sigma, \Omega)$ according to (1), then
$$\mathbb{E}\big[ \mathrm{tr}\big( D_X D_X^\top \big) \big] = \frac{md\, \nu}{\nu-2}, \tag{A1}$$
$$\mathbb{E}\big[ \mathrm{tr}\big\{ (D_X D_X^\top)^2 \big\} \big] = \frac{md\, \nu^2\, \{ (m+d)(\nu-2) + \nu + md \}}{(\nu-1)(\nu-2)(\nu-4)}, \tag{A2}$$
where we recall $D_X := \Sigma^{-1/2} X \Omega^{-1/2}$. In particular, as $\nu \to \infty$, we have
$$\mathbb{E}\big[ \mathrm{tr}\big( D_X D_X^\top \big) \big] \sim md \quad \text{and} \quad \mathbb{E}\big[ \mathrm{tr}\big\{ (D_X D_X^\top)^2 \big\} \big] \sim md(m+d+1).$$

Proof of Lemma A1. For $W \sim \mathrm{Wishart}_{d\times d}(n, V)$ with $n > 0$ and $V \in \mathcal{S}_{++}^d$, we know from (Ref. [1], p. 99) (alternatively, see (Ref. [27], p. 66) or (Ref. [28], p. 308)) that
$$\mathbb{E}[W] = nV \quad \text{and} \quad \mathbb{E}[W^2] = n\{ (n+1) V + \mathrm{tr}(V)\, I_d \} V,$$
and from (Ref. [1], pp. 99–100) (alternatively, see [29] and ([28], p. 308), or ([30], pp. 101–103)) that
$$\mathbb{E}[W^{-1}] = \frac{V^{-1}}{n - d - 1}, \quad \text{for } n - d - 1 > 0,$$
$$\mathbb{E}[W^{-2}] = \frac{\mathrm{tr}(V^{-1})\, V^{-1} + (n-d-1)\, V^{-2}}{(n-d)(n-d-1)(n-d-3)}, \quad \text{for } n - d - 3 > 0,$$
and from (Corollary 3.1 in [30]) that
$$\mathbb{E}\big[ \mathrm{tr}(W^{-1})\, W^{-1} \big] = \frac{(n-d-2)\, \mathrm{tr}(V^{-1})\, V^{-1} + 2 V^{-2}}{(n-d)(n-d-1)(n-d-3)}, \quad \text{for } n - d - 3 > 0.$$
Therefore, by combining the above moment estimates with (10), we have
$$\mathbb{E}\big[ D_X D_X^\top \big] = \mathbb{E}\big[ \mathbb{E}[ D_X D_X^\top \,|\, S ] \big] = \mathbb{E}\big[ m\, (\nu S^{-1}) \big] = m \nu\, \mathbb{E}[S^{-1}] = \frac{m\nu}{\nu-2}\, I_d,$$
$$\mathbb{E}\big[ (D_X D_X^\top)^2 \big] = \mathbb{E}\big[ \mathbb{E}[ (D_X D_X^\top)^2 \,|\, S ] \big] = \mathbb{E}\big[ m\big\{ (m+1)(\nu S^{-1})^2 + \mathrm{tr}(\nu S^{-1})\, \nu S^{-1} \big\} \big] = m \nu^2 \big\{ (m+1)\, \mathbb{E}[S^{-2}] + \mathbb{E}[\mathrm{tr}(S^{-1})\, S^{-1}] \big\} = \frac{m \nu^2 \{ (m+1)(\nu+d-2) + (\nu-3) d + 2 \}}{(\nu-1)(\nu-2)(\nu-4)}\, I_d.$$
By linearity, the trace of an expectation is the expectation of the trace, so (A1) and (A2) follow from the above equations. □

We can also estimate the moments of Lemma A1 on various events. The lemma below is used to estimate the $\nu^{-2}$-order error terms in (12) of the proof of Theorem 2.

Lemma A2. Let $d, m \in \mathbb{N}$, $\Sigma \in \mathcal{S}_{++}^d$ and $\Omega \in \mathcal{S}_{++}^m$ be given, and let $A \in \mathcal{B}(\mathbb{R}^{d\times m})$ be a Borel set. If $X \sim T_{d,m}(\nu, \Sigma, \Omega)$ according to (1), then, for $\nu$ large enough,
$$\Big| \mathbb{E}\big[ \mathrm{tr}\big( D_X D_X^\top \big) \mathbb{1}_{\{X \in A\}} \big] - \frac{md\,\nu}{\nu-2} \Big| \le 2\, m\, d^{3/2}\, \big( \mathbb{P}(X \in A^c) \big)^{1/2}, \tag{A3}$$
$$\Big| \mathbb{E}\big[ \mathrm{tr}\big\{ (D_X D_X^\top)^2 \big\} \mathbb{1}_{\{X \in A\}} \big] - \frac{md\, \nu^2 \{ (m+d)(\nu-2) + \nu + md \}}{(\nu-1)(\nu-2)(\nu-4)} \Big| \le 100\, m^2 d^{5/2}\, \big( \mathbb{P}(X \in A^c) \big)^{1/2}, \tag{A4}$$
where we recall $D_X := \Sigma^{-1/2} X \Omega^{-1/2}$.

Proof of Lemma A2. By Lemma A1, the Cauchy–Schwarz inequality and Jensen's inequality,
$$\big( \mathrm{tr}( D_X D_X^\top ) \big)^2 \le d\, \mathrm{tr}\big\{ (D_X D_X^\top)^2 \big\},$$
we have
$$\Big| \mathbb{E}\big[ \mathrm{tr}( D_X D_X^\top ) \mathbb{1}_{\{X \in A\}} \big] - \frac{md\nu}{\nu-2} \Big| = \Big| \mathbb{E}\big[ \mathrm{tr}( D_X D_X^\top ) \mathbb{1}_{\{X \in A^c\}} \big] \Big| \le \Big( \mathbb{E}\big[ \big( \mathrm{tr}( D_X D_X^\top ) \big)^2 \big] \Big)^{1/2} \big( \mathbb{P}(X \in A^c) \big)^{1/2} \le \Big( d\, \mathbb{E}\big[ \mathrm{tr}\big\{ (D_X D_X^\top)^2 \big\} \big] \Big)^{1/2} \big( \mathbb{P}(X \in A^c) \big)^{1/2} \le 2\, m\, d^{3/2} \big( \mathbb{P}(X \in A^c) \big)^{1/2},$$
which proves (A3). Similarly, by Lemma A1, Hölder's inequality and Jensen's inequality,
$$\big( \mathrm{tr}\big\{ (D_X D_X^\top)^2 \big\} \big)^2 \le d\, \mathrm{tr}\big\{ (D_X D_X^\top)^4 \big\},$$
we have, for $\nu$ large enough,
$$\Big| \mathbb{E}\big[ \mathrm{tr}\big\{ (D_X D_X^\top)^2 \big\} \mathbb{1}_{\{X \in A\}} \big] - \frac{md\, \nu^2 \{ (m+d)(\nu-2) + \nu + md \}}{(\nu-1)(\nu-2)(\nu-4)} \Big| = \Big| \mathbb{E}\big[ \mathrm{tr}\big\{ (D_X D_X^\top)^2 \big\} \mathbb{1}_{\{X \in A^c\}} \big] \Big| \le \Big( d\, \mathbb{E}\big[ \mathrm{tr}\big\{ (D_X D_X^\top)^4 \big\} \big] \Big)^{1/2} \big( \mathbb{P}(X \in A^c) \big)^{1/2} \le \big( d \cdot 10^4 (md)^4 \big)^{1/2} \big( \mathbb{P}(X \in A^c) \big)^{1/2} \le 100\, m^2 d^{5/2} \big( \mathbb{P}(X \in A^c) \big)^{1/2},$$
which proves (A4). This ends the proof. □
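Lemma A1 is easy to corroborate by simulation through the representation (10); the following R sketch (our own check, not the paper's code) compares a Monte Carlo estimate of $\mathbb{E}[\mathrm{tr}(D_X D_X^\top)]$ with the exact value in (A1) for one choice of $(d, m, \nu)$.

```r
# Monte Carlo corroboration of (A1): sample D = (S/nu)^{-1/2} Z with
# S ~ Wishart_d(nu + d - 1, I_d) and Z a d x m standard normal matrix,
# then compare the average of tr(D D^T) to m*d*nu/(nu - 2).
set.seed(2)
d <- 3; m <- 2; nu <- 12; B <- 1e4
vals <- replicate(B, {
  S <- rWishart(1, df = nu + d - 1, Sigma = diag(d))[, , 1]
  e <- eigen(S / nu, symmetric = TRUE)
  Sinvsqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  D <- Sinvsqrt %*% matrix(rnorm(d * m), d, m)
  sum(D^2)  # tr(D D^T)
})
c(estimate = mean(vals), exact = m * d * nu / (nu - 2))
```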
References

1. Gupta, A.K.; Nagar, D.K. Matrix Variate Distributions, 1st ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 1999; p. 384.
2. Olver, F.W.J.; Lozier, D.W.; Boisvert, R.F.; Clark, C.W. (Eds.) NIST Handbook of Mathematical Functions; U.S. Department of Commerce, National Institute of Standards and Technology: Washington, DC, USA; Cambridge University Press: Cambridge, UK, 2010; p. xvi+951.
3. Nagar, D.K.; Roldán-Correa, A.; Gupta, A.K. Extended matrix variate gamma and beta functions. J. Multivar. Anal. 2013, 122, 53–69.
4. Pajevic, S.; Basser, P.J. Parametric description of noise in diffusion tensor MRI. In Proceedings of the 7th Annual Meeting of the ISMRM, Philadelphia, PA, USA, 22–28 May 1999; p. 1787.
5. Basser, P.J.; Jones, D.K. Diffusion-tensor MRI: Theory, experimental design and data analysis – a technical review. NMR Biomed. 2002, 15, 456–467.
6. Pajevic, S.; Basser, P.J. Parametric and non-parametric statistical analysis of DT-MRI data. J. Magn. Reson. 2003, 161, 1–14.
7. Basser, P.J.; Pajevic, S. A normal distribution for tensor-valued random variables: Applications to diffusion tensor MRI. IEEE Trans. Med. Imaging 2003, 22, 785–794.
8. Gasbarra, D.; Pajevic, S.; Basser, P.J. Eigenvalues of random matrices with isotropic Gaussian noise and the design of diffusion tensor imaging experiments. SIAM J. Imaging Sci. 2017, 10, 1511–1548.
9. Alexander, D.C.; Pierpaoli, C.; Basser, P.J.; Gee, J.C. Spatial transformations of diffusion tensor magnetic resonance images. IEEE Trans. Med. Imaging 2001, 20, 1131–1139.
10. Schwartzman, A.; Mascarenhas, W.F.; Taylor, J.E. Inference for eigenvalues and eigenvectors of Gaussian symmetric matrices. Ann. Statist. 2008, 36, 2886–2919.
11. Mallows, C.L. Latent vectors of random symmetric matrices. Biometrika 1961, 48, 133–149.
12. Hu, W.; White, M. A CMB polarization primer. New Astron. 1997, 2, 323–344.
13. Vafaei Sadr, A.; Movahed, S.M.S. Clustering of local extrema in Planck CMB maps. MNRAS 2021, 503, 815–829.
14. Gallaugher, M.P.B.; McNicholas, P.D. Finite mixtures of skewed matrix variate distributions. Pattern Recognit. 2018, 80, 83–93.
15. Ouimet, F. Refined normal approximations for the Student distribution. J. Classical Anal. 2022, 20, 23–33.
16. Shafiei, A.; Saberali, S.M. A simple asymptotic bound on the error of the ordinary normal approximation to the Student's t-distribution. IEEE Commun. Lett. 2015, 19, 1295–1298.
17. Govindarajulu, Z. Normal approximations to the classical discrete distributions. Sankhyā Ser. A 1965, 27, 143–172.
18. Esseen, C.G. Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law. Acta Math. 1945, 77, 1–125.
19. Cressie, N. A finely tuned continuity correction. Ann. Inst. Statist. Math. 1978, 30, 435–442.
20. Gaunt, R.E. Variance-gamma approximation via Stein's method. Electron. J. Probab. 2014, 19, 1–33.
21. Gaunt, R.E. New error bounds for Laplace approximation via Stein's method. ESAIM Probab. Stat. 2021, 25, 325–345.
22. Gaunt, R.E. Wasserstein and Kolmogorov error bounds for variance-gamma approximation via Stein's method I. J. Theoret. Probab. 2020, 33, 465–505.
23. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria.
24. Hotelling, H. The generalization of Student's ratio. Ann. Math. Statist. 1931, 2, 360–378.
25. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; National Bureau of Standards Applied Mathematics Series; U.S. Government Printing Office: Washington, DC, USA, 1964; Volume 55, p. xiv+1046.
26. Carter, A.V. Deficiency distance between multinomial and multivariate normal experiments. Ann. Statist. 2002, 30, 708–730.
27. de Waal, D.J.; Nel, D.G. On some expectations with respect to Wishart matrices. South African Statist. J. 1973, 7, 61–67.
28. Letac, G.; Massam, H. All invariant moments of the Wishart distribution. Scand. J. Statist. 2004, 31, 295–318.
29. Haff, L.R. An identity for the Wishart distribution with applications. J. Multivar. Anal. 1979, 9, 531–544.
30. von Rosen, D. Moments for the inverted Wishart distribution. Scand. J. Statist. 1988, 15, 97–109.


