Article

Local Normal Approximations and Probability Metric Bounds for the Matrix-Variate T Distribution and Its Application to Hotelling's T Statistic

Frédéric Ouimet (Department of Mathematics and Statistics, McGill University, Montreal, QC H3A 0B9, Canada; frederic.ouimet2@mcgill.ca; and Division of Physics, Mathematics and Astronomy, California Institute of Technology, Pasadena, CA 91125, USA)

Abstract: In this paper, we develop local expansions for the ratio of the centered matrix-variate T density to the centered matrix-variate normal density with the same covariances. The approximations are used to derive upper bounds on several probability metrics (such as the total variation and Hellinger distance) between the corresponding induced measures. This work extends some previous results for the univariate Student distribution to the matrix-variate setting.

Keywords: asymptotic statistics; expansion; Hotelling's T-squared statistic; Hotelling's T statistic; matrix-variate normal distribution; local approximation; matrix-variate T distribution; normal approximation; Student distribution; T distribution; total variation

MSC: Primary: 62E20; Secondary: 60F99

Citation: Ouimet, F. Local Normal Approximations and Probability Metric Bounds for the Matrix-Variate T Distribution and Its Application to Hotelling's T Statistic. AppliedMath 2022, 2, 446–456. https://doi.org/10.3390/appliedmath2030025. Academic Editor: Tommi Sottinen. Received: 22 June 2022; Accepted: 21 July 2022; Published: 1 August 2022. © 2022 by the author. Licensee MDPI, Basel, Switzerland. Open access under the CC BY 4.0 license.

1. Introduction

For any $n \in \mathbb{N}$, define the space of (real symmetric) positive definite matrices of size $n \times n$ as follows:

$$\mathcal{S}_{++}^{n\times n} := \big\{M \in \mathbb{R}^{n\times n} : M \text{ is symmetric and positive definite}\big\}.$$
For $d, m \in \mathbb{N}$, $\nu > 0$, $\Sigma \in \mathcal{S}_{++}^{d\times d}$ and $\Omega \in \mathcal{S}_{++}^{m\times m}$, the density function of the centered (and normalized) matrix-variate T distribution, hereafter denoted by $T_{d,m}(\nu, \Sigma, \Omega)$, is defined, for all $X \in \mathbb{R}^{d\times m}$, by

$$K_{\nu,\Sigma,\Omega}(X) = \frac{\Gamma_d\big(\tfrac{1}{2}(\nu+m+d-1)\big)\;\big|I_d + \nu^{-1}\Sigma^{-1}X\Omega^{-1}X^\top\big|^{-(\nu+m+d-1)/2}}{(\nu\pi)^{md/2}\,|\Sigma|^{m/2}\,|\Omega|^{d/2}\,\Gamma_d\big(\tfrac{1}{2}(\nu+d-1)\big)}, \qquad (1)$$

(see, e.g., (Definition 4.2.1 in [1])), where $\nu$ is the number of degrees of freedom, and

$$\Gamma_d(z) = \int_{S\in\mathcal{S}_{++}^{d\times d}} |S|^{z-(d+1)/2}\exp\big(-\mathrm{tr}(S)\big)\,\mathrm{d}S = \pi^{d(d-1)/4}\prod_{j=1}^{d}\Gamma\Big(z-\frac{j-1}{2}\Big), \qquad \Re(z) > \frac{d-1}{2},$$

denotes the multivariate gamma function (see, e.g., (Section 35.3 in [2]) and [3]), and

$$\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}\,\mathrm{d}t, \qquad \Re(z) > 0,$$

is the classical gamma function. The mean and covariance matrix for the vectorization of $T \sim T_{d,m}(\nu,\Sigma,\Omega)$, namely

$$\mathrm{vec}(T) := (T_{11}, T_{21}, \ldots, T_{d1}, T_{12}, T_{22}, \ldots, T_{d2}, \ldots, T_{1m}, T_{2m}, \ldots, T_{dm})^\top$$

($\mathrm{vec}(\cdot)$ is the operator that stacks the columns of a matrix on top of each other), are known to be (see, e.g., Theorem 4.3.1 in [1], but be careful of the normalization):

$$\mathbb{E}[\mathrm{vec}(T)] = 0_{dm} \quad \big(\text{i.e., } \mathbb{E}[T] = 0_{d\times m}\big) \qquad \text{and} \qquad \mathrm{Var}\big(\mathrm{vec}(T)\big) = \frac{\nu}{\nu-2}\,\Sigma\otimes\Omega, \quad \nu > 2.$$

The first goal of our paper (Theorem 1) is to establish an asymptotic expansion for the ratio of the centered matrix-variate T density (1) to the centered matrix-variate normal (MN) density with the same covariances.
According to (Gupta and Nagar [1], Theorem 2.2.1), the density of the $\mathrm{MN}_{d,m}(0_{d\times m}, \Sigma\otimes\Omega)$ distribution is

$$g_{\Sigma,\Omega}(X) = \frac{\exp\big(-\tfrac{1}{2}\mathrm{tr}\big(\Sigma^{-1}X\Omega^{-1}X^\top\big)\big)}{(2\pi)^{md/2}\,|\Sigma|^{m/2}\,|\Omega|^{d/2}}, \qquad X\in\mathbb{R}^{d\times m}. \qquad (2)$$

The second goal of our paper (Theorem 2) is to apply the log-ratio expansion from Theorem 1 to derive upper bounds on multiple probability metrics between the measures induced by the centered matrix-variate T distribution and the corresponding centered matrix-variate normal distribution. In the special case $m=1$, this gives us probability metric upper bounds between the measure induced by Hotelling's T statistic and the associated matrix-normal measure.

To give some practical motivations for the MN distribution (2), note that noise in the estimate of individual voxels of diffusion tensor magnetic resonance imaging (DT-MRI) data has been shown to be well modeled by a symmetric form of the $3\times 3$ MN distribution in [4–6]. The symmetric MN voxel distributions were combined into a tensor-variate normal distribution in [7,8], which could help to predict how the whole image (not just individual voxels) changes when shearing and dilation operations are applied in image warping and registration problems; see Alexander et al. [9]. In [10], maximum likelihood estimators and likelihood ratio tests are developed for the eigenvalues and eigenvectors of a form of the symmetric MN distribution with an orthogonally invariant covariance structure, both in one-sample problems (for example, in image interpolation) and two-sample problems (when comparing images), and under a broad variety of assumptions. This work significantly extended the previous results of Mallows [11]. In [10], it is also mentioned that the polarization pattern of cosmic microwave background (CMB) radiation measurements can be represented by $2\times 2$ positive definite matrices; see the primer by Hu and White [12].
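For intuition about the normalization in (2), note that when $\Sigma$ and $\Omega$ are diagonal the matrix-normal density factorizes into a product of univariate normal densities, entry $(i,j)$ having variance $\sigma_i\,\omega_j$. A minimal stdlib-only Python sketch of this identity (function names are ours, not from the paper):

```python
import math

def log_mn_density_diag(X, sig, om):
    # Density (2) with Sigma = diag(sig) (d x d) and Omega = diag(om) (m x m):
    # tr(Sigma^{-1} X Omega^{-1} X^T) = sum_{i,j} X[i][j]^2 / (sig[i] * om[j]),
    # |Sigma|^{m/2} |Omega|^{d/2} handled on the log scale below.
    d, m = len(sig), len(om)
    quad = sum(X[i][j] ** 2 / (sig[i] * om[j]) for i in range(d) for j in range(m))
    logdet = m * sum(math.log(s) for s in sig) + d * sum(math.log(w) for w in om)
    return -0.5 * quad - 0.5 * (m * d * math.log(2 * math.pi) + logdet)

def log_normal(x, var):
    return -0.5 * (x * x / var + math.log(2 * math.pi * var))

# The matrix-normal density factorizes: entry (i, j) is N(0, sig_i * om_j).
X = [[0.3, -1.2], [2.0, 0.7]]
sig, om = [1.5, 0.8], [2.0, 0.5]
lhs = log_mn_density_diag(X, sig, om)
rhs = sum(log_normal(X[i][j], sig[i] * om[j]) for i in range(2) for j in range(2))
assert abs(lhs - rhs) < 1e-12
```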
In a very recent and interesting paper, Vafaei Sadr and Movahed [13] presented evidence for the Gaussianity of the local extrema of CMB maps. We can also mention [14], where finite mixtures of skewed MN distributions were applied to an image recognition problem.

In general, we know that the Gaussian distribution is an attractor for sums of i.i.d. random variables with finite variance, which makes many estimators in statistics asymptotically normal. Similarly, we expect the MN distribution (2) to be an attractor for sums of i.i.d. random matrices with finite variances (Hotelling's T-squared statistic is the most natural example), thus including many estimators, such as sample covariance matrices and score statistics for matrix parameters. In particular, if a given statistic or estimator is a function of the components of a sample covariance matrix for i.i.d. observations coming from a multivariate Gaussian population, then we could study its large sample properties (such as its moments) using Theorem 1 (for example, by turning a Student-moments estimation problem into a Gaussian-moments estimation problem).

The following is a brief outline of the paper. Our main results are stated in Section 2 and proven in Section 3. Technical moment calculations are gathered in Appendix A.

Notation 1. Throughout the paper, $a = \mathcal{O}(b)$ means that $\limsup |a/b| < C$ as $\nu\to\infty$, where $C>0$ is a universal constant. Whenever $C$ might depend on some parameter, we add a subscript (for example, $a = \mathcal{O}_d(b)$). Similarly, $a = o(b)$ means that $\lim |a/b| = 0$, and subscripts indicate which parameters the convergence rate can depend on. If $a = (1+o(1))\,b$, then we write $a \sim b$. The notation $\mathrm{tr}(\cdot)$ will denote the trace operator for matrices and $|\cdot|$ their determinant. For a matrix $M\in\mathbb{R}^{d\times d}$ that is diagonalizable, $\lambda_1(M)\ge\cdots\ge\lambda_d(M)$ will denote its eigenvalues, and we let $\lambda(M) := (\lambda_1(M),\ldots,\lambda_d(M))^\top$.

2. Main Results

In Theorem 1 below, we prove an asymptotic expansion for the ratio of the centered matrix-variate T density to the centered matrix-variate normal (MN) density with the same covariances. The case $d=m=1$ was proven recently in [15] (see also [16] for an earlier rougher version). The result extends significantly the convergence in distribution result from Theorem 4.3.4 in [1].

Theorem 1. Let $d,m\in\mathbb{N}$, $\Sigma\in\mathcal{S}_{++}^{d\times d}$ and $\Omega\in\mathcal{S}_{++}^{m\times m}$ be given. Pick any $\eta\in(0,1)$ and let

$$B_{\nu,\Sigma,\Omega}(\eta) := \Big\{X\in\mathbb{R}^{d\times m} : \max_{1\le j\le d}\delta_{\lambda_j} \le \eta\,\nu^{1/4}\sqrt{\tfrac{\nu}{\nu-2}}\Big\}$$

denote the bulk of the centered matrix-variate T distribution, where

$$D_X := \Sigma^{-1/2}X\Omega^{-1/2} \qquad \text{and} \qquad \delta_{\lambda_j} := \lambda_j^{1/2}\big(D_XD_X^\top\big), \quad 1\le j\le d.$$

Then, as $\nu\to\infty$ and uniformly for $X\in B_{\nu,\Sigma,\Omega}(\eta)$, we have

$$\log\Bigg(\frac{[\nu/(\nu-2)]^{md/2}\,K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)}\Bigg) = \nu^{-1}\Big\{\tfrac{1}{4}\,\mathrm{tr}\big[(D_XD_X^\top)^2\big] - \tfrac{(m+d+1)}{2}\,\mathrm{tr}\big[D_XD_X^\top\big] + \tfrac{md(m+d+1)}{4}\Big\}$$
$$\quad +\ \nu^{-2}\Big\{-\tfrac{1}{6}\,\mathrm{tr}\big[(D_XD_X^\top)^3\big] + \tfrac{(m+d-1)}{4}\,\mathrm{tr}\big[(D_XD_X^\top)^2\big] + \tfrac{md}{24}\big(13 - 2d^2 - 3d(-3+m) + 9m - 2m^2\big)\Big\} \qquad (3)$$
$$\quad +\ \nu^{-3}\Big\{\tfrac{1}{8}\,\mathrm{tr}\big[(D_XD_X^\top)^4\big] - \tfrac{(m+d-1)}{6}\,\mathrm{tr}\big[(D_XD_X^\top)^3\big] + \tfrac{md}{24}\big(-26 + d^3 + 2d^2(-3+m) + 11m - 6m^2 + m^3 + d(11 - 9m + 2m^2)\big)\Big\}$$
$$\quad +\ \mathcal{O}_{d,m,\eta}\Bigg(\frac{1 + \mathrm{tr}\big[(D_XD_X^\top)^5\big]}{\nu^4}\Bigg).$$

Local approximations such as the one in Theorem 1 can be found for the Poisson, binomial and negative binomial distributions in [17] (based on Fourier analysis results from [18]), and in [19] for the binomial distribution. Another approach, using Stein's method, is used to study the variance-gamma distribution in [20]. Moreover, Kolmogorov and Wasserstein distance bounds are derived in [21,22] for the Laplace and variance-gamma distributions.

Below, we provide numerical evidence (displayed graphically) for the validity of the expansion in Theorem 1 when $d=m=2$. We compare three levels of approximation for various choices of $\Sigma$.
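For $d=m=1$ (with $\Sigma=\Omega=1$), the left-hand side of (3) involves only univariate Student and normal densities, so the leading term $\nu^{-1}\big(x^4/4 - \tfrac{3}{2}x^2 + \tfrac{3}{4}\big)$ can be checked directly. A stdlib-only Python sketch (our own check, not part of the paper's code):

```python
import math

def log_student(x, nu):
    # log density of the Student t distribution with nu degrees of freedom
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1) / 2 * math.log(1 + x * x / nu))

def log_normal(x):
    return -0.5 * (x * x + math.log(2 * math.pi))

def log_ratio(x, nu):
    # log of [nu/(nu-2)]^{1/2} K(x) / g(x / sqrt(nu/(nu-2))), i.e. the
    # left-hand side of (3) for d = m = 1
    c = nu / (nu - 2)
    return 0.5 * math.log(c) + log_student(x, nu) - log_normal(x / math.sqrt(c))

nu, x = 1000.0, 1.5
lead = (x**4 / 4 - 1.5 * x**2 + 0.75) / nu  # nu^{-1} term of (3) with d = m = 1
# the neglected nu^{-2} term keeps the error several orders of magnitude smaller
assert abs(log_ratio(x, nu) - lead) < 1e-4
```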
For any given $\Sigma\in\mathcal{S}_{++}^{2\times 2}$, define

$$E_0 := \sup_{X\in B_{\nu,\Sigma,\Omega}(\nu^{-1/4})}\Bigg|\log\Bigg(\frac{[\nu/(\nu-2)]^{md/2}\,K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)}\Bigg)\Bigg|, \qquad (4)$$

$$E_1 := \sup_{X\in B_{\nu,\Sigma,\Omega}(\nu^{-1/4})}\Bigg|\log\Bigg(\frac{[\nu/(\nu-2)]^{md/2}\,K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)}\Bigg) - \nu^{-1}\Big\{\tfrac14\,\mathrm{tr}\big[(D_XD_X^\top)^2\big] - \tfrac{(m+d+1)}{2}\,\mathrm{tr}\big[D_XD_X^\top\big] + \tfrac{md(m+d+1)}{4}\Big\}\Bigg|,$$

$$E_2 := \sup_{X\in B_{\nu,\Sigma,\Omega}(\nu^{-1/4})}\Bigg|\log\Bigg(\frac{[\nu/(\nu-2)]^{md/2}\,K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)}\Bigg) - \nu^{-1}\Big\{\tfrac14\,\mathrm{tr}\big[(D_XD_X^\top)^2\big] - \tfrac{(m+d+1)}{2}\,\mathrm{tr}\big[D_XD_X^\top\big] + \tfrac{md(m+d+1)}{4}\Big\}$$
$$\qquad\qquad -\ \nu^{-2}\Big\{-\tfrac16\,\mathrm{tr}\big[(D_XD_X^\top)^3\big] + \tfrac{(m+d-1)}{4}\,\mathrm{tr}\big[(D_XD_X^\top)^2\big] + \tfrac{md}{24}\big(13 - 2d^2 - 3d(-3+m) + 9m - 2m^2\big)\Big\}\Bigg|.$$

In the R software [23], we use Equation (7) to evaluate the log-ratios inside $E_0$, $E_1$ and $E_2$.

Note that $X\in B_{\nu,\Sigma,\Omega}(\nu^{-1/4})$ implies $\mathrm{tr}\big[(D_XD_X^\top)^k\big] \le d\,\big(\tfrac{\nu}{\nu-2}\big)^k$ for all $k\in\mathbb{N}$, so we expect from Theorem 1 that the maximum errors above ($E_0$, $E_1$ and $E_2$) will have the asymptotic behavior

$$E_i = \mathcal{O}\big(\nu^{-(1+i)}\big), \quad \text{for all } i\in\{0,1,2\},$$

or, equivalently,

$$\liminf_{\nu\to\infty}\frac{\log E_i}{\log(\nu^{-1})} \ge 1+i, \quad \text{for all } i\in\{0,1,2\}. \qquad (5)$$

The property (5) is verified in Figure 1 below, for $\Omega = I_{2\times 2}$ and various choices of $\Sigma$. Similarly, the corresponding log-log plots of the errors as a function of $\nu$ are displayed in Figure 2. The simulations are limited to the range $5\le\nu\le 1005$. The R code that generated Figures 1 and 2 can be found in the Supplementary Material.

As a consequence of the previous theorem, we can derive asymptotic upper bounds on several probability metrics between the probability measures induced by the centered matrix-variate T distribution (1) and the corresponding centered matrix-variate normal distribution (2). The distance between Hotelling's T statistic [24] and the corresponding matrix-variate normal distribution is obtained in the special case $m=1$.

Theorem 2 (Probability metric upper bounds). Let $d,m\in\mathbb{N}$, $\Sigma\in\mathcal{S}_{++}^{d\times d}$ and $\Omega\in\mathcal{S}_{++}^{m\times m}$ be given. Assume that $X\sim T_{d,m}(\nu,\Sigma,\Omega)$, $Y\sim\mathrm{MN}_{d,m}(0_{d\times m},\Sigma\otimes\Omega)$, and let $\mathbb{P}_{\nu,\Sigma,\Omega}$ and $\mathbb{Q}_{\Sigma,\Omega}$ be the laws of $X$ and $Y\sqrt{\nu/(\nu-2)}$, respectively. Then, as $\nu\to\infty$,

$$\mathrm{dist}\big(\mathbb{P}_{\nu,\Sigma,\Omega},\mathbb{Q}_{\Sigma,\Omega}\big) \le \frac{C\,(md)^{3/2}}{\nu} \qquad \text{and} \qquad \mathrm{H}\big(\mathbb{P}_{\nu,\Sigma,\Omega},\mathbb{Q}_{\Sigma,\Omega}\big) \le \frac{2C\,(md)^{3/2}}{\nu},$$

where $C > 0$ is a universal constant, $\mathrm{H}(\cdot,\cdot)$ denotes the Hellinger distance, and $\mathrm{dist}(\cdot,\cdot)$ can be replaced by any of the following probability metrics: total variation, Kolmogorov (or uniform) metric, Lévy metric, discrepancy metric, Prokhorov metric.

[Figure 1. Plots of $\log E_i/\log(\nu^{-1})$, $i\in\{0,1,2\}$, as a function of $\nu$, for various choices of $\Sigma$ (twelve panels with $\Sigma = \binom{a\ 1}{1\ b}$, $a\in\{2,3,4\}$, $b\in\{2,3,4,5\}$, and $\Omega = I_{2\times 2}$). The plots confirm (5) for our choices of $\Sigma$ and bring strong evidence for the validity of Theorem 1.]
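The decay rates in (5) can be reproduced in miniature for $d=m=1$, where the log-ratio involves only univariate densities. The following stdlib-only Python sketch mirrors the spirit of the R experiment (grid, range and tolerances are our own choices, not the paper's):

```python
import math

def log_ratio(x, nu):
    # d = m = 1 log-ratio from Theorem 1 (Student vs. scaled normal)
    c = nu / (nu - 2)
    log_t = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi)
             - (nu + 1) / 2 * math.log(1 + x * x / nu))
    log_g = -0.5 * ((x / math.sqrt(c)) ** 2 + math.log(2 * math.pi))
    return 0.5 * math.log(c) + log_t - log_g

def errors(nu, half_width=1.0, n=400):
    # E0: raw approximation error; E1: error after the nu^{-1} correction of (3)
    e0 = e1 = 0.0
    for i in range(n + 1):
        x = -half_width + 2 * half_width * i / n
        lr = log_ratio(x, nu)
        lead = (x**4 / 4 - 1.5 * x**2 + 0.75) / nu
        e0 = max(e0, abs(lr))
        e1 = max(e1, abs(lr - lead))
    return e0, e1

e0_a, e1_a = errors(200.0)
e0_b, e1_b = errors(800.0)
assert e1_a < e0_a / 20   # the nu^{-1} correction helps substantially at nu = 200
assert e0_b < e0_a / 3    # E0 decays roughly like 1/nu
assert e1_b < e1_a / 8    # E1 decays roughly like 1/nu^2
```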
[Figure 2. Log-log plots of $1/E_i$, $i\in\{0,1,2\}$, as a function of $\nu$, for the same choices of $\Sigma$ and $\Omega$ as in Figure 1. Both the horizontal and vertical axes are on a logarithmic scale. The plots clearly illustrate how the addition of correction terms from Theorem 1 to the base approximation (4) improves it.]

3. Proofs

Proof of Theorem 1. First, we take the expression in (1) over the one in (2):

$$\frac{[\nu/(\nu-2)]^{md/2}\,K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)} = \Big(\frac{2}{\nu-2}\Big)^{md/2}\prod_{j=1}^{d}\frac{\Gamma\big(\tfrac12(\nu+m+d-j)\big)}{\Gamma\big(\tfrac12(\nu+d-j)\big)}\,\exp\Big(\frac{\nu-2}{2\nu}\,\mathrm{tr}\big(D_XD_X^\top\big)\Big)\,\big|I_d+\nu^{-1}D_XD_X^\top\big|^{-(\nu+m+d-1)/2}. \qquad (6)$$

The last determinant was obtained using the fact that the eigenvalues of a product of rectangular matrices are invariant under cyclic permutations (as long as the products remain well defined). Indeed, for all $j\in\{1,2,\ldots,d\}$, we have

$$\lambda_j\big(I_d+\nu^{-1}\Sigma^{-1}X\Omega^{-1}X^\top\big) = 1+\nu^{-1}\lambda_j\big(\Sigma^{-1}X\Omega^{-1}X^\top\big) = 1+\nu^{-1}\lambda_j\big(D_XD_X^\top\big) = \lambda_j\big(I_d+\nu^{-1}D_XD_X^\top\big).$$

By taking the logarithm on both sides of (6), we get

$$\log\Bigg(\frac{[\nu/(\nu-2)]^{md/2}\,K_{\nu,\Sigma,\Omega}(X)}{g_{\Sigma,\Omega}\big(X/\sqrt{\nu/(\nu-2)}\big)}\Bigg) = -\frac{md}{2}\log\Big(\frac{\nu-2}{2}\Big) + \sum_{j=1}^{d}\Big[\log\Gamma\big(\tfrac12(\nu+m+d-j)\big) - \log\Gamma\big(\tfrac12(\nu+d-j)\big)\Big] + \frac12\sum_{j=1}^{d}\tilde\delta_{\lambda_j}^2 - \frac{(\nu+m+d-1)}{2}\sum_{j=1}^{d}\log\Big(1+\frac{\tilde\delta_{\lambda_j}^2}{\nu-2}\Big), \qquad (7)$$

where $\tilde\delta_{\lambda_j} := \delta_{\lambda_j}/\sqrt{\nu/(\nu-2)}$, so that $\tilde\delta_{\lambda_j}^2/(\nu-2) = \delta_{\lambda_j}^2/\nu$. By applying the Taylor expansions

$$\log\Gamma\big(\tfrac12(\nu+m+d-j)\big) - \log\Gamma\big(\tfrac12(\nu+d-j)\big)$$
$$= \tfrac12(\nu+m+d-j-1)\log\big(\tfrac12(\nu+m+d-j)\big) - \tfrac12(\nu+d-j-1)\log\big(\tfrac12(\nu+d-j)\big) - \frac{m}{2} + \frac{2}{12(\nu+m+d-j)} - \frac{2}{12(\nu+d-j)} - \frac{2^3}{360(\nu+m+d-j)^3} + \frac{2^3}{360(\nu+d-j)^3} + \mathcal{O}_{m,d}(\nu^{-5})$$
$$= \frac{m}{2}\log\Big(\frac{\nu}{2}\Big) + \frac{m(-2+2d-2j+m)}{4\nu} - \frac{m}{12\nu^2}\Big(2+3d^2+3j^2-3j(-2+m)-3m+m^2+d(-6-6j+3m)\Big)$$
$$\quad - \frac{m}{24\nu^3}\Big(-4(d-j)^3 - 6(d-j)^2(m-2) - 4(d-j)(2-3m+m^2) - m(-2+m)^2\Big) + \mathcal{O}_{m,d}(\nu^{-4})$$

(see, e.g., (Ref. [25], p. 257); the $\nu^{-3}$ coefficient is written compactly in powers of $d-j$), and

$$-\frac{md}{2}\log\Big(\frac{\nu-2}{2}\Big) + \frac{md}{2}\log\Big(\frac{\nu}{2}\Big) = \frac{md}{\nu} + \frac{md}{\nu^2} + \frac{4md}{3\nu^3} + \mathcal{O}_{m,d}(\nu^{-4}),$$

and

$$\log(1+y) = y - \frac{y^2}{2} + \frac{y^3}{3} - \frac{y^4}{4} + \mathcal{O}(y^5), \qquad |y|\le\eta<1,$$
(8) å d,m,h 8 n n 2 j=1 Now, 1 n + m + d 1 (m + d + 1) (m + d + 1) 2(m + d + 1) = +O (n ), m,d 2 3 2 2(n 2) 2n n n n + m + d 1 1 (m + d + 3) (m + d + 2) = + + +O (n ), m,d 2 2 3 4(n 2) 4n 4n n n + m + d 1 1 (m + d + 5) = +O (n ), m,d 3 2 3 6(n 2) 6n 6n n + m + d 1 1 = +O (n ), m,d 4 3 8(n 2) 8n so we can rewrite (8) as md/2 [n/(n 2)] K (X) n,S,W log g (X/ n/(n 2)) S,W 1 (m + d + 1) m(2 + 2d 2j + m) 1 4 2 = n d d + l l j j 4 2 4 j=1 8 9 (m+d+3) 6 4 2 > > d + d (m + d + 1)d d < = 6 l 4 l l j j j 2 2 + n 10 + 3d + 3j 3j(2 + m) > > : ; j=1 12 2 3m + m + d(6 6j + 3m) 8 9 (m+d+5) 1 8 6 4 2 > > d d + (m + d + 2)d 2(m + d + 1)d d < = 8 6 l l l l j j j j 3 3 2 2 2 + n 32 + 4d 4j 6d (2 + 2j m) + 6j (2 + m) + (2 + m) m > > : + ; j=1 24 2 2 2 4j(2 3m + m ) + 4d(2 + 3j 3j(2 + m) 3m + m ) 1 + max jd j 1jd l +O , d,m,h which proves (3) after some simpliﬁcations with Mathematica. AppliedMath 2022, 2 453 Proof of Theorem 2. By the comparison of the total variation normkk with the Hellinger distance on page 726 of Carter [26], we already know that kP Q k n,S,W S,W " ! # u (9) dP n,S,W t c 2P X 2 B (1/2) + E log (X) 1 . fX2B (1/2)g n,S,W n,S,W dQ S,W 1/2 1/2 Given that D = S XW T (n, I , I ) by Theorem 4.3.5 in [1], we know, by X m d,m d Theorem 4.2.1 in [1], that law 1 1/2 D = (n S) Z, for S Wishart (n + d 1, I ) and Z MN (0 , I I ) that are independent, so dd d dm dm d m that, by Theorems 3.3.1 and 3.3.3 in [1], we have > 1 D D jS Wishart (m, nS ). (10) X X dd Therefore, by conditioning on S, and then by applying the sub-multiplicativity of the largest eigenvalue for nonnegative deﬁnite matrices, and a large deviation bound on the maximum eigenvalue of a Wishart matrix (which is sub-exponential), we get, for n large enough, " !# 1/2 c > P X 2 B (1/2) E P l (D D ) > S 1 X n,S,W X " !# 1/2 1 1/2 > 1 1/2 E P l ((n S) )l (ZZ )l ((n S) ) > S 1 1 1 l (S) > d = E P l (ZZ ) > S 1/2 4 n 1/2 C exp , (11) m,d 10 md for some positive constant C that depends only on m and d. 
By Theorem 1, we also have

$$\mathbb{E}\bigg[\log\bigg(\frac{\mathrm{d}\mathbb{P}_{\nu,\Sigma,\Omega}}{\mathrm{d}\mathbb{Q}_{\Sigma,\Omega}}\bigg)(X)\,\mathbb{1}_{\{X\in B_{\nu,\Sigma,\Omega}(1/2)\}}\bigg] = \nu^{-1}\Big\{\tfrac14\,\mathbb{E}\,\mathrm{tr}\big[(D_XD_X^\top)^2\big] - \tfrac{(m+d+1)}{2}\,\mathbb{E}\,\mathrm{tr}\big[D_XD_X^\top\big] + \tfrac{md(m+d+1)}{4}\Big\}$$
$$\quad + \nu^{-2}\Big\{\mathcal{O}\Big(\mathbb{E}\big[\mathrm{tr}\big[(D_XD_X^\top)^2\big]\mathbb{1}_{\{X\in B_{\nu,\Sigma,\Omega}(1/2)\}}\big]\Big) + (m+d)\,\mathcal{O}\Big(\mathbb{E}\big[\mathrm{tr}\big[D_XD_X^\top\big]\mathbb{1}_{\{X\in B_{\nu,\Sigma,\Omega}(1/2)\}}\big]\Big) + \mathcal{O}\big(m(m+d)\big)\Big\} \qquad (12)$$
$$\quad + \nu^{-3}\Big\{\mathcal{O}\Big(\mathbb{E}\,\mathrm{tr}\big[(D_XD_X^\top)^3\big]\Big) + (m+d)\,\mathcal{O}\Big(\mathbb{E}\,\mathrm{tr}\big[(D_XD_X^\top)^2\big]\Big) + (m+d)^2\,\mathcal{O}\Big(\mathbb{E}\,\mathrm{tr}\big[D_XD_X^\top\big]\Big) + \mathcal{O}\big(md(m+d)^2\big)\Big\}.$$

On the right-hand side, the first line is estimated using Lemma A1, and the second line is bounded using Lemma A2. We find

$$\mathbb{E}\bigg[\log\bigg(\frac{\mathrm{d}\mathbb{P}_{\nu,\Sigma,\Omega}}{\mathrm{d}\mathbb{Q}_{\Sigma,\Omega}}\bigg)(X)\,\mathbb{1}_{\{X\in B_{\nu,\Sigma,\Omega}(1/2)\}}\bigg] = \mathcal{O}\big(m^3d^3\nu^{-2}\big).$$

Putting (11) and (12) together in (9) gives the conclusion.

Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/appliedmath2030025/s1.

Funding: F.O. is supported by postdoctoral fellowships from the NSERC (PDF) and the FRQNT (B3X supplement and B3XR).

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: The R code for the simulations in Section 2 is in the Supplementary Material.

Acknowledgments: We thank the three referees for their comments.

Conflicts of Interest: The author declares no conflicts of interest.

Appendix A. Technical Computations

Below, we compute the expectations for some traces of powers of the matrix-variate Student distribution. The lemma is used to estimate some trace moments and the $\nu^{-1}$ errors in (12) of the proof of Theorem 2, and also as a preliminary result for the proof of Lemma A2.

Lemma A1. Let $d,m\in\mathbb{N}$, $\Sigma\in\mathcal{S}_{++}^{d\times d}$ and $\Omega\in\mathcal{S}_{++}^{m\times m}$ be given. If $X\sim T_{d,m}(\nu,\Sigma,\Omega)$ according to (1), then

$$\mathbb{E}\big[\mathrm{tr}\big(D_XD_X^\top\big)\big] = \frac{md\,\nu}{\nu-2}, \qquad (A1)$$

$$\mathbb{E}\big[\mathrm{tr}\big((D_XD_X^\top)^2\big)\big] = \frac{md\,\nu^2\,\{(m+d)(\nu-2)+\nu+md\}}{(\nu-1)(\nu-2)(\nu-4)}, \qquad (A2)$$

where we recall $D_X := \Sigma^{-1/2}X\Omega^{-1/2}$. In particular, as $\nu\to\infty$, we have

$$\mathbb{E}\big[\mathrm{tr}\big(D_XD_X^\top\big)\big] \sim md \qquad \text{and} \qquad \mathbb{E}\big[\mathrm{tr}\big((D_XD_X^\top)^2\big)\big] \sim md\,(m+d+1).$$

Proof of Lemma A1. For $W\sim\mathrm{Wishart}_{d\times d}(n,V)$ with $n>0$ and $V\in\mathcal{S}_{++}^{d\times d}$, we know from (Ref. [1], p.
99) (alternatively, see (Ref. [27], p. 66) or (Ref. [28], p. 308)) that

$$\mathbb{E}[W] = nV \qquad \text{and} \qquad \mathbb{E}[W^2] = n\big\{(n+1)\,V + \mathrm{tr}(V)\,I_d\big\}V,$$

and from (Ref. [1], pp. 99–100) (alternatively, see [29] and ([28], p. 308), or ([30], pp. 101–103)) that

$$\mathbb{E}[W^{-1}] = \frac{V^{-1}}{n-d-1}, \qquad \text{for } n-d-1>0,$$

$$\mathbb{E}[W^{-2}] = \frac{\mathrm{tr}(V^{-1})\,V^{-1} + (n-d-1)\,V^{-2}}{(n-d)(n-d-1)(n-d-3)}, \qquad \text{for } n-d-3>0,$$

and from (Corollary 3.1 in [30]) that

$$\mathbb{E}\big[\mathrm{tr}(W^{-1})\,W^{-1}\big] = \frac{(n-d-2)\,\mathrm{tr}(V^{-1})\,V^{-1} + 2\,V^{-2}}{(n-d)(n-d-1)(n-d-3)}, \qquad \text{for } n-d-3>0.$$

Therefore, by combining the above moment estimates with (10), we have

$$\mathbb{E}\big[D_XD_X^\top\big] = \mathbb{E}\big[\mathbb{E}\big[D_XD_X^\top\,\big|\,S\big]\big] = \mathbb{E}\big[m\,(\nu S^{-1})\big] = m\,\nu\,\mathbb{E}\big[S^{-1}\big] = \frac{m\,\nu}{\nu-2}\,I_d,$$

$$\mathbb{E}\big[(D_XD_X^\top)^2\big] = \mathbb{E}\big[\mathbb{E}\big[(D_XD_X^\top)^2\,\big|\,S\big]\big] = \mathbb{E}\Big[m\big\{(m+1)\,(\nu S^{-1}) + \mathrm{tr}\big(\nu S^{-1}\big)\,I_d\big\}\big(\nu S^{-1}\big)\Big]$$
$$= m\,\nu^2\,\big\{(m+1)\,\mathbb{E}\big[S^{-2}\big] + \mathbb{E}\big[\mathrm{tr}(S^{-1})\,S^{-1}\big]\big\} = \frac{m\,\nu^2\,\{(m+1)(\nu+d-2) + (\nu-3)\,d + 2\}}{(\nu-1)(\nu-2)(\nu-4)}\,I_d.$$

By linearity, the trace of an expectation is the expectation of the trace, so (A1) and (A2) follow from the above equations.

We can also estimate the moments of Lemma A1 on various events. The lemma below is used to estimate the $\nu^{-2}$ errors in (12) of the proof of Theorem 2.

Lemma A2. Let $d,m\in\mathbb{N}$, $\Sigma\in\mathcal{S}_{++}^{d\times d}$ and $\Omega\in\mathcal{S}_{++}^{m\times m}$ be given, and let $A\in\mathcal{B}(\mathbb{R}^{d\times m})$ be a Borel set. If $X\sim T_{d,m}(\nu,\Sigma,\Omega)$ according to (1), then, for $\nu$ large enough,

$$\Big|\mathbb{E}\big[\mathrm{tr}\big(D_XD_X^\top\big)\,\mathbb{1}_{\{X\in A\}}\big] - \frac{md\,\nu}{\nu-2}\Big| \le 2\,m\,d^{3/2}\,\big(\mathbb{P}(X\in A^c)\big)^{1/2}, \qquad (A3)$$

$$\Big|\mathbb{E}\big[\mathrm{tr}\big((D_XD_X^\top)^2\big)\,\mathbb{1}_{\{X\in A\}}\big] - \frac{md\,\nu^2\,\{(m+d)(\nu-2)+\nu+md\}}{(\nu-1)(\nu-2)(\nu-4)}\Big| \le 100\,m^2\,d^{5/2}\,\big(\mathbb{P}(X\in A^c)\big)^{1/2}, \qquad (A4)$$

where we recall $D_X := \Sigma^{-1/2}X\Omega^{-1/2}$.

Proof of Lemma A2. By Lemma A1, the Cauchy–Schwarz inequality and Jensen's inequality,

$$\big(\mathrm{tr}\big(D_XD_X^\top\big)\big)^2 \le d\,\mathrm{tr}\big((D_XD_X^\top)^2\big),$$

we have

$$\Big|\mathbb{E}\big[\mathrm{tr}\big(D_XD_X^\top\big)\,\mathbb{1}_{\{X\in A\}}\big] - \frac{md\,\nu}{\nu-2}\Big| = \Big|\mathbb{E}\big[\mathrm{tr}\big(D_XD_X^\top\big)\,\mathbb{1}_{\{X\in A^c\}}\big]\Big| \le \Big(\mathbb{E}\big[\big(\mathrm{tr}\big(D_XD_X^\top\big)\big)^2\big]\Big)^{1/2}\big(\mathbb{P}(X\in A^c)\big)^{1/2}$$
$$\le d^{1/2}\,\Big(\mathbb{E}\big[\mathrm{tr}\big((D_XD_X^\top)^2\big)\big]\Big)^{1/2}\big(\mathbb{P}(X\in A^c)\big)^{1/2} \le 2\,m\,d^{3/2}\,\big(\mathbb{P}(X\in A^c)\big)^{1/2},$$

which proves (A3).
Similarly, by Lemma A1, Hölder's inequality and Jensen's inequality,

$$\big(\mathrm{tr}\big((D_XD_X^\top)^2\big)\big)^2 \le d\,\mathrm{tr}\big((D_XD_X^\top)^4\big),$$

we have, for $\nu$ large enough,

$$\Big|\mathbb{E}\big[\mathrm{tr}\big((D_XD_X^\top)^2\big)\,\mathbb{1}_{\{X\in A\}}\big] - \frac{md\,\nu^2\,\{(m+d)(\nu-2)+\nu+md\}}{(\nu-1)(\nu-2)(\nu-4)}\Big| = \Big|\mathbb{E}\big[\mathrm{tr}\big((D_XD_X^\top)^2\big)\,\mathbb{1}_{\{X\in A^c\}}\big]\Big|$$
$$\le \Big(\mathbb{E}\big[\big(\mathrm{tr}\big((D_XD_X^\top)^2\big)\big)^2\big]\Big)^{1/2}\big(\mathbb{P}(X\in A^c)\big)^{1/2} \le d^{1/2}\,\Big(\mathbb{E}\big[\mathrm{tr}\big((D_XD_X^\top)^4\big)\big]\Big)^{1/2}\big(\mathbb{P}(X\in A^c)\big)^{1/2}$$
$$\le d^{1/2}\,\big(10^4\,(md)^4\big)^{1/2}\,\big(\mathbb{P}(X\in A^c)\big)^{1/2} = 100\,m^2\,d^{5/2}\,\big(\mathbb{P}(X\in A^c)\big)^{1/2},$$

which proves (A4). This ends the proof.

References
1. Gupta, A.K.; Nagar, D.K. Matrix Variate Distributions, 1st ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 1999; p. 384.
2. Olver, F.W.J.; Lozier, D.W.; Boisvert, R.F.; Clark, C.W. (Eds.) NIST Handbook of Mathematical Functions; U.S. Department of Commerce, National Institute of Standards and Technology: Washington, DC, USA; Cambridge University Press: Cambridge, UK, 2010; p. xvi+951.
3. Nagar, D.K.; Roldán-Correa, A.; Gupta, A.K. Extended matrix variate gamma and beta functions. J. Multivar. Anal. 2013, 122, 53–69.
4. Pajevic, S.; Basser, P.J. Parametric description of noise in diffusion tensor MRI. In Proceedings of the 7th Annual Meeting of the ISMRM, Philadelphia, PA, USA, 22–28 May 1999; p. 1787.
5. Basser, P.J.; Jones, D.K. Diffusion-tensor MRI: Theory, experimental design and data analysis—A technical review. NMR Biomed. 2002, 15, 456–467.
6. Pajevic, S.; Basser, P.J. Parametric and non-parametric statistical analysis of DT-MRI data. J. Magn. Reson. 2003, 161, 1–14.
7. Basser, P.J.; Pajevic, S. A normal distribution for tensor-valued random variables: Applications to diffusion tensor MRI. IEEE Trans. Med. Imaging 2003, 22, 785–794.
8. Gasbarra, D.; Pajevic, S.; Basser, P.J. Eigenvalues of random matrices with isotropic Gaussian noise and the design of diffusion tensor imaging experiments. SIAM J. Imaging Sci. 2017, 10, 1511–1548.
9. Alexander, D.C.; Pierpaoli, C.; Basser, P.J.; Gee, J.C. Spatial transformations of diffusion tensor magnetic resonance images. IEEE Trans. Med. Imaging 2001, 20, 1131–1139.
10. Schwartzman, A.; Mascarenhas, W.F.; Taylor, J.E. Inference for eigenvalues and eigenvectors of Gaussian symmetric matrices. Ann. Statist. 2008, 36, 2886–2919.
11. Mallows, C.L. Latent vectors of random symmetric matrices. Biometrika 1961, 48, 133–149.
12. Hu, W.; White, M. A CMB polarization primer. New Astron. 1997, 2, 323–344.
13. Vafaei Sadr, A.; Movahed, S.M.S. Clustering of local extrema in Planck CMB maps. MNRAS 2021, 503, 815–829.
14. Gallaugher, M.P.B.; McNicholas, P.D. Finite mixtures of skewed matrix variate distributions. Pattern Recognit. 2018, 80, 83–93.
15. Ouimet, F. Refined normal approximations for the Student distribution. J. Classical Anal. 2022, 20, 23–33.
16. Shafiei, A.; Saberali, S.M. A simple asymptotic bound on the error of the ordinary normal approximation to the Student's t-distribution. IEEE Commun. Lett. 2015, 19, 1295–1298.
17. Govindarajulu, Z. Normal approximations to the classical discrete distributions. Sankhyā Ser. A 1965, 27, 143–172.
18. Esseen, C.G. Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law. Acta Math. 1945, 77, 1–125.
19. Cressie, N. A finely tuned continuity correction. Ann. Inst. Statist. Math. 1978, 30, 435–442.
20. Gaunt, R.E. Variance-gamma approximation via Stein's method. Electron. J. Probab. 2014, 19, 1–33.
21. Gaunt, R.E. New error bounds for Laplace approximation via Stein's method. ESAIM Probab. Stat. 2021, 25, 325–345.
22. Gaunt, R.E. Wasserstein and Kolmogorov error bounds for variance-gamma approximation via Stein's method I. J. Theoret. Probab. 2020, 33, 465–505.
23. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
24. Hotelling, H. The generalization of Student's ratio. Ann. Math. Statist. 1931, 2, 360–378.
25. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; National Bureau of Standards Applied Mathematics Series; U.S. Government Printing Office: Washington, DC, USA, 1964; Volume 55, p. xiv+1046.
26. Carter, A.V. Deficiency distance between multinomial and multivariate normal experiments. Ann. Statist. 2002, 30, 708–730.
27. de Waal, D.J.; Nel, D.G. On some expectations with respect to Wishart matrices. South African Statist. J. 1973, 7, 61–67.
28. Letac, G.; Massam, H. All invariant moments of the Wishart distribution. Scand. J. Statist. 2004, 31, 295–318.
29. Haff, L.R. An identity for the Wishart distribution with applications. J. Multivar. Anal. 1979, 9, 531–544.
30. von Rosen, D. Moments for the inverted Wishart distribution. Scand. J. Statist. 1988, 15, 97–109.