
The generalized Simpson’s entropy is a measure of biodiversity

Abstract

Modern measures of diversity satisfy reasonable axioms, are parameterized to produce diversity profiles, can be expressed as an effective number of species to simplify their interpretation, and come with estimators that allow one to apply them to real-world data. We introduce the generalized Simpson's entropy as a measure of diversity and investigate its properties. We show that it has many useful features and can be used as a measure of biodiversity. Moreover, unlike most commonly used diversity indices, it has unbiased estimators, which allow for sound estimation of the diversity of poorly sampled, rich communities.

Citation: Grabchak M, Marcon E, Lang G, Zhang Z (2017) The generalized Simpson's entropy is a measure of biodiversity. PLoS ONE 12(3): e0173305. doi:10.1371/journal.pone.0173305
Editor: Stefan J. Green, University of Illinois at Chicago, UNITED STATES
Received: November 3, 2016; Accepted: February 17, 2017; Published: March 7, 2017
Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability Statement: Data are available from the entropart package for R, available on CRAN: https://cran.r-project.org/web/packages/entropart/index.html.
Funding: This work has benefited from an "Investissement d'Avenir" grant managed by Agence Nationale de la Recherche (CEBA, ref. ANR-10-LABX-25-01).
Competing interests: The authors have declared that no competing interests exist.

Introduction

Many indices of biodiversity have been proposed, based on different definitions of diversity and different visions of the biological aspects to address [1]. Indeed, measuring diversity requires both a robust theoretical framework [2] and empirical techniques to effectively estimate it [3]. We focus on species-neutral diversity, i.e. the diversity of the distribution of species, ignoring their features. Such measures only make sense when applied to a single taxocene, i.e. a subset of species in the community under study that belong to the same taxon (e.g. butterflies) or, more loosely, to a meaningful group (e.g. trees). Classical measures of this type include richness (the number of species), Shannon's entropy [4], and Simpson's index [5].

Since one index is generally insufficient to fully capture the diversity of a community, modern measures of diversity are parameterizable, allowing the user to give more or less relative importance to rare versus frequent species [6]. Further, they can be expressed as an effective number of species [7], which allows for an easy interpretation of their values [8]. Among the most popular indices of this type are HCDT entropy [9-11] (which includes richness, Simpson's index, and Shannon's entropy as special cases), Rényi's entropy [6], and the less-used Hurlbert's index [12]. These indices can be used to estimate the diversity of a community and then to plot their values against the parameter, which controls the weight of rare species, to obtain a diversity profile [7]. The profiles of two communities can be compared to provide a partial order of their diversity. If the profiles do not cross, one community can be declared to be more diverse than the other [13].
HCDT entropy has many desirable properties [8, 14] but, despite recent progress [15], it cannot be accurately estimated when the communities are insufficiently sampled [16]. Rényi's entropy is related to HCDT entropy by a straightforward transformation, the natural logarithm of the deformed exponential [14]; its properties are very similar and, hence, it will not be treated here. Hurlbert's index has a simple and practical interpretation and can be estimated with no bias, but only when its parameter is strictly less than the sample size.

We introduce generalized Simpson's entropy as a measure of diversity for its particular performance when it is used to estimate the diversity of small samples from hyper-diverse communities. The generalized Simpson's entropy $\zeta_r$ is parameterized: increasing its parameter $r$ gives more relative importance to rare species. It has a simple interpretation: in a species accumulation curve, $\zeta_r$ is the probability that the individual sampled at rank $r + 1$ belongs to a new species. We show that $\zeta_r$ is a valid measure of diversity, satisfying the axioms established in the literature [2, 6]. We then show how to estimate $\zeta_r$ with no bias and how to construct confidence intervals, which can be used to compare the diversities of different communities. After this, we derive a simple formula for the corresponding effective number of species and discuss its estimation. Finally, we compare it to HCDT entropy and Hurlbert's index on a real-world example of an under-sampled tropical forest to illustrate its decisive advantage when applied to this type of data.

1 Methods

1.1 Generalized Simpson's entropy

Let $\ell_1, \ell_2, \ldots, \ell_S$ be the species in a community, and let $p_s$ be the proportion of individuals belonging to species $\ell_s$. Necessarily, $0 \le p_s \le 1$ and $\sum_{s=1}^{S} p_s = 1$. We can interpret $p_s$ as the probability of seeing an individual of species $\ell_s$ when sampling one individual from this community. Generalized Simpson's entropy is a family of diversity indices defined by

$$\zeta_r = \sum_{s=1}^{S} p_s (1 - p_s)^r, \qquad r = 1, 2, \ldots \qquad (1)$$

The parameter $r$ is called the order of $\zeta_r$. Note that, as $r$ increases, $\zeta_r$ gives more relative weight to rare species than to more common ones. Note further that $0 \le \zeta_r \le 1$. In fact, $\zeta_r$ is the probability that the $(r + 1)$st observation will be of a species that has not been observed before.
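To make Eq (1) concrete, here is a minimal base-R sketch (ours, not part of the article or its S2 Appendix) that evaluates $\zeta_r$ for a vector of known species proportions; in practice the proportions must be estimated, which is the subject of Section 1.4.

```r
# Generalized Simpson's entropy of order r (Eq 1): zeta_r = sum_s p_s * (1 - p_s)^r.
# Illustrative sketch only: the proportions p are assumed known here.
gen_simpson <- function(p, r) {
  stopifnot(abs(sum(p) - 1) < 1e-8, all(p >= 0), r >= 1)
  sum(p * (1 - p)^r)
}

# Example: a hypothetical community of 4 species.
p <- c(0.5, 0.3, 0.15, 0.05)
gen_simpson(p, r = 1)    # probability that the 2nd individual sampled is a new species
gen_simpson(p, r = 10)   # higher orders give more weight to rare species
```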
Generalized Simpson's entropy was introduced as part of a larger class in [17] and was further studied in [18]. The name comes from the fact that $1 - \zeta_1$ corresponds to Simpson's index as defined in [5]. A major advantage of working with this family is that there exists an unbiased estimator of $\zeta_r$ whenever $r$ is strictly less than the sample size. While a similar result holds for Hurlbert's index, this is not the case with most popular diversity indices, including HCDT entropy and Rényi's entropy, which do not have unbiased estimators.

We now turn to the question of when and why generalized Simpson's entropy is a good measure of diversity.

1.2 Axioms for a measure of diversity

Historically, measures of diversity have been defined as functions mapping the proportions $p_1, p_2, \ldots, p_S$ into the real line and satisfying certain axioms. We write $H(p_1, p_2, \ldots, p_S)$ to denote a generic function of this type. We begin with three of the most commonly assumed axioms. The first two are from Rényi [6] after Faddeev [19].

Axiom 1 (Symmetry). $H(p_1, p_2, \ldots, p_S)$ must be a symmetric function of its variables.

This means that no species can have a particular role in the measure.

Axiom 2 (Continuity). $H(p_1, p_2, \ldots, p_S)$ must be a continuous function of the vector $(p_1, p_2, \ldots, p_S)$.

This ensures that a small change in probabilities yields a small change in the measure. In particular, two communities differing by a species with a probability very close to 0 have almost the same diversity.

Axiom 3 (Evenness). For a fixed number of species $S$, the maximum diversity is achieved when all species probabilities are equal, i.e.,

$$H(p_1, p_2, \ldots, p_S) \le H(1/S, 1/S, \ldots, 1/S). \qquad (2)$$

This axiom was called evenness by Gregorius [20]. It means that the most diverse community of $S$ species is the one where all species have the same proportions.

We will give a more restrictive version of this axiom. Toward this end, following Patil and Taillie [2], we define a transfer of probability. This is an operation that consists of taking two species with $p_s < p_t$ and modifying these probabilities to increase $p_s$ by $h > 0$ and decrease $p_t$ by $h$, in such a way that we still have $p_s + h \le p_t - h$. In other words, some individuals of a more common species are replaced by ones of a less common species, but in such a way that the order of the two species does not change.

Axiom 4 (Principle of transfers). Any transfer of probability must increase diversity.

The principle of transfers comes from the inequality literature [21]. It is clear that this axiom is stronger than the axiom of evenness: if any transfer increases diversity, then, necessarily, the maximum value is reached when no more transfer is possible, i.e. when all proportions are equal.

Generalized Simpson's entropy belongs to an important class of diversity indices, called trace-form entropies in statistical physics and dichotomous diversity indices in [2]. This class consists of indices of the form $H(p_1, p_2, \ldots, p_S) = \sum_{s=1}^{S} p_s I(p_s)$, where $I(p)$ is called the information function. Indices of this type were studied extensively in [2] and [20]. $I(p)$ defines the amount of information [4], or uncertainty [6], or surprise [22]. All of these terms can be taken as synonyms; they get at the idea that $I(p)$ measures the rarity of individuals from a species with proportion $p$ [2]. This discussion leads to the following axiom.

Axiom 5 (Decreasing information). $I(p)$ must be a decreasing function of $p$ on the interval $(0, 1]$, and $I(1) = 0$.

This can be interpreted to mean that observing an individual from an abundant species brings less information than observing one from a rare species, and if an individual is observed from a species that has probability 1, then this observation brings no information at all.

Patil and Taillie [2] showed that Axiom 5 ensures that adding a new species increases diversity. They also showed that both the principle of transfers and the axiom of decreasing information are satisfied if the function $g(p) = p\,I(p)$ is concave on the interval $[0, 1]$. However, for generalized Simpson's entropy,

$$g(p) = p (1 - p)^r, \qquad p \in [0, 1] \qquad (3)$$

is not a concave function of $p$ if $r > 1$. In fact, for $r > 1$, generalized Simpson's entropy does not satisfy the principle of transfers. For this reason Gregorius [20], in a study of many different entropies, did not retain it.
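As a quick check (this derivation is ours; the paper's proofs are in S1 Appendix), differentiating $g$ in Eq (3) twice shows exactly where concavity fails:

$$g'(p) = (1 - p)^{r} - r p (1 - p)^{r-1}, \qquad g''(p) = r (1 - p)^{r-2}\big[(r + 1)p - 2\big].$$

Hence $g''(p) > 0$ whenever $p > 2/(r+1)$, so for $r > 1$ the function is convex on part of $(0, 1)$ and cannot be concave on all of $[0, 1]$. The breakpoint $2/(r+1)$ is the same threshold that appears in Proposition 1 below.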
However, we will show that generalized Simpson's entropies satisfy a weaker version of the principle of transfers and are, nevertheless, useful measures of diversity.

1.3 The generalized Simpson's entropy is a measure of diversity

It is easy to see that generalized Simpson's entropy always satisfies Axioms 1, 2 and 5 but, as we have discussed, it does not satisfy Axiom 4. However, we will show that it satisfies a weak version of Axiom 4 and that it satisfies Axiom 3 for a limited, but wide, range of orders $r$.

Axiom 6 (Weak principle of transfers). Any transfer of probability must increase diversity as long as the sum of the probabilities of the concerned species is below a certain threshold, i.e., the principle of transfers holds so long as

$$p_s + p_t \le T \quad \text{for some } 0 < T \le 1. \qquad (4)$$

We now give our results about the properties of generalized Simpson's entropy. The proofs are in S1 Appendix.

Proposition 1. Generalized Simpson's entropy of order $r$ respects the weak principle of transfers with $T = \frac{2}{r+1}$.

Proposition 2. Generalized Simpson's entropy of order $r$ respects the evenness axiom if $r \le S - 1$.

In light of Proposition 2, we will limit the order to $r = 1, 2, \ldots, (S - 1)$. In this case, generalized Simpson's entropy satisfies Axioms 1-3 and can be regarded as a measure of diversity. Moreover, it satisfies Axiom 5 and the weak principle of transfers up to $T = \frac{2}{r+1} \ge \frac{2}{S}$. Thus, a transfer of probability increases diversity, except between very abundant species.

1.4 Estimation

In practice, the proportions $(p_1, p_2, \ldots, p_S)$ are unknown and, hence, the value of generalized Simpson's entropy, like that of any other diversity index, is unknown and can only be estimated from data. For this purpose, assume that we have a random sample of $n$ individuals from a given community. The assumption that we have a random sample, i.e. that the observations are independent and identically distributed, may be unrealistic in some situations. However, most estimators rely on this assumption, and appropriate sampling design is the simplest solution to obtain independent and identically distributed data. See [23] for a review of these issues in the context of forestry. In principle, the assumption of a random sample implies either that the population is infinite or that the sampling is done with replacement. In practice, the population is finite and sampling in ecological studies is usually performed without replacement. However, when the sample size is much smaller than the population, the dependence introduced by sampling from a finite population without replacement is negligible and can be ignored.

Let $n_s$ be the number of individuals sampled from species $\ell_s$, and note that $\sum_{s=1}^{S} n_s = n$. We can estimate $p_s$ by $\hat{p}_s = n_s / n$. A naive estimator of $\zeta_r$ is the so-called "plug-in" estimator $\sum_{s=1}^{S} \hat{p}_s (1 - \hat{p}_s)^r$. Unfortunately, this may have quite a bit of bias. However, for $1 \le r \le (n - 1)$, an unbiased estimator of $\zeta_r$ exists and is given by

$$Z_r = \frac{n^{r+1}\,[n - r - 1]!}{n!} \sum_{s=1}^{S} \hat{p}_s \prod_{j=0}^{r-1} \left(1 - \hat{p}_s - \frac{j}{n}\right), \qquad (5)$$

see [17]. There it is shown that $Z_r$ is a uniformly minimum variance unbiased estimator (UMVUE) for $\zeta_r$ when $1 \le r \le (n - 1)$. Note that the sum in Eq (5) ranges over all of the species in the community. This may appear impractical, since we generally do not know the value of $S$. However, for any species $\ell_s$ that is not observed in our sample, we have $\hat{p}_s = 0$, and we do not need to include it in the sum.

Assume that we have observed $K \le S$ different species in the sample and that these species are $\ell'_1, \ell'_2, \ldots, \ell'_K$. For each $s = 1, 2, \ldots, K$, let $n'_s$ be the number of individuals sampled from species $\ell'_s$, and let $\hat{p}'_s = n'_s / n$ be the estimated proportion of species $\ell'_s$. In this case we can write

$$Z_r = \frac{n^{r+1}\,[n - r - 1]!}{n!} \sum_{s=1}^{K} \hat{p}'_s \prod_{j=0}^{r-1} \left(1 - \hat{p}'_s - \frac{j}{n}\right). \qquad (6)$$

With a few simple algebraic steps, we can rewrite this in the form

$$Z_r = \sum_{s=1}^{K} \hat{p}'_s \prod_{j=1}^{r} \left(1 - \frac{n'_s - 1}{n - j}\right), \qquad (7)$$

which we have found to be more tractable for computational purposes.
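The paper relies on the EntropyEstimation package for ready-made routines; purely as an illustration of Eq (7), the following base-R sketch (function and variable names are ours) computes $Z_r$ directly from a vector of observed species counts.

```r
# Unbiased estimator Z_r of generalized Simpson's entropy (Eq 7),
# computed from observed species counts n'_s. Requires 1 <= r <= n - 1.
Z_hat <- function(counts, r) {
  n <- sum(counts)
  stopifnot(r >= 1, r <= n - 1)
  p_hat <- counts / n
  # For each observed species, the product over j = 1..r of (1 - (n'_s - 1)/(n - j)).
  terms <- sapply(counts, function(ns) prod(1 - (ns - 1) / (n - seq_len(r))))
  sum(p_hat * terms)
}

# Example with made-up counts: three abundant species plus several singletons.
counts <- c(40, 25, 20, 1, 1, 1, 1, 1)
Z_hat(counts, r = 1)
Z_hat(counts, r = 5)
```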
In [17] and [18] it is shown that $Z_r$ is consistent and asymptotically normal. These facts can be used to construct asymptotic confidence intervals. First, define the $(K - 1) \times (K - 1)$ dimensional matrix

$$\hat{\Sigma} = \begin{pmatrix}
\hat{p}'_1(1 - \hat{p}'_1) & -\hat{p}'_1 \hat{p}'_2 & \cdots & -\hat{p}'_1 \hat{p}'_{K-1} \\
-\hat{p}'_2 \hat{p}'_1 & \hat{p}'_2(1 - \hat{p}'_2) & \cdots & -\hat{p}'_2 \hat{p}'_{K-1} \\
\vdots & \vdots & \ddots & \vdots \\
-\hat{p}'_{K-1} \hat{p}'_1 & -\hat{p}'_{K-1} \hat{p}'_2 & \cdots & \hat{p}'_{K-1}(1 - \hat{p}'_{K-1})
\end{pmatrix} \qquad (8)$$

and the $(K - 1)$ dimensional column vector $\hat{h}_r$, whose $j$th component, for each $j = 1, \ldots, (K - 1)$, is given by

$$\left(1 - \hat{p}'_j\right)^r - r\hat{p}'_j\left(1 - \hat{p}'_j\right)^{r-1} - \left(1 - \hat{p}'_K\right)^r + r\hat{p}'_K\left(1 - \hat{p}'_K\right)^{r-1}. \qquad (9)$$

When there exists at least one $s$ with $p_s \neq 1/S$ (i.e. we do not have a uniform distribution), an asymptotic $(1 - \alpha)100\%$ confidence interval for $\zeta_r$ is given by

$$Z_r \pm z_{\alpha/2}\,\frac{\hat{\sigma}_r}{\sqrt{n}}, \qquad (10)$$

where

$$\hat{\sigma}_r = \sqrt{\hat{h}_r^{\mathsf{T}} \hat{\Sigma} \hat{h}_r} \qquad (11)$$

is the estimated standard deviation, $\hat{h}_r^{\mathsf{T}}$ is the transpose of $\hat{h}_r$, and $z_{\alpha/2}$ is a number satisfying $P(Z > z_{\alpha/2}) = \alpha/2$, where $Z \sim N(0, 1)$ is a standard normal random variable. Methods for evaluating $Z_r$ and $\hat{\sigma}_r$ are available in the package EntropyEstimation [24] for R [25]. For details about the confidence interval see S1 Appendix.
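The interval in Eq (10) can likewise be transcribed directly. The sketch below is our own (it is not the paper's code and does not call EntropyEstimation); it reuses Z_hat from the previous sketch, assembles Eqs (8), (9) and (11) in base R, and assumes the non-uniformity condition stated above.

```r
# Asymptotic (1 - alpha)100% confidence interval for zeta_r (Eqs 8-11),
# computed from species counts; relies on Z_hat() defined earlier.
gen_simpson_ci <- function(counts, r, alpha = 0.05) {
  n  <- sum(counts)
  p  <- counts / n
  K  <- length(p)
  stopifnot(K >= 2, r >= 1, r <= n - 1)
  pj <- p[-K]                                       # first K - 1 estimated proportions
  pK <- p[K]                                        # Kth proportion, fixed by sum(p) = 1
  Sigma <- diag(pj, nrow = K - 1) - outer(pj, pj)   # Eq (8): multinomial covariance
  h <- (1 - pj)^r - r * pj * (1 - pj)^(r - 1) -     # Eq (9): delta-method gradient
       (1 - pK)^r + r * pK * (1 - pK)^(r - 1)
  sigma <- sqrt(drop(t(h) %*% Sigma %*% h))         # Eq (11)
  z  <- qnorm(1 - alpha / 2)
  Zr <- Z_hat(counts, r)
  c(lower = Zr - z * sigma / sqrt(n), estimate = Zr, upper = Zr + z * sigma / sqrt(n))
}

gen_simpson_ci(counts, r = 5)   # counts as in the previous example
```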
1.5 Comparing distributions

In many situations it is important not only to estimate the diversity of one community, but to compare the diversities of two different communities. Toward this end, we discuss the construction of confidence intervals for the difference between the generalized Simpson's entropies of two communities.

Fix an order $r$ and let $\zeta_r^{(1)}$ and $\zeta_r^{(2)}$ be the generalized Simpson's entropies of the first and second community, respectively. To estimate these, assume that we have a random sample of size $n_1$ from the first community and a random sample of size $n_2$ from the second community. Assume further that these two samples are independent of each other and that $r \le (\min\{n_1, n_2\} - 1)$, where $\min\{n_1, n_2\}$ is the minimum of $n_1$ and $n_2$. If both communities satisfy the conditions given in Section 1.4, an asymptotic $(1 - \alpha)100\%$ confidence interval for the difference $\zeta_r^{(1)} - \zeta_r^{(2)}$ is given by

$$Z_r^{(1)} - Z_r^{(2)} \pm z_{\alpha/2}\,\sqrt{\frac{\big(\hat{\sigma}_r^{(1)}\big)^2}{n_1} + \frac{\big(\hat{\sigma}_r^{(2)}\big)^2}{n_2}}, \qquad (12)$$

where $Z_r^{(1)}$ and $Z_r^{(2)}$ are the estimates of $\zeta_r^{(1)}$ and $\zeta_r^{(2)}$, and $\hat{\sigma}_r^{(1)}$ and $\hat{\sigma}_r^{(2)}$ are the estimated standard deviations as in Eq (11).

In practice, it is often not enough to look at only one diversity index. For this reason we may want to look at an entire profile of generalized Simpson's entropies. This can be done as follows. Fix any positive integer $v \le (\min\{n_1, n_2\} - 1)$. In order for $\zeta_v$ to be a reasonable diversity measure, we also require $v \le (S - 1)$. For each $r = 1, 2, \ldots, v$ we can compute $Z_r^{(1)}$, $Z_r^{(2)}$, and the corresponding confidence interval. Looking at these for all values of $r$ gives a pointwise confidence envelope. We can now see whether the two communities have statistically significant differences in the amount of diversity by checking whether zero is in the envelope. If it is generally in the envelope, then the differences are not significant; if it is generally outside of the envelope, then the differences are significant.

1.6 Effective number of species

The effective number of species [7] is the number of equiprobable species that would yield the same diversity as a given distribution [26]. It is a measure of diversity sensu stricto [8]. We will write entropy for $\zeta_r$ and diversity for its effective number, which we denote by ${}^{r}D^{\zeta}$. To derive ${}^{r}D^{\zeta}$ we assume

$$\zeta_r = \sum_{s=1}^{{}^{r}D^{\zeta}} \frac{1}{{}^{r}D^{\zeta}} \left(1 - \frac{1}{{}^{r}D^{\zeta}}\right)^{r} = \left(1 - \frac{1}{{}^{r}D^{\zeta}}\right)^{r}, \qquad (13)$$

and then simple algebra yields

$$ {}^{r}D^{\zeta} = \frac{1}{1 - \zeta_r^{1/r}}. \qquad (14)$$

Note that Eq (13) assumes that ${}^{r}D^{\zeta}$ is an integer, while in Eq (14) it is generally not an integer. This is not an issue because Eq (13) is just a formalism used to derive Eq (14). A more developed argument can be found in Appendix B of [20].

Since the function $f(t) = 1/(1 - t^{1/r})$, $t \in [0, 1]$, is monotonically increasing, we can transform confidence intervals for $\zeta_r$ into confidence intervals for ${}^{r}D^{\zeta}$ as follows. If $(L, U)$ is a $(1 - \alpha)100\%$ confidence interval for $\zeta_r$, then $(f(L), f(U))$ is a $(1 - \alpha)100\%$ confidence interval for ${}^{r}D^{\zeta}$. It is important to note that any inference based on such confidence intervals for ${}^{r}D^{\zeta}$ is equivalent to inference based on the original confidence interval for $\zeta_r$.
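Continuing the illustration (ours, independent of the paper's S2 Appendix code), Eq (14) and the interval transform amount to a one-line function in R:

```r
# Effective number of species (Eq 14): the number of equiprobable species that would
# give the same zeta_r. f is monotone, so a CI for zeta_r maps directly to a CI for D.
effective_species <- function(zeta, r) 1 / (1 - zeta^(1 / r))

ci <- gen_simpson_ci(counts, r = 5)   # (lower, estimate, upper) for zeta_5, as above
effective_species(ci, r = 5)          # transformed point estimate and interval
```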
2 Example data and results

In this section we apply our methodology to estimate and compare the diversities of two 1-ha plots (#6 and #18) of tropical forest in the experimental forest of Paracou, French Guiana [27]. Respectively, 641 and 483 trees with diameter at breast height over 10 cm were inventoried. The data is available in the entropart package for R.

Fig 1. Generalized Simpson's entropy and diversity profiles. (a) entropy and (b) diversity profiles of Paracou plots 6 (solid, green lines) and 18 (dotted, red lines). The bold lines represent the estimated values, surrounded by their 95% confidence envelopes. doi:10.1371/journal.pone.0173305.g001

In the data, we observe 147 and 149 species from plots 6 and 18 respectively. However, some species may not have been sampled and we must adjust these values. Jackknives tend to be good estimators of richness, see [28]. We use a jackknife of order 2 for plot 6 and one of order 3 for plot 18; the choice of the optimal order follows both [28] and [29]. The estimated richness is, respectively, 254 and 309 species. For this reason we estimate generalized Simpson's entropy up to order $r = 253$. This, along with a 95% confidence envelope, is given in Fig 1a.

The generalized Simpson's diversity profiles, along with a 95% confidence envelope, are given in Fig 1b. These give more intuitive information since they represent the effective numbers of species. Their values at $r = 1$ are, respectively, 39 and 46 species. Increasing values of $r$ give more importance to rare species, which leads to the increase in the effective number of species seen in the graph.

Plot 18 is clearly more diverse than plot 6, with a fairly stable difference of between 15 and 19 effective species. In Fig 2 the difference between the entropies is plotted with its 95% confidence envelope to test it against the null hypothesis of zero difference. Since zero is never in this envelope, we conclude that plot 18 is significantly more diverse than plot 6.

Fig 2. Difference between the generalized Simpson's entropy of plots 6 and 18 with their 95% confidence envelope. The horizontal dotted line represents the null hypothesis of identical diversity. Since it is always outside of the confidence envelope, identical diversity is rejected. doi:10.1371/journal.pone.0173305.g002

3 Discussion

3.1 Interpretation

Generalized Simpson's entropy of order $r$ can be interpreted as the average information brought by the observation of an individual. Its information function $I(p) = (1 - p)^r$ represents the probability of not observing a single individual of a species with proportion $p$ in a sample of size $r$. Thus $I$ is an intuitive measure of rarity. Olszewski [30] (see also [31]) interpreted $\zeta_r$ as the probability that the individual sampled at rank $(r + 1)$ belongs to a previously unobserved species in a species accumulation curve, i.e. the slope of the curve at rank $(r + 1)$. A related interpretation is as follows. If $X$ is the number of species observed exactly once in a sample of size $(r + 1)$, then $\zeta_r = E[X]/(r + 1)$.

These interpretations are not limited to orders $r < S$. However, when $r \ge S$, $\zeta_r$ is no longer a reasonable measure of diversity. In particular, in this case, it may not be maximized at the uniform distribution, which could lead the effective number of species, ${}^{r}D^{\zeta}$, to be greater than the actual number of species.

3.2 HCDT entropy

In this section we compare our results to those based on the more standard HCDT entropy, which is given by

$${}^{q}T = \frac{\sum_{s=1}^{S} p_s^q - 1}{1 - q}, \qquad q \ge 0, \qquad (15)$$

where, for $q = 1$, this is interpreted by its limiting value ${}^{1}T = -\sum_{s=1}^{S} p_s \log p_s$. The effective number of species for HCDT entropy was derived in [7]. It is given by

$${}^{q}D^{T} = \left(\sum_{s=1}^{S} p_s^q\right)^{1/(1-q)}, \qquad q \ge 0, \qquad (16)$$

where, for $q = 1$, this is interpreted by its limiting value ${}^{1}D^{T} = e^{{}^{1}T}$. We call this quantity HCDT diversity, although in the literature it is often called Hill's diversity number.
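For reference, naive plug-in versions of Eqs (15) and (16) are easy to write down; this sketch is ours and is shown only to make the formulas concrete. Note that Fig 3a below is based on the bias-corrected, jackknife-unveiled estimator of [16], not on this plug-in.

```r
# Naive plug-in HCDT (Tsallis) entropy and Hill diversity (Eqs 15-16) from proportions.
hcdt_entropy <- function(p, q) {
  if (abs(q - 1) < 1e-10) -sum(p * log(p)) else (sum(p^q) - 1) / (1 - q)
}
hill_diversity <- function(p, q) {
  if (abs(q - 1) < 1e-10) exp(-sum(p * log(p))) else sum(p^q)^(1 / (1 - q))
}

p <- c(0.5, 0.3, 0.15, 0.05)
hcdt_entropy(p, q = 2)     # equals 1 - sum(p^2), i.e. zeta_1
hill_diversity(p, q = 0)   # equals richness (here, 4 species)
```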
For our data, plots of ${}^{q}D^{T}$ for $q \in [0, 2]$, along with a 95% confidence envelope, are given in Fig 3a. Here ${}^{q}D^{T}$ was estimated using the jackknife-unveiled estimator of [16] and the confidence envelope was estimated using bootstrap.

It is easy to see that the importance of rare species increases for HCDT entropy as $q$ decreases. For comparison, the importance of rare species for generalized Simpson's entropy increases as $r$ increases. Note that ${}^{2}T = \zeta_1$. To see what values of $q$ in HCDT entropy correspond to other values of $r$ for generalized Simpson's entropy, we can find when ${}^{q}D^{T} = {}^{r}D^{\zeta}$. Since we can only use $\zeta_r$ up to $r = S - 1$, it is of interest to find which value of $q$ corresponds to this value. For our data we find that in plot 6 $q = 0.5$ corresponds to $r = 253$ and in plot 18 $q = 0.55$ corresponds to $r = 308$.

The main difficulty in working with HCDT entropy is that its estimators have quite a lot of bias, especially for smaller values of $q$ [16]. This is illustrated in Fig 3a, where we see that the confidence intervals of the estimated values of the HCDT diversity of plots 6 and 18 have significant overlap up to $q = 0.75$. Bias is not an issue with generalized Simpson's entropy, which can be estimated with no bias regardless of the sample size (although its precision does depend on the sample size, see Eq (10)).

Fig 3. (a) HCDT and (b) Hurlbert's diversity profiles of Paracou plots 6 (solid, green lines) and 18 (dotted, red lines). The bold lines represent the estimated values, surrounded by their 95% confidence envelope (obtained by 1000 bootstraps). doi:10.1371/journal.pone.0173305.g003

The main issue with generalized Simpson's entropy is that it can only be considered for orders $r \le S - 1$, and larger values of $r$ correspond to smaller values of $q$ for HCDT entropy. In our example, the generalized Simpson's diversity profile can be compared to the part of the HCDT diversity profile between $q = 0.5$ and $q = 2$. Focusing more on rare species is not possible. HCDT diversity allows that theoretically, but is seriously limited by its estimation issues: the profile has a wide confidence envelope and is not conclusive below $q = 0.75$. On the whole, generalized Simpson's entropy allows for a more comprehensive comparison of diversity profiles. If richness were greater, higher orders of generalized Simpson's diversity could be used and estimated with no bias, while low-order HCDT estimation would become more uncertain [16].

3.3 Hurlbert's diversity

Another measure of diversity, which is related to generalized Simpson's entropy, was introduced in [12]. It is given by

$${}_{k}H = \sum_{s=1}^{S} \left[1 - (1 - p_s)^{k}\right], \qquad k = 1, 2, \ldots, \qquad (17)$$

and corresponds to the expected number of species found in a sample of size $k$. It is easily verified that ${}_{2}H = 1 + \zeta_1$ and, more generally, that ${}_{k}H = 1 + \sum_{r=1}^{k-1} \zeta_r$; the higher the value of $k$, the greater the importance given to rare species. While there is no simple formula for the corresponding effective number of species, an iterative procedure for finding it was developed in [32].
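As a numerical check (ours) of the link between Hurlbert's index and generalized Simpson's entropy, Eq (17) can be computed from proportions and compared with $1 + \zeta_1 + \cdots + \zeta_{k-1}$ using the gen_simpson function sketched in Section 1.1.

```r
# Hurlbert's index kH (Eq 17): expected number of species in a sample of size k,
# computed from proportions. Numerically equal to 1 + zeta_1 + ... + zeta_{k-1}.
hurlbert <- function(p, k) sum(1 - (1 - p)^k)

p <- c(0.5, 0.3, 0.15, 0.05)
k <- 6
hurlbert(p, k)
1 + sum(sapply(1:(k - 1), function(r) gen_simpson(p, r)))   # same value
```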
Hurlbert [12] developed an unbiased estimator of ${}_{k}H$ for all $k$ smaller than the sample size. This is similar to what is needed to estimate generalized Simpson's entropy, although generalized Simpson's entropy also needs $r < S$ for it to be a measure of diversity. We estimate Hurlbert's index for the two plots, convert the estimates into effective numbers of species, and use bootstrap to get a 95% confidence envelope. The results are given in Fig 3b. We see that the maximum effective numbers of species are well below those of the generalized Simpson's diversity. Thus Hurlbert's diversity finds fewer rare species, making it a less interesting alternative for our purpose.

4 Conclusion

Generalized Simpson's entropy is a measure of diversity respecting the classical axioms when $r < S$, and it has a simple formula to transform it into an effective number of species. It faces several issues that limit its use. Specifically, it only makes sense when applied to a single taxocene, and its estimator has nice properties only under the assumption of random sampling. However, these issues are shared with all of the other measures of diversity discussed here and many, if not most, of the ones available in the literature. Further, generalized Simpson's entropy has a decisive advantage over other such measures: it has an easy-to-calculate uniformly minimum variance unbiased estimator, which is consistent and asymptotically normal. These properties make it a useful tool for estimating diversity and for comparing hyper-diverse, poorly sampled communities.

R code to reproduce the examples in the paper, based on the packages EntropyEstimation and entropart [22], is given in S2 Appendix. All data are available in the entropart package.

Supporting information

S1 Appendix. Proofs. (PDF)

S2 Appendix. R code. This code allows for the reproduction of all examples and figures in this article. (PDF)

Author Contributions

Conceptualization: ZZ MG EM.
Data curation: EM MG.
Formal analysis: MG EM GL ZZ.
Investigation: MG EM GL ZZ.
Methodology: MG EM GL ZZ.
Software: MG EM.
Supervision: ZZ.
Validation: MG EM GL ZZ.
Visualization: MG EM.
Writing – original draft: MG EM.
References

1. Ricotta C. Through the jungle of biological diversity. Acta Biotheoretica. 2005; 53(1):29–38. doi: 10.1007/s10441-005-7001-6 PMID: 15906141
2. Patil GP, Taillie C. Diversity as a concept and its measurement. Journal of the American Statistical Association. 1982; 77(379):548–561. doi: 10.2307/2287712
3. Beck J, Schwanghart W. Comparing measures of species diversity from incomplete inventories: an update. Methods in Ecology and Evolution. 2010; 1(1):38–44. doi: 10.1111/j.2041-210X.2009.00003.x
4. Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948; 27:379–423, 623–656. doi: 10.1002/j.1538-7305.1948.tb01338.x
5. Simpson EH. Measurement of diversity. Nature. 1949; 163(4148):688. doi: 10.1038/163688a0
6. Rényi A. On Measures of Entropy and Information. In: Neyman J, editor. 4th Berkeley Symposium on Mathematical Statistics and Probability. vol. 1. Berkeley, USA: University of California Press; 1961. p. 547–561.
7. Hill MO. Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology. 1973; 54(2):427–432. doi: 10.2307/1934352
8. Jost L. Entropy and diversity. Oikos. 2006; 113(2):363–375. doi: 10.1111/j.2006.0030-1299.14714.x
9. Havrda J, Charvát F. Quantification method of classification processes. Concept of structural a-entropy. Kybernetika. 1967; 3(1):30–35.
10. Daróczy Z. Generalized information functions. Information and Control. 1970; 16(1):36–51. doi: 10.1016/S0019-9958(70)80040-7
11. Tsallis C. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics. 1988; 52(1):479–487. doi: 10.1007/BF01016429
12. Hurlbert SH. The Nonconcept of Species Diversity: A Critique and Alternative Parameters. Ecology. 1971; 52(4):577–586. doi: 10.2307/1934145
13. Tothmeresz B. Comparison of different methods for diversity ordering. Journal of Vegetation Science. 1995; 6(2):283–290. doi: 10.2307/3236223
14. Marcon E, Scotti I, Hérault B, Rossi V, Lang G. Generalization of the Partitioning of Shannon Diversity. PLoS ONE. 2014; 9(3):e90289. doi: 10.1371/journal.pone.0090289 PMID: 24603966
15. Chao A, Jost L. Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution. 2015; 6(8):873–882. doi: 10.1111/2041-210X.12349
16. Marcon E. Practical Estimation of Diversity from Abundance Data. HAL. 2015; 01212435 (version 2).
17. Zhang Z, Zhou J. Re-parameterization of multinomial distributions and diversity indices. Journal of Statistical Planning and Inference. 2010; 140(7):1731–1738. doi: 10.1016/j.jspi.2009.12.023
18. Zhang Z, Grabchak M. Entropic Representation and Estimation of Diversity Indices. Journal of Nonparametric Statistics. 2016; 28(3):563–575. doi: 10.1080/10485252.2016.1190357
19. Faddeev DK. On the concept of entropy of a finite probabilistic scheme. Uspekhi Mat Nauk. 1956; 1(67):227–231.
20. Gregorius HR. Partitioning of diversity: the "within communities" component. Web Ecology. 2014; 14:51–60. doi: 10.5194/we-14-51-2014
21. Dalton H. The measurement of the inequality of incomes. The Economic Journal. 1920; 30(119):348–361. doi: 10.2307/2223525
22. Marcon E, Hérault B. entropart, an R Package to Partition Diversity. Journal of Statistical Software. 2015; 67(8):1–26. doi: 10.18637/jss.v067.i08
23. Corona P, Franceschi S, Pisani C, Portoghesi L, Mattioli W, Fattorini L. Inference on diversity from forest inventories: a review. Biodiversity and Conservation. 2015; in press.
24. Cao L, Grabchak M. EntropyEstimation: Estimation of Entropy and Related Quantities; 2014. Available from: http://cran.r-project.org/package=EntropyEstimation.
25. R Development Core Team. R: A Language and Environment for Statistical Computing; 2016. Available from: http://www.r-project.org.
26. Gregorius HR. On the concept of effective number. Theoretical Population Biology. 1991; 40(2):269–283. doi: 10.1016/0040-5809(91)90056-L PMID: 1788824
27. Gourlet-Fleury S, Guehl JM, Laroussinie O. Ecology & Management of a Neotropical Rainforest. Lessons Drawn from Paracou, a Long-Term Experimental Research Site in French Guiana. Paris, France: Elsevier; 2004.
28. Burnham KP, Overton WS. Robust Estimation of Population Size When Capture Probabilities Vary Among Animals. Ecology. 1979; 60(5):927–936. doi: 10.2307/1936861
29. Brose U, Martinez ND, Williams RJ. Estimating species richness: Sensitivity to sample coverage and insensitivity to spatial patterns. Ecology. 2003; 84(9):2364–2377. doi: 10.1890/02-0558
30. Olszewski TD. A unified mathematical framework for the measurement of richness and evenness within and among multiple communities. Oikos. 2004; 104(2):377–387.
31. Chao A, Wang YT, Jost L. Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species. Methods in Ecology and Evolution. 2013; 4(11):1091–1100. doi: 10.1111/2041-210X.12108
32. Dauby G, Hardy OJ. Sampled-based estimation of diversity sensu stricto by transforming Hurlbert diversities into effective number of species. Ecography. 2012; 35(7):661–672. doi: 10.1111/j.1600-0587.2011.06860.x

The generalized Simpson’s entropy is a measure of biodiversity

PLoS ONE , Volume 12 (3) – Mar 7, 2017

Loading next page...
 
/lp/pubmed-central/the-generalized-simpson-s-entropy-is-a-measure-of-biodiversity-lQ0Efshvi9

References

References for this paper are not available at this time. We will be adding them shortly, thank you for your patience.

Publisher
Pubmed Central
ISSN
1932-6203
eISSN
1932-6203
DOI
10.1371/journal.pone.0173305
Publisher site
See Article on Publisher Site

Abstract

a1111111111 a1111111111 Modern measures of diversity satisfy reasonable axioms, are parameterized to produce a1111111111 diversity profiles, can be expressed as an effective number of species to simplify their inter- a1111111111 pretation, and come with estimators that allow one to apply them to real-world data. We introduce the generalized Simpson's entropy as a measure of diversity and investigate its properties. We show that it has many useful features and can be used as a measure of biodi- versity. Moreover, unlike most commonly used diversity indices, it has unbiased estimators, OPENACCESS which allow for sound estimation of the diversity of poorly sampled, rich communities. Citation: Grabchak M, Marcon E, Lang G, Zhang Z (2017) The generalized Simpson's entropy is a measure of biodiversity. PLoS ONE 12(3): e0173305. doi:10.1371/journal.pone.0173305 Editor: Stefan J. Green, University of Illinois at Chicago, UNITED STATES Introduction Received: November 3, 2016 Many indices of biodiversity have been proposed based on different definitions of diversity and different visions of the biological aspects to address [1]. Indeed, measuring diversity Accepted: February 17, 2017 requires both a robust theoretical framework [2] and empirical techniques to effectively esti- Published: March 7, 2017 mate it [3]. We focus on species-neutral diversity, i.e. the diversity of the distribution of spe- Copyright: This is an open access article, free of all cies, ignoring their features. Such measures only make sense when applied to a single copyright, and may be freely reproduced, taxocene, i.e. a subset of species in the community under study that belong to the same taxon distributed, transmitted, modified, built upon, or (e.g. butterflies) or, more loosely, to a meaningful group (e.g. trees). Classical measures of this otherwise used by anyone for any lawful purpose. type include richness (the number of species), Shannon's entropy [4], and Simpson's index [5]. The work is made available under the Creative Since one index is generally insufficient to fully capture the diversity of a community, mod- Commons CC0 public domain dedication. ern measures of diversity are parameterizable, allowing the user to give more or less relative Data Availability Statement: Data are available importance to rare versus frequent species [6]. Further, they can be expressed as an effective from the entropart package for R, available on number of species [7], which allows for an easy interpretation of their values [8]. Among the CRAN: https://cran.r-project.org/web/packages/ entropart/index.html. most popular indices of this type are HCDT entropy [9±11] (which includes richness, Simp- son's index, and Shannon's entropy as special cases), Re Ânyi's entropy [6], and the less-used Funding: This work has benefited from an Hurlbert's index [12]. These indices can be used to estimate the diversity of a community and ªInvestissement d'Avenirº grant managed by Agence Nationale de la Recherche (CEBA, ref. ANR- then to plot their values against the parameter, which controls the weight of rare species, to 10-LABX-25-01). obtain a diversity profile [7]. The profiles of two communities can be compared to provide a partial order of their diversity. If the profiles do not cross, one community can be declared to Competing interests: The authors have declared that no competing interests exist. be more diverse than the other [13]. 
PLOS ONE | DOI:10.1371/journal.pone.0173305 March 7, 2017 1 / 11 The generalized Simpson's entropy is a measure of biodiversity HCDT entropy has many desirable properties [8, 14] but, despite recent progress [15], it cannot be accurately estimated when the communities are insufficiently sampled [16]. Re Ânyi's entropy is related to HCDT entropy by a straightforward transformation: the natural loga- rithm of the deformed exponential [14]. Its properties are very similar and, hence, it will not be treated here. Hurlbert's index has a simple and practical interpretation and can be estimated with no bias, but only up to when its parameter is strictly less than the sample size. We introduce generalized Simpson's entropy as a measure of diversity for its particular per- formance when it is used to estimate the diversity of small samples from hyper-diverse com- munities. The generalized Simpson's entropy z is parameterized: increasing its parameter r gives more relative importance to rare species. It has a simple interpretation, specifically, in a species accumulation curve, z is the probability that the individual sampled at rank r + 1 belongs to a new species. We show that z is a valid measure of diversity, satisfying the axioms established in the literature [2, 6]. We then show how to estimate z with no bias and how to construct confidence intervals, which can be used to compare the diversities of different com- munities. After this, we derive a simple formula for the corresponding effective number of spe- cies and discuss its estimation. Finally, we compare it to HCDT entropy and Hurlbert's index on a real-world example of under-sampled tropical forest to illustrate its decisive advantage when applied to this type of data. 1 Methods 1.1 Generalized Simpson's entropy Let ℓ , ℓ , . . ., ℓ be the species in a community, and let p be the proportion of individuals 1 2 S s belonging to species ℓ . Necessarily, 0 p  1 and p ˆ 1. We can interpret p as the s s s s sˆ1 probability of seeing an individual of species ℓ when sampling one individual from this com- munity. Generalized Simpson's entropy is a family of diversity indices defined by z ˆ p…1 p† ; r ˆ 1; 2; . . . : …1† r s s sˆ1 The parameter r is called the order of z . Note that, as r increases, z gives more relative weight r r to rare species than to more common ones. Note further that 0 z  1. In fact, z is the proba- r r bility that the (r + 1)st observation will be of a species that has not been observed before. Generalized Simpson's entropy was introduced as part of a larger class in [17] and was fur- ther studied in [18]. The name comes from the fact that 1 − z corresponds to Simpson's index as defined in [5]. A major advantage to working with this family is that there exists an unbiased estimator of z whenever r is strictly less than the sample size. While a similar result holds for Hurlbert's index, this is not the case with most popular diversity indices including HCDT entropy and Re Ânyi's entropy, which do not have unbiased estimators. We now turn to the question of when and why generalized Simpson's entropy is a good measure of diversity. 1.2 Axioms for a measure of diversity Historically, measures of diversity have been defined as functions mapping the proportions p , p , . . ., p into the real line, and satisfying certain axioms. We write H(p , p , . . ., p ) to 1 2 S 1 2 S denote a generic function of this type. We begin with three of the most commonly assumed axioms. The first two are from Re Ânyi [6] after Faddeev [19]. 
Axiom 1 (Symmetry) H(p , p , . . ., p ) must be a symmetric function of its variables. 1 2 S This means that no species can have a particular role in the measure. PLOS ONE | DOI:10.1371/journal.pone.0173305 March 7, 2017 2 / 11 The generalized Simpson's entropy is a measure of biodiversity Axiom 2 (Continuity) H(p , p , . . ., p ) must be a continuous function of the vector 1 2 S (p , p , . . ., p ). 1 2 S This ensures that a small change in probabilities yields a small change in the measure. In particular, two communities differing by a species with a probability very close to 0 have almost the same diversity. Axiom 3 (Evenness) For a fixed number of species S, the maximum diversity is achieved when all species probabilities are equal, i.e., H…p ; p ; . . . ; p †  H…1=S; 1=S; . . . ; 1=S†: …2† 1 2 S This axiom was called evenness by Gregorius [20]. It means that the most diverse commu- nity of S species is the one where all species have the same proportions. We will give a more restrictive version of this axiom. Toward this end, following Patil and Taillie [2], we define a transfer of probability. This is an operation that consists of taking two species with p < p and modifying these probabilities to increase p by h > 0 and decrease p s t s t by h, such that we still have p + h p − h. In other words, some individuals of a more com- s t mon species are replaced by ones of a less common species, but in such a way that the order of the two species does not change. Axiom 4 (Principle of transfers) Any transfer of probability must increase diversity. The principle of transfers comes from the literature of inequality [21]. It is clear that this axiom is stronger than the axiom of evenness: if any transfer increases diversity, then, necessar- ily, the maximum value is reached when no more transfer is possible, i.e. when all proportions are equal. Generalized Simpson's entropy belongs to an important class of diversity indices, which are called trace-form entropies in statistical physics and dichotomous diversity indices in [2]. This class consists of indices of the form H…p ; p ; . . . ; p † ˆ p I…p†, where I(p) is called the 1 2 S sˆ1 s s information function. Indices of this type were studied extensively in [2] and [20]. I(p) defines the amount of information [4], or uncertainty [6], or surprise [22]. All of these terms can be taken as synonyms; they get at the idea that I(p) measures the rarity of individuals from a spe- cies with proportion p [2]. This discussion leads to the following axiom. Axiom 5 (Decreasing information) I(p) must be a decreasing function of p on the interval (0, 1] and I(1) = 0. This can be interpreted to mean that observing an individual from an abundant species brings less information than observing one from a rare species, and if an individual is observed from a species that has probability 1, then this observation brings no information at all. Patil and Taillie [2] showed that Axiom 5 ensures that adding a new species increases diver- sity. They also showed that both the principle of transfers and the axiom of decreasing infor- mation are satisfied if the function g(p) = pI(p) is concave on the interval [0, 1]. However, for generalized Simpson's entropy, g…p† ˆ p…1 p† ; p 2 ‰0; 1Š …3† is not a concave function of p if r > 1. In fact, for r > 1 generalized Simpson's entropy does not satisfy the principle of transfers. For this reason Gregorius [20], in a study of many different entropies, did not retain it. 
However, we will show that generalized Simpson's entropies satisfy a weaker version of the principle of transfers, and are, nevertheless, useful measures of diversity. PLOS ONE | DOI:10.1371/journal.pone.0173305 March 7, 2017 3 / 11 The generalized Simpson's entropy is a measure of biodiversity 1.3 The generalized Simpson's entropy is a measure of diversity It is easy to see that generalized Simpson's entropy always satisfies Axioms 1, 2 and 5, but, as we have discussed, it does not satisfy Axiom 4. However, we will show that it satisfies a weak version of it and that it satisfies Axiom 3 for a limited, but wide range of orders r. Axiom 6 (Weak principle of transfers) Any transfer of probability must increase diversity as long as the sum of the probabilities of the concerned species is below a certain threshold, i.e., the principle of transfers holds so long as p ‡ p  T for some 0 < T  1: …4† s t We now give our results about the properties of generalized Simpson's entropy. The proofs are in S1 Appendix. Proposition 1 Generalized Simpson’s entropy of order r respects the weak principle of transfers with T ˆ . r‡1 Proposition 2 Generalized Simpson’s entropy of order r respects the evenness axiom if r S − 1. In light of Proposition 2, we will limit the order to r = 1, 2, . . ., (S − 1). In this case, general- ized Simpson's entropy satisfies Axioms 1±3, and can be regarded as a measure of diversity. 2 2 Moreover, it satisfies Axiom 5 and the weak principle of transfers up to T ˆ  . Thus, a r‡1 S transfer of probability increases diversity, except between very abundant species. 1.4 Estimation In practice, the proportions, (p , p , . . ., p ), are unknown and, hence, the value of generalized 1 2 S Simpson's entropy as well as any other diversity index is unknown and can only be estimated from data. For this purpose, assume that we have a random sample of n individuals from a given community. The assumption that we have a random sample, i.e. that the observations are independent and identically distributed, may be unrealistic in some situations. However, most estimators rely on this assumption, and appropriate sampling design is the simplest solu- tion to obtain independent and identically distributed data. See [23] for a review of these issues in the context of forestry. In principle, the assumption of a random sample implies that either the population is infinite, or that the sampling is done with replacement. In practice, the popu- lation is finite and sampling in ecological studies is usually performed without replacement. However, when the sample size is much smaller than the population, the dependence intro- duced by sampling from a finite population without replacement is negligible and can be ignored. Let n be the number of individuals sampled from species ℓ , and note that n ˆ n . We s s sˆ1 s can estimate p by p ˆ n =n. A naive estimator of z is given by the so-called ªplug-inº estima- s r s s S r ^ ^ tor p …1 p † . Unfortunately, this may have quite a bit of bias. However, for 1 r sˆ1 s s (n − 1), an unbiased estimator of z exists and is given by S r 1 r‡1 X Y n ‰n r 1Š! j Z ˆ p ^ 1 p ^ ; …5† r s s n! n sˆ1 jˆ0 see [17]. There it is shown that Z is a uniformly minimum variance unbiased estimator (umvue) for z when 1 r (n − 1). Note that the sum in Eq (5) ranges over all of the species in the community. This may appear impractical since we generally do not know the value of S. 
However, for any species ℓ that is not observed in our sample, we have p ^ ˆ 0, and we do not need to include it in the sum. Assume that we have observed K S different species in the sample and that these species PLOS ONE | DOI:10.1371/journal.pone.0173305 March 7, 2017 4 / 11 The generalized Simpson's entropy is a measure of biodiversity 0 0 0 0 are ` ;` ; . . . ; ` . For each s = 1, 2, . . ., K, let n be the number of individuals from species ` 1 2 K s s 0 0 sampled, and let p ^ ˆ n =n be the estimated proportion of species ` . In this case we can write s s s K r 1 r‡1 X Y n ‰n r 1Š! j 0 0 ^ ^ Z ˆ p 1 p : …6† s s n! n sˆ1 jˆ0 With a few simple algebraic steps, we can rewrite this in the form K r X Y n 1 0 s Z ˆ p 1 ; …7† r s n j sˆ1 jˆ1 which we have found to be more tractable for computational purposes. In [17] and [18] it is shown that Z is consistent and asymptotically normal. These facts can be used to construct asymptotic confidence intervals. First, define the (K − 1) × (K − 1) dimen- sional matrix given by 0 1 0 0 0 0 0 0 ^ ^ ^ ^ ^ ^ p …1 p † p p  p p 1 1 1 2 1 K 1 B C B C 0 0 0 0 0 0 ^ ^ ^ ^ ^ ^ B p p p …1 p †  p p C 2 1 2 2 2 K 1 B C S ˆ …8† B C B C B C @ A 0 0 0 0 0 0 ^ ^ ^ ^ ^ p p p p  p …1 p † K 1 1 K 1 2 K 1 K 1 and the (K − 1) dimensional column vector h , where for each j = 1, . . ., (K − 1) the jth compo- nent of h is given by r r 1 r r 1 0 0 0 0 0 0 ^ ^ ^ ^ ^ ^ …9† 1 p ‡ rp 1 p 1 p rp 1 p : j j j K K K When there exists at least one s with p 6ˆ 1/S (i.e. we do not have a uniform distribution) then an asymptotic (1 −α)100% confidence interval for z is given by s ^ Z  z p ; …10† r a=2 where q ^T ^ ^ …11† s ^ ˆ h Sh r r ^T ^ is the estimated standard deviation, h is the transpose of h , and z is a number satisfying α/2 r r P(Z > z ) =α/2 where Z * N(0, 1) is a standard normal random variable. Methods for evalu- α/2 ating Z and s ^ are available in the package EntropyEstimation [24] for R [25]. For details about the confidence interval see S1 Appendix. 1.5 Comparing distributions In many situations it is important not only to estimate the diversity of one community, but to compare the diversities of two different communities. Toward this end, we discuss the con- struction of confidence intervals for the difference between the generalized Simpson's entro- pies of two communities. …1† …2† Fix an order r and let z and z be the generalized Simpson's entropies of the first and sec- r r ond community respectively. To estimate these, assume that we have a random sample of size PLOS ONE | DOI:10.1371/journal.pone.0173305 March 7, 2017 5 / 11 The generalized Simpson's entropy is a measure of biodiversity n from the first community and a random sample of size n from the second community. 1 2 Assume further that these two samples are independent of each other and that r (min{n , n } − 1), where min{n , n } is the minimum of n and n . If both communities satisfy the con- 2 1 2 1 2 ditions given in Section 1.4, an asymptotic (1 −α)100% confidence interval for the difference …1† …2† z z is given by r r s 2 2 …1† …2† s ^ s ^ r r …1† …2† …12† Z Z  z ‡ ; a=2 r r n n 1 2 …1† …2† …1† …2† …1† …2† where Z and Z are the estimates of z and z and s ^ and s ^ are the estimated standard r r r r r r deviations as in Eq (11). In practice, it is often not enough to look at only one diversity index. For this reason we may want to look at an entire profile of generalized Simpson's entropies. This can be done as follows. 
Fix any positive integer v (min{n , n } − 1). In order for z to be a reasonable diver- 1 2 v …1† …2† sity estimator, we also require v (S − 1). For each r = 1, 2, . . ., v we can estimate Z , Z , r r and the corresponding confidence interval. Looking at these for all values of r gives a pointwise confidence envelope. We can now see if the two communities have statistically significant dif- ferences in the amount of diversity by seeing if zero is in the envelope or not. If it is generally in the envelope then the differences are not significant, and if it is generally outside of the enve- lope then the differences are significant. 1.6 Effective number of species The effective number of species [7] is the number of equiprobable species that would yield the same diversity as a given distribution [26]. It is a measure of diversity sensu sticto [8]. We will r z write entropy for z and diversity for its effective number, which we denote by D . To derive r z D we assume r z D r 1 1 z ˆ 1 ; …13† r z r z D D sˆ1 and then simple algebra yields r z D ˆ : …14† 1 z r z Note that Eq (13) assumes that D is an integer, while in Eq (14) it is generally not an integer. This is not an issue because Eq (13) is just a formalism used to derive Eq (14). A more devel- oped argumentation can be found in Appendix B of [20]. 1/r Since the function f(t) = 1/(1 − t ), t2 [0, 1] is monotonically increasing, we can transform r z confidence intervals for z into confidence intervals for D as follows. If (L, U) is a (1 −α) r z 100% confidence interval for z then (f(L), f(U)) is a (1 −α)100% confidence interval for D . It r z is important to note that any inference based on such confidence intervals for D is equivalent to inference based on the original confidence interval for z . 2 Example data and results In this section we apply our methodology to estimate and compare the diversities of two 1-ha plots (#6 and #18) of tropical forest in the experimental forest of Paracou, French Guiana [27]. Respectively 641 and 483 trees with diameter at breast height over 10 cm were inventoried. The data is available in the entropart package for R. PLOS ONE | DOI:10.1371/journal.pone.0173305 March 7, 2017 6 / 11 The generalized Simpson's entropy is a measure of biodiversity Fig 1. Generalized Simpson's entropy and diversity profiles. (a) entropy and (b) diversity profiles of Paracou plots 6 (solid, green lines) and 18 (dotted, red lines). The bold lines represent the estimated values, surrounded by their 95% confidence envelopes. doi:10.1371/journal.pone.0173305.g001 In the data, we observe 147 and 149 species from plots 6 and 18 respectively. However, spe- cies may not have been sampled and we must adjust these values. Jackknives tend to be good estimators of richness, see [28]. We use a jackknife of order 2 for plot 6 and one of order 3 for plot 18: the choice of the optimal order follows both [28] and [29]. The estimated richness is, respectively, 254 and 309 species. For this reason we estimate generalized Simpson's entropy up to order r = 253. This, along with a 95% confidence envelope is given in Fig 1a. The generalized Simpson's diversity profiles along with a 95% confidence envelope are given in Fig 1b. These give more intuitive information since they represent the effective num- bers of species. Their values at r = 1 are given, respectively, by 39 and 46 species. Increasing val- ues of r give more importance to rare species, which leads to the increase in the effective number of species seen in the graph. 
Plot 18 is clearly more diverse than plot 6, with a fairly stable difference of between 15 and 19 effective species. In Fig 2 the difference between the entropies is plotted with its 95% confi- dence envelope to test it against the null hypothesis of zero difference. Since zero is never in this envelope, we conclude that plot 18 is significantly more diverse than plot 6. 3 Discussion 3.1 Interpretation Generalized Simpson's entropy of order r can be interpreted as the average information brought by the observation of an individual. Its information function I(p) = (1 − p) represents the probability of not observing a single individual of a species with proportion p in a sample of size r. Thus I is an intuitive measure of rarity. Olszewski [30] (see also [31]) interpreted z as the probability that the individual sampled at rank (r + 1) belongs to a previously unobserved species in a species accumulation curve, i.e. the slope of the curve at rank (r + 1). A related interpretation is as follows. If X is the number of species observed exactly once in a sample of size (r + 1), then z = E[X]/(r + 1). These interpretations are not limited to orders r < S. However, when r S, z is no longer a reasonable measure of diversity. In particular, in this case, it may not be maximized at the uni- r z form distribution, which could lead the effective number of species, D , to be greater than the actual number of species. PLOS ONE | DOI:10.1371/journal.pone.0173305 March 7, 2017 7 / 11 The generalized Simpson's entropy is a measure of biodiversity Fig 2. Difference between the generalized Simpson's entropy of plots 6 and 18 with their 95% confidence envelope. The horizontal dotted line represents the null hypothesis of identical diversity. Since it is always outside of the confidence envelope, identical diversity is rejected. doi:10.1371/journal.pone.0173305.g002 3.2 HCDT entropy In this section we compare our results to those based on the more standard HCDT entropy, which is given by p 1 q sˆ1 s …15† T ˆ ; q  0; 1 q where, for q = 1, this is interpreted by its limiting value as T ˆ p logp . The effective sˆ1 s s number of species for HCDT entropy was derived in [7]. It is given by 1=…1 q† q T q …16† D ˆ p ; q  0; sˆ1 q T T where, for q = 1, this is interpreted by its limiting value as D ˆ e . We call this quantity HCDT diversity, although in the literature it is often called Hill's diversity number. For our q T data, plots of D for q2 [0, 2] along with a 95% confidence envelope are given in Fig 3a. Here q T D was estimated using the jackknife-unveiled estimator of [16] and the confidence envelope was estimated using bootstrap. It is easy to see that the importance of rare species increases for HCDT entropy as q decreases. For comparison, the importance of rare species for generalized Simpson's entropy increases as r increases. Note that T = z . To see what values of q in HCDT entropy correspond r z q T to other values of r for generalized Simpson's entropy, we can find when D = D . Since we can only use z up to r = S − 1 it is of interest to find which value of q corresponds to this value. For our data we find that in plot 6 q = 0.5 corresponds to r = 253 and in plot 18 q = 0.55 corre- sponds to r = 308. The main difficulty in working with HCDT entropy is that its estimators have quite a lot of bias, especially for smaller values of q [16]. 
Fig 3. (a) HCDT and (b) Hurlbert's diversity profiles of Paracou plots 6 (solid, green lines) and 18 (dotted, red lines). The bold lines represent the estimated values, surrounded by their 95% confidence envelope (obtained by 1000 bootstraps).

This bias is illustrated in Fig 3a, where we see that the confidence intervals of the estimated HCDT diversity of plots 6 and 18 overlap substantially up to $q = 0.75$. Bias is not an issue with generalized Simpson's entropy, which can be estimated with no bias regardless of the sample size (although its precision does depend on the sample size, see Eq (10)).

The main issue with generalized Simpson's entropy is that it can only be considered for orders $r \le S - 1$, and larger values of $r$ correspond to smaller values of $q$ for HCDT entropy. In our example, the generalized Simpson's diversity profile can be compared to the part of the HCDT diversity profile between $q = 0.5$ and $q = 2$. Focusing more on rare species is not possible. HCDT diversity allows that theoretically, but it is seriously limited by its estimation issues: the profile has a wide confidence envelope and is not conclusive below $q = 0.75$. On the whole, generalized Simpson's entropy allows for a more comprehensive comparison of diversity profiles. If richness were greater, higher orders of generalized Simpson's diversity could be used and estimated with no bias, while low-order HCDT estimation would become more uncertain [16].

3.3 Hurlbert's diversity

Another measure of diversity, which is related to generalized Simpson's entropy, was introduced in [12]. It is given by

$$H_k = \sum_{s=1}^{S}\left[1 - (1 - p_s)^k\right], \qquad k = 1, 2, \ldots, \qquad (17)$$

and corresponds to the expected number of species found in a sample of size $k$. It is easily verified that $H_{k+1} = 1 + \sum_{r=1}^{k} \zeta_r$, and that the higher the value of $k$, the greater the importance given to rare species. While there is no simple formula for the corresponding effective number of species, an iterative procedure for finding it was developed in [32].

Hurlbert [12] developed an unbiased estimator of $H_k$ for all $k$ smaller than the sample size. This is similar to what is needed to estimate generalized Simpson's entropy, although generalized Simpson's entropy also needs $r < S$ for it to be a measure of diversity. We estimate Hurlbert's index for the two plots, convert the estimates into effective numbers of species, and use bootstrap to get a 95% confidence envelope. The results are given in Fig 3b. We see that the maximum effective numbers of species are well below those of the generalized Simpson's diversity. Thus Hurlbert's diversity finds fewer rare species, making it a less interesting alternative for our purpose.
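Before concluding, a short base-R sketch of Eq (17) may be useful: it computes Hurlbert's expected number of species for a known probability vector and verifies numerically the relation between Hurlbert's index and the generalized Simpson's entropies stated above. As before, the function names are ours, and this plug-in illustration is not the unbiased sample-based estimator used for Fig 3b.

```r
# Minimal sketch (base R) of Eq (17): Hurlbert's expected number of species in a
# sample of size k, plus a numerical check of H_{k+1} = 1 + sum_{r=1}^{k} zeta_r.
hurlbert <- function(p, k) {
  p <- p[p > 0]
  sum(1 - (1 - p)^k)
}

gen_simpson <- function(p, r) {
  p <- p[p > 0]
  sum(p * (1 - p)^r)            # generalized Simpson's entropy of order r
}

p <- c(0.5, 0.3, 0.2)
k <- 4
all.equal(hurlbert(p, k + 1),
          1 + sum(sapply(1:k, function(r) gen_simpson(p, r))))   # TRUE
```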
4 Conclusion

Generalized Simpson's entropy is a measure of diversity respecting the classical axioms when $r < S$, and it has a simple formula to transform it into an effective number of species. It faces several issues that limit its use. Specifically, it only makes sense when applied to a single taxocene and its estimator has nice properties only under the assumption of random sampling. However, these issues are shared with all of the other measures of diversity discussed here and many, if not most, of the ones available in the literature. Further, generalized Simpson's entropy has a decisive advantage over other such measures: it has an easy-to-calculate uniformly minimum variance unbiased estimator, which is consistent and asymptotically normal. These properties make it a useful tool for estimating diversity and for comparing hyper-diverse, poorly sampled communities. R code to reproduce the examples in the paper, based on the packages EntropyEstimation and entropart [22], is given in S2 Appendix. All data are available in the entropart package.

Supporting information

S1 Appendix. Proofs. (PDF)

S2 Appendix. R code. This code allows for the reproduction of all examples and figures in this article. (PDF)

Author Contributions

Conceptualization: ZZ MG EM.
Data curation: EM MG.
Formal analysis: MG EM GL ZZ.
Investigation: MG EM GL ZZ.
Methodology: MG EM GL ZZ.
Software: MG EM.
Supervision: ZZ.
Validation: MG EM GL ZZ.
Visualization: MG EM.
Writing – original draft: MG EM.

References

1. Ricotta C. Through the jungle of biological diversity. Acta Biotheoretica. 2005; 53(1):29–38. doi: 10.1007/s10441-005-7001-6 PMID: 15906141
2. Patil GP, Taillie C. Diversity as a concept and its measurement. Journal of the American Statistical Association. 1982; 77(379):548–561. doi: 10.2307/2287712
3. Beck J, Schwanghart W. Comparing measures of species diversity from incomplete inventories: an update. Methods in Ecology and Evolution. 2010; 1(1):38–44. doi: 10.1111/j.2041-210X.2009.00003.x
4. Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948; 27:379–423, 623–656. doi: 10.1002/j.1538-7305.1948.tb01338.x
5. Simpson EH. Measurement of diversity. Nature. 1949; 163(4148):688. doi: 10.1038/163688a0
6. Rényi A. On Measures of Entropy and Information. In: Neyman J, editor. 4th Berkeley Symposium on Mathematical Statistics and Probability. vol. 1. Berkeley, USA: University of California Press; 1961. p. 547–561.
7. Hill MO. Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology. 1973; 54(2):427–432. doi: 10.2307/1934352
8. Jost L. Entropy and diversity. Oikos. 2006; 113(2):363–375. doi: 10.1111/j.2006.0030-1299.14714.x
9. Havrda J, Charvát F. Quantification method of classification processes. Concept of structural a-entropy. Kybernetika. 1967; 3(1):30–35.
10. Daróczy Z. Generalized information functions. Information and Control. 1970; 16(1):36–51. doi: 10.1016/S0019-9958(70)80040-7
11. Tsallis C. Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics. 1988; 52(1):479–487. doi: 10.1007/BF01016429
12. Hurlbert SH. The Nonconcept of Species Diversity: A Critique and Alternative Parameters. Ecology. 1971; 52(4):577–586. doi: 10.2307/1934145
13. Tothmeresz B. Comparison of different methods for diversity ordering. Journal of Vegetation Science. 1995; 6(2):283–290. doi: 10.2307/3236223
14. Marcon E, Scotti I, Hérault B, Rossi V, Lang G. Generalization of the Partitioning of Shannon Diversity. PLoS One. 2014; 9(3):e90289. doi: 10.1371/journal.pone.0090289 PMID: 24603966
15. Chao A, Jost L. Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution. 2015; 6(8):873–882. doi: 10.1111/2041-210X.12349
16. Marcon E. Practical Estimation of Diversity from Abundance Data. HAL. 2015; 01212435 (version 2).
17. Zhang Z, Zhou J. Re-parameterization of multinomial distributions and diversity indices. Journal of Statistical Planning and Inference. 2010; 140(7):1731–1738. doi: 10.1016/j.jspi.2009.12.023
18. Zhang Z, Grabchak M. Entropic Representation and Estimation of Diversity Indices. Journal of Nonparametric Statistics. 2016; 28(3):563–575. doi: 10.1080/10485252.2016.1190357
19. Faddeev DK. On the concept of entropy of a finite probabilistic scheme. Uspekhi Mat Nauk. 1956; 1(67):227–231.
20. Gregorius HR. Partitioning of diversity: the "within communities" component. Web Ecology. 2014; 14:51–60. doi: 10.5194/we-14-51-2014
21. Dalton H. The measurement of the inequality of incomes. The Economic Journal. 1920; 30(119):348–361. doi: 10.2307/2223525
22. Marcon E, Hérault B. entropart, an R Package to Partition Diversity. Journal of Statistical Software. 2015; 67(8):1–26. doi: 10.18637/jss.v067.i08
23. Corona P, Franceschi S, Pisani C, Portoghesi L, Mattioli W, Fattorini L. Inference on diversity from forest inventories: a review. Biodiversity and Conservation. 2015; in press.
24. Cao L, Grabchak M. EntropyEstimation: Estimation of Entropy and Related Quantities; 2014. Available from: http://cran.r-project.org/package=EntropyEstimation.
25. R Development Core Team. R: A Language and Environment for Statistical Computing; 2016. Available from: http://www.r-project.org.
26. Gregorius HR. On the concept of effective number. Theoretical Population Biology. 1991; 40(2):269–283. doi: 10.1016/0040-5809(91)90056-L PMID: 1788824
27. Gourlet-Fleury S, Guehl JM, Laroussinie O. Ecology & Management of a Neotropical Rainforest. Lessons Drawn from Paracou, a Long-Term Experimental Research Site in French Guiana. Paris, France: Elsevier; 2004.
28. Burnham KP, Overton WS. Robust Estimation of Population Size When Capture Probabilities Vary Among Animals. Ecology. 1979; 60(5):927–936. doi: 10.2307/1936861
29. Brose U, Martinez ND, Williams RJ. Estimating species richness: Sensitivity to sample coverage and insensitivity to spatial patterns. Ecology. 2003; 84(9):2364–2377. doi: 10.1890/02-0558
30. Olszewski TD. A unified mathematical framework for the measurement of richness and evenness within and among multiple communities. Oikos. 2004; 104(2):377–387.
31. Chao A, Wang YT, Jost L. Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species. Methods in Ecology and Evolution. 2013; 4(11):1091–1100. doi: 10.1111/2041-210X.12108
32. Dauby G, Hardy OJ. Sampled-based estimation of diversity sensu stricto by transforming Hurlbert diversities into effective number of species. Ecography. 2012; 35(7):661–672. doi: 10.1111/j.1600-0587.2011.06860.x
