Mathur and Rosenberg, Algorithms for Molecular Biology (2023) 18:1

Objective: In mathematical phylogenetics, a labeled rooted binary tree topology can possess any of a number of labeled histories, each of which represents a possible temporal ordering of its coalescences. Labeled histories appear frequently in calculations that describe the combinatorics of phylogenetic trees. Here, we generalize the concept of labeled histories from rooted phylogenetic trees to rooted phylogenetic networks, specifically for the class of rooted phylogenetic networks known as rooted galled trees.

Results: Extending a recursive algorithm for enumerating the labeled histories of a labeled tree topology, we present a method to enumerate the labeled histories associated with a labeled rooted galled tree. The method relies on a recursive decomposition by which each gall in a galled tree possesses three or more descendant subtrees. We exhaustively provide the numbers of labeled histories for all small galled trees, finding that each gall reduces the number of labeled histories relative to a specified galled tree that does not contain it.

Conclusion: The results expand the set of structures for which labeled histories can be enumerated, extending a well-known calculation for phylogenetic trees to a class of phylogenetic networks.

Keywords: Galled trees, Labeled histories, Mathematical phylogenetics, Phylogenetic networks

*Correspondence: Noah A. Rosenberg, noahr@stanford.edu. Department of Biology, Stanford University, Stanford 94305, CA, USA

© The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Introduction

Labeled histories represent a fundamental concept in mathematical phylogenetics, tabulating sequences in which the branching events that have given rise to a set of labeled lineages could have taken place. Given a set of labeled lineages at the leaves of a rooted binary tree, many topological relationships are possible for those lineages, each describing a labeled topology, each of which in turn is compatible with one or more labeled histories (p. 47 of [24]).

Labeled histories, sometimes also termed ordered trees [23] or coalescence sequences [20], have appeared in many types of studies. They arise in basic phylogenetic combinatorics, in which classes of phylogenetic trees are enumerated and their features assessed [11, 24]. In coalescent theory, which studies genetic lineages sampled in a population, the labeled histories, viewed backward in time from the present, describe the set of possible sequences in which the sampled gene lineages coalesce to a common ancestor [26]. Probability computations in coalescent theory often consider a set of labeled histories that is compatible with a desired tree shape [16, 17, 19]. Labeled histories arise frequently in the combinatorics of gene trees and species trees, in which labeled topologies for gene lineages sampled from a set of species are considered in relation to labeled topologies for the species themselves [5]. Algorithms that traverse tree spaces in searching for labeled topologies that could underlie molecular data also make use of labeled histories [13].

Fundamental results on labeled histories include the number of labeled histories possible for n labeled lineages [6] and the number of labeled histories for a specified labeled topology [3, 24] (see also problem 20 on p. 67 of [15]). The labeled topologies that, for a specified number of lineages, possess the largest number of labeled histories are also known [10].

Recently, much attention in mathematical phylogenetics has considered phylogenetic networks, generalizations of phylogenetic trees in which evolution has not necessarily proceeded in a tree-like manner [12]. Because biological phenomena such as admixture, horizontal gene transfer, hybridization, and the genetic exchanges that occur via migration can induce non-tree-like evolution for a set of biological groups, phylogenetic networks are increasingly relevant to a variety of biological problems.

Similar concepts to labeled histories can be defined for networks, in particular, those networks that are meant to represent evolution in time. Indeed, Bienvenu et al. [1] suggest the study of labeled histories for phylogenetic networks, focusing on tree-child networks. They pose the problem of enumerating the analogue of labeled histories for networks: the problem of enumerating labeled phylogenetic networks whose internal nodes are placed in distinct temporal orders, or rankings, but that share an unranked labeled structure in common (p. 656). Here, we solve this enumeration for a class of phylogenetic networks, namely the rooted labeled galled trees. Galled trees, which first emerged from the study of ancestral recombination graphs [9, 22], represent a relatively simple type of network structure, a subset of the tree-child networks.

We first introduce precise notions of galled trees and labeled histories. Next, we perform the enumeration of labeled histories for an example labeled galled tree. The example is followed by the general computation of the number of labeled histories for an arbitrary labeled galled tree. We then use the general computation to exhaustively count labeled histories for all labeled galled trees with at most 6 leaves. We conclude with a discussion.

Preliminaries

Definitions

Our focus is on rooted galled trees, a type of rooted binary phylogenetic network. Following Definition 1.1 of Bienvenu et al. [1], we consider a rooted binary phylogenetic network to be a directed acyclic graph such that (i) there is a unique root node with in-degree 0 and out-degree 2, (ii) all leaf nodes have in-degree 1 and out-degree 0, (iii) non-leaf and non-root nodes have either in-degree 2 and out-degree 1 or in-degree 1 and out-degree 2, and (iv) all edges are directed away from the root. Nodes with in-degree 2 and out-degree 1 are termed reticulation nodes, and nodes with in-degree 1 and out-degree 2 are tree nodes.

Note that although, as a directed acyclic graph, a rooted binary phylogenetic network has no directed cycles, if the sense of direction is removed, then the associated undirected graph can possess cycles. If this undirected graph has no cycles, then the network is simply a tree. The undirected graph of a galled tree does not contain nested cycles (Fig. 1).

A rooted galled tree is a rooted binary phylogenetic network in which (i) each reticulation node $a_r$ has a unique ancestor node $r$ such that exactly two nonoverlapping paths of edges exist from $r$ to $a_r$; if the direction of edges is ignored, then $r$, $a_r$, and these two paths form a cycle $C_r$, known as a gall. In addition, (ii) for reticulation nodes $a_r$ and $a_s$, the sets of nodes in the galls $C_r$ associated with $a_r$ and $C_s$ associated with $a_s$ are disjoint.

By convention, we refer to a galled tree as a tree, even though in a technical sense, a galled tree with one or more galls is not a tree. In the literature on phylogenetic networks, a galled network is distinct from a galled tree, so that this term is not available for galled trees. As all trees and networks that we consider are rooted and binary, we usually omit these terms, understanding that they are implied.

We denote the subtree descended from internal node $i$ by $T_i$ and the number of internal nodes in a tree $T$ by $v(T)$. Note that we follow Bienvenu et al. [1] in only considering networks to which it is possible to assign a chronological order of internal nodes in addition to a genealogical order.

It is convenient to name various features associated with a gall in a galled tree (Fig. 2). First, all nodes that are not leaf nodes are internal nodes, including the root.

Fig. 1 Galled trees. A A galled tree with two galls. B A galled tree with the same labeled topology as A but with a different labeled history. C A network that is not a galled tree because it contains nested cycles. D A network that is not a galled tree because it has cycles that share vertices. This network would be included in the class of galled networks [8], which is distinct from the class of galled trees
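As an illustration of the degree conditions in the definition of a rooted binary phylogenetic network, the following Python sketch classifies the nodes of a small directed graph as root, tree node, reticulation node, or leaf. The edge-list encoding, the function name `classify_nodes`, and the toy network are assumptions made for this illustration and do not come from the article. The sketch checks only conditions (i) to (iii); it does not test acyclicity or the galled-tree conditions.

```python
from collections import defaultdict


def classify_nodes(edges):
    """Classify nodes of a directed graph by (in-degree, out-degree).

    `edges` is a list of (parent, child) pairs (an assumed encoding).
    Raises ValueError if a node has a degree pattern that is not allowed
    in a rooted binary phylogenetic network, or if the root is not unique.
    """
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        outdeg[parent] += 1
        indeg[child] += 1
        nodes.update((parent, child))

    # Allowed (in-degree, out-degree) patterns, conditions (i)-(iii).
    kinds = {
        (0, 2): "root",
        (1, 2): "tree",
        (2, 1): "reticulation",
        (1, 0): "leaf",
    }
    result = {}
    for node in nodes:
        key = (indeg[node], outdeg[node])
        if key not in kinds:
            raise ValueError(f"node {node!r} has disallowed degrees {key}")
        result[node] = kinds[key]
    if sum(kind == "root" for kind in result.values()) != 1:
        raise ValueError("expected exactly one root")
    return result


# Hypothetical toy network with one gall: top node "r", hybridizing side
# nodes "x" and "y", hybrid node "h", and leaves "A", "B", "C".
toy_edges = [("r", "x"), ("r", "y"), ("x", "h"), ("y", "h"),
             ("x", "A"), ("h", "B"), ("y", "C")]
kinds = classify_nodes(toy_edges)
```

In this toy network the hybrid node `h` is the only reticulation node, and the two hybridizing side nodes are ordinary tree nodes by degree.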
Fig. 2 Parts of a gall. These include the top node (12), left non-hybridizing side nodes (8), right non-hybridizing side nodes (10, 11), left hybridizing side node (6), right hybridizing side node (7), and hybrid node (5). In this galled tree, leaf nodes (orange) are labeled with letters A–K. Internal nodes (black) are numbered using a postorder traversal, with child nodes assigned smaller numbers than parent nodes; at hybridization events, the subtree that receives the smallest numbers is the subtree descended from the hybrid node, and the hybrid node receives a smaller number than the hybridizing side nodes

The internal nodes include the root, the tree nodes, and the reticulation nodes. For a gall with reticulation node $a_r$, ancestor node $r$, and cycle $C_r$, because we draw galled trees with the ancestors at the top of the diagram, with descent proceeding from top to bottom, we refer to the ancestor node $r$ as the top node. The reticulation node $a_r$ is termed a hybrid node. All nodes in the set $C_r$ of nodes in the gall, other than the top node and the hybrid node, are side nodes. Each gall has two side nodes that are special; these side nodes are the nodes that are the immediate parents of the hybrid node; we call them hybridizing side nodes or simply hybridizing nodes. For visual clarity, we draw the bottom of a gall as a horizontal line, representing the idea that the hybridizing nodes instantaneously hybridize to produce the hybrid node. On this horizontal line, we always place the hybrid node between the two hybridizing side nodes.

All side nodes other than the two hybridizing side nodes are termed non-hybridizing side nodes. Each side node is a left side node or right side node. We use the terms "left" and "right" for convenience, associating "left" side nodes with the left side of a gall in drawings of galled trees and "right" side nodes with the right side; however, we regard a galled tree as invariant with respect to the exchange of left and right descendants of one or more top nodes. The gall has subtrees associated with each side node, the hybridizing nodes, and the hybrid node. We denote the set of subtrees associated with the gall by $\mathcal{T}$.

Because we consider only networks whose internal nodes can be given a chronological order, supposing each node is associated with an instant in time, we disallow networks that involve such temporal impossibilities as a hybridization of node $v_1$ with a child node of $v_2$ occurring in a network that also contains a hybridization of $v_2$ with a child of $v_1$. In the same manner that Bienvenu et al. [1] consider tree-child networks and ranked tree-child networks, we consider galled trees and ranked galled trees, where a ranked galled tree is a galled tree together with its labeled history: the chronological order in which its branching (or coalescence) and hybridization events take place.

Given a set of labeled leaves of a phylogenetic tree or network, a labeled topology is the structure that describes the topological relationship ancestral to the leaves. The labeled topology includes both coalescences and hybridization events. Thus, for example, the labeled topology of the galled tree in Fig. 2 is obtained by disregarding the temporal sequence of the internal nodes, so that only the connectivity of the nodes is considered.

We interpret galled trees with a sense of time proceeding from the root to the leaves, all of which are contemporaneous. With this interpretation, a labeled topology might permit several distinct orders in which its coalescence and hybridization events can occur. For a given labeled topology, a labeled history is a specific order of its coalescences and hybridizations. That is, a labeled history of a tree or network is the labeled topology of the tree or network together with the associated temporal sequence of its internal nodes. For the example in Fig. 2, forward in time, the labeled history places the internal nodes in the sequence 12, 8, 11, 10, 2, (5, 6, 7), 3, 4, 9, 1, where 5, 6, and 7 are contemporaneous. More generally, for our enumeration of labeled histories compatible with the labeled topology of a galled tree, we treat each hybrid node as contemporaneous with its two parental nodes.

Formally, consider a galled tree labeled topology with a node set $V$ including $n$ leaves, an edge set $E$, and a partial order $\preceq$ that describes ancestor–descendant relationships. In particular, two nodes $v_1, v_2$ in $V$ satisfy $v_1 \preceq v_2$ if $v_1$ lies on a path from the root node to $v_2$; a pair of edges $e_1, e_2$ in $E$ can also satisfy $e_1 \preceq e_2$ if $e_1$ lies on a path from the root node to $e_2$, and a node $v$ and edge $e$ can also satisfy $v \preceq e$ if $v$ is ancestral to $e$, or $e \preceq v$ if $e$ is ancestral to $v$. Trivially, a node or edge is ancestral to itself and descended from itself. Associate with each node $v$ a time $t(v)$, such that $t(v_1) \le t(v_2)$ if $v_1 \preceq v_2$. For $v_1 \ne v_2$, we require $t(v_1) = t(v_2)$ if $\{v_1, v_2\}$ contains (i) a hybrid node and one of its parental hybridizing nodes; (ii) the two hybridizing nodes that are the parents of the same hybrid node; or (iii) two leaves. Otherwise, $t(v_1) \ne t(v_2)$. A labeled history is a sequence of sets of nodes $W_1, W_2, \ldots, W_n$ such that (i) for all $i$ and all nodes $v_{i1}, v_{i2} \in W_i$, $t(v_{i1}) = t(v_{i2})$, and (ii) for all $i, j$ with $i < j$ and all nodes $v_i, v_j$ with $v_i \in W_i$ and $v_j \in W_j$, $t(v_i) < t(v_j)$. $W_1$ contains only the root node, and $W_n$ contains the leaves. Note that the number of sets $W_i$ in the sequence, that is, the number of distinct points in time occupied by the nodes of a galled tree, is equal to the number of leaves in the galled tree and does not depend on the number of galls.

In the same way that the term "galled tree" abuses the term tree, we also abuse the term subtree by allowing a "subtree" to contain galls. Technically, a "subtree" that contains galls is not a tree, but it is convenient to think of it as tree-like. Hence, each internal node of a galled tree has a subtree to which it is ancestral; for a hybridizing node that has two child nodes, one of which is a hybrid node, that hybridizing node is immediately ancestral to the root of a subtree that includes the child node that is not the hybrid node. When referring to subtrees "descended" from an internal node, we are describing the subtrees rooted at children of the node. A non-hybridizing side node has exactly one such subtree, rooted at one of its child nodes; the other child node is part of its associated gall.

Fig. 3 An example galled tree. This tree has three galls. One gall has a top node at the root (node 22), with two side nodes on the left and two on the right. A second gall is in the subtree $T_L$ descended from the left hybridizing side node of the first gall (node 12); this second gall has only one side node on the left and one side node on the right (its hybridizing side nodes). A third gall is descended from the right non-hybridizing side node of the first gall (node 21)

Labeled histories for trees

We recall results concerning the enumeration of labeled histories for trees (without galls). The number of labeled histories for a rooted binary tree with $n$ leaves has been obtained both recursively and nonrecursively. We will have occasion to use both the recursive and nonrecursive formulas, as our enumeration of galled trees follows the reasoning of the recursive approach, and the nonrecursive formula is convenient in steps that count labeled histories for non-galled subtrees of a galled tree.

The root of a binary tree has two subtrees. To obtain a labeled history for a full labeled binary tree, the internal nodes of the two subtrees can be arranged in any order in relation to one another, maintaining the order within each subtree. The number of labeled histories of a tree is the product of the numbers of labeled histories of the two subtrees and the number of ways in which the internal nodes of the two subtrees can be interwoven once the subtree labeled histories are fixed. Hence, the recursive formula for the number of labeled histories of a tree $T$ whose subtrees $T_\ell$ and $T_r$ have $v(T_\ell)$ and $v(T_r)$ internal nodes, respectively, is

$$L_H(T) = L_H(T_\ell)\, L_H(T_r) \binom{v(T_\ell) + v(T_r)}{v(T_r)}, \tag{1}$$

where $L_H(T) = 1$ if $T$ has a single leaf or if $T$ is a 2-leaf tree [3, 11].

In nonrecursive form (Lemma 1 of [25]), the number of labeled histories is

$$L_H(T) = \frac{(n-1)!}{\prod_{i \in V^0(T)} v(T_i)}, \tag{2}$$

where $V^0(T)$ is the set of internal nodes of $T$ (including the root) and $T_i$ is the subtree descended from internal node $i$.

Example

To count labeled histories of galled trees, we use a recursive approach that generalizes the recursive count for labeled histories of a tree without galls. Informally, we can view a galled tree as a network that is structurally similar to a true tree. In particular, in a gall, side nodes and the hybrid node each give rise to descendant subtrees, which might themselves include galls. Note that if such a subtree includes galls, then it is more accurately termed a subnetwork; for convenience, we continue to call it a subtree.

We begin from the root node of the galled tree. If the root is not the top node of a gall, then we proceed toward its child nodes as in the recursive enumeration of labeled histories for trees. We count labeled histories for the left subtree and for the right subtree, and we count ways that these labeled histories can be interwoven in relation to one another.
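The tree case can be sketched in Python as follows. The nested-tuple encoding of a tree (a leaf is a string, an internal node is a pair of subtrees) and the function names are assumptions made for this illustration. The sketch evaluates the recursion of eq. 1 and, as a cross-check, the nonrecursive product of eq. 2, reading $v(T_i)$ in eq. 2 as the number of internal nodes in the subtree rooted at $i$, counting $i$ itself.

```python
from math import comb, factorial


def internal_nodes(tree):
    """Number of internal nodes of a tree given as nested pairs; a leaf is a string."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + internal_nodes(left) + internal_nodes(right)


def labeled_histories_recursive(tree):
    """Eq. 1: product of the subtree counts times the number of interleavings."""
    if isinstance(tree, str):
        return 1
    left, right = tree
    v_left, v_right = internal_nodes(left), internal_nodes(right)
    return (labeled_histories_recursive(left)
            * labeled_histories_recursive(right)
            * comb(v_left + v_right, v_right))


def labeled_histories_product(tree):
    """Eq. 2: (n - 1)! divided by the product of v(T_i) over internal nodes i."""
    def sizes(t):
        # Collect v(T_i) for every internal node i of t (node i included).
        if isinstance(t, str):
            return []
        left, right = t
        return [internal_nodes(t)] + sizes(left) + sizes(right)

    denominator = 1
    for size in sizes(tree):
        denominator *= size
    n_leaves = internal_nodes(tree) + 1  # a binary tree has one more leaf than internal nodes
    return factorial(n_leaves - 1) // denominator


# Hypothetical example trees.
caterpillar = ((("A", "B"), "C"), "D")
balanced = (("A", "B"), ("C", "D"))
five_leaf = (("A", "B"), (("C", "D"), "E"))
```

For the 4-leaf caterpillar both functions return 1, for the balanced 4-leaf tree both return 2, and for the 5-leaf tree shown both return 3.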
If, instead, the root node is a top node of a gall, then we introduce a new recursive function that enumerates labeled histories for the subtrees of all side nodes of the gall, both hybridizing nodes of the gall, and the unique hybrid node of the gall, and that enumerates the ways in which the labeled histories of these subtrees can be interwoven.

We apply this recursive function proceeding down through the galled tree. Each gall contains, at minimum, three associated subtrees: two descended from the two hybridizing side nodes, and one from the hybrid node. Hence, the recursive component of our enumeration of labeled histories associated with a galled tree considers at least three subtrees; in other words, it proceeds by noting that all galls are divided into three or more parts.

Figure 3 shows an example of a small galled tree. Considering the gall at the root node (node 22), subtrees $T_{\ell 1}$ (descended from node 14, a left non-hybridizing side node), $T_L$ (descended from node 12, the left hybridizing side node), $T_C$ (descended from node 11, the hybrid node), $T_R$ (descended from node 13, the right hybridizing side node), and $T_{r1}$ (descended from node 21, a right non-hybridizing side node) are galled trees with 1, 1, 1, 2, and 3 labeled histories, respectively. For $T_{\ell 1}$, $T_C$, and $T_R$, the subtrees are trees in the usual sense, and the numbers of labeled histories follow eqs. 1 and 2. Galled subtree $T_L$ trivially possesses only one labeled history. For $T_{r1}$, both its subtrees each trivially possess a single labeled history. The left subtree possesses a coalescence at node 18 and a hybridization represented by simultaneous nodes 15–17, and the right subtree has the one coalescence at node 19. The right subtree can be arranged in one of three ways in relation to the left subtree: node 19 nearer in time to the root than node 18, between node 18 and nodes 15–17, or more recent than nodes 15–17. Hence, the number of labeled histories for $T_{r1}$ is 3.

For the gall at the root node, the left non-hybridizing side node can be arranged in relation to the right non-hybridizing side node. The number of arrangements of these non-hybridizing side nodes in relation to one another is $\binom{1+1}{1} = 2$, as we are counting arrangements of 1 left non-hybridizing side node and 1 right non-hybridizing side node. In general, when including the hybridizing side nodes in node counts $n_\ell$ and $n_r$, the number of arrangements of $n_\ell$ left side nodes and $n_r$ right side nodes is $\binom{n_\ell + n_r - 2}{n_\ell - 1}$.

The arrangement of the side nodes creates "time periods." The arrangement depicted in the example has three nontrivial time periods. The first lies below the first side node (node 21), the next is below the next side node (node 14), and a final period lies after the hybridization (nodes 11–13). The internal nodes in the subtree of a side node can only be placed in the periods subsequent to the side node itself, so the number of time periods available for such a subtree is determined by the arrangement of the side nodes on the gall. The internal nodes in the subtrees of the side nodes are then distributed into the available time periods. For each time period, the number of ways of arranging all nodes assigned to that time period across the various subtrees is given by a multinomial coefficient. The number of labeled histories for each assignment of internal nodes to time periods is then the product across time periods of the associated multinomial coefficients for the time periods. The total number of labeled histories for each ordering of the side nodes on the gall is the sum across assignments of internal nodes to time periods of the number of labeled histories for each assignment.

In the example, we have two arrangements for the side nodes: (14, 21) and (21, 14). We first consider arrangement (21, 14) depicted in Fig. 3. We calculate the number of ways to distribute the nodes from the subtrees into the time periods, and for each distribution we then calculate the numbers of arrangements within each of the time periods. There are three time periods: (i) between 21 and 14, (ii) between 14 and the hybridizing side nodes, and (iii) below the hybridizing side nodes. The ordering of the internal nodes within a subtree does not affect the permissible placements of these nodes within the time periods. Therefore, the number of labeled histories for a particular ordering of the side nodes on the gall is the product of two quantities: (1) the product across subtrees of the numbers of labeled histories for the subtrees, $\prod_i L_H(T_i)$, and (2) the number of ways that these labeled histories can be interwoven in relation to one another for the fixed ordering of the side nodes on the gall.

To count this latter quantity, we must consider all possible placements of subtree nodes into time periods. We define an "event" to be either a (non-hybridizing) coalescence or a hybridization. This concept of an event corresponds to that of an internal node used in the recursive calculation of the number of labeled histories for trees in eq. 1. Each non-hybridizing tree node corresponds to a coalescence event. We denote the number of events in a subtree $T_i$ as $v(T_i)$. The total number of events in any galled tree is two times the number of hybrid nodes subtracted from the total number of internal nodes, because each hybridization event is represented by three simultaneous internal nodes: the hybrid node and two hybridizing nodes. For a subtree $T_i$, the number of time periods in which the events of that subtree can occur, $p(T_i)$, depends on the arrangement of the side nodes. The events in the subtree cannot occur in time periods preceding the side node from which the subtree descends. For each subtree $T_i$, we count the available time periods for the events of $T_i$. When the left side node, node 14, occurs after the right side node, node 21, the numbers of time periods available to $T_{\ell 1}$, $T_L$, $T_C$, $T_R$, and $T_{r1}$ are 2, 1, 1, 1, and 3, respectively.

The number of ways to distribute $n$ events, which have already been ordered, into $t$ time periods is $\binom{n+t-1}{n}$. Let $v(T_i)$ denote the number of events in subtree $T_i$. When node 21 precedes node 14, the total number of ways to arrange the subtree events into time periods is $\prod_{T_i \in \mathcal{T}} \binom{v(T_i) + p(T_i) - 1}{v(T_i)}$, or in this case,

$$\binom{2+2-1}{2} \binom{2+1-1}{2} \binom{1+1-1}{1} \binom{3+1-1}{3} \binom{4+3-1}{4} = 45.$$

Consider one of these 45 arrangements of events into time periods, say, in which tree $T_{\ell 1}$ has its nodes 1 and 2 in the third and second time periods, respectively, and $T_{r1}$ has its internal node 20 in the second period and nodes 15–19 in the third (Fig. 3). For $T_L$, $T_C$, and $T_R$, the internal nodes are trivially in the final (third) period. For each of the periods, we must count the number of orderings permitted for internal nodes allowed within the period. The first period (between nodes 21 and 14) has no nodes, so this period trivially has one arrangement. Nodes 2 and 20 occur in the second period between node 14 and the hybridization represented by nodes 11–13. Because nodes 2 and 20 are from different subtrees, they can be arranged in either of two orders in relation to one another, so there are 2 possible orderings within the second time period. The final period has 10 events: 1, 2, 1, 3, and 3 from $T_{\ell 1}$, $T_L$, $T_C$, $T_R$, and $T_{r1}$, respectively. Hence, there are $\binom{10}{1,2,1,3,3} = 50{,}400$ possible orderings of events in that period. For the fixed subtree labeled histories and distribution of events across periods shown in Fig. 3, the number of labeled histories is $2 \times 50{,}400 = 100{,}800$.

We repeat this procedure for each of the 45 cases, for each case counting its associated product of multinomial coefficients. We will see that by careful indexing, the appropriate product of multinomial coefficients can be obtained generally. Summing across the 45 cases, we obtain 2,162,160 labeled histories.

Keeping the labeled histories of the subtrees fixed, we count labeled histories for the other arrangement of the side nodes with node 14 preceding node 21, obtaining 1,801,800; this calculation sums across

$$\binom{2+3-1}{2} \binom{2+1-1}{2} \binom{1+1-1}{1} \binom{3+1-1}{3} \binom{4+2-1}{4} = 30$$

cases. Multiplying by $\prod_i L_H(T_i) = 1 \times 1 \times 1 \times 2 \times 3 = 6$, the product across subtrees of the numbers of labeled histories for the subtrees, the total number of labeled histories is $6 \times (2{,}162{,}160 + 1{,}801{,}800) = 23{,}783{,}760$.

We are now ready for the general computation.

General algorithm

Our general result follows the example to recursively calculate the number of labeled histories in any galled tree. Consider a galled tree $T$ with root node $v$. Either $v$ is the top node of a gall or it is not the top node of a gall.

If $v$ is not a top node, then the number of labeled histories of the tree rooted at $v$ is the product of the numbers of labeled histories for the two subtrees of $v$ and the number of ways in which those subtrees can be interwoven. We recursively proceed to the children of $v$ to count labeled histories for the subtrees descended from these children.

If $v$ is the top node of a gall, then we proceed as in the example. Denote by $G_v$ the gall for which $v$ is the top node. Suppose $G_v$ has left non-hybridizing side nodes $g_{\ell 1}, g_{\ell 2}, \ldots, g_{\ell N}$ and right non-hybridizing side nodes $g_{r1}, g_{r2}, \ldots, g_{rM}$. The subtrees of the galled tree are then $T_L$ from the left hybridizing side node, $T_R$ from the right hybridizing side node, $T_C$ from the hybrid node, and $T_{\ell 1}, T_{\ell 2}, \ldots, T_{\ell N}, T_{r1}, T_{r2}, \ldots, T_{rM}$ from the non-hybridizing side nodes. We can count the number of labeled histories for the subtree defined by the gall rooted at $v$, supposing the numbers of labeled histories are known for all these various subtrees.

1. We enumerate the possible orderings of the left side nodes in relation to the right side nodes. Let the set of all orderings of side nodes of the gall rooted at $v$ be $S_v$. The ordering of the left side nodes is fixed and the ordering of the right side nodes is fixed; this step counts the ways in which the left and right side nodes can be interwoven. Hence, $S_v$ has cardinality $\binom{N+M}{N}$. Each arrangement of the side nodes defines "time periods" between side nodes, into which other nodes can be placed.

2. We separately consider each of the $\binom{N+M}{N}$ arrangements of the non-hybridizing side nodes (the elements of $S_v$) and enumerate assignments of internal nodes of the subtrees descended from the non-hybridizing side nodes (and the hybridizing nodes and hybrid node) into time periods. Let the set of all assignments in an arrangement of side nodes $s \in S_v$ be $X(s)$. The number of assignments of these internal nodes depends on the numbers of time periods that are available to the various subtrees. Internal nodes in a subtree can only be assigned into time periods that occur after the non-hybridizing side node (or hybridizing node or hybrid node) from which the subtree descends. Let the number of time periods available to subtree $T_i$ be $p(T_i)$. Recalling that $v(T_i)$ is the number of coalescence or hybridization events in the galled tree $T_i$, the number of ways to arrange the internal nodes of subtree $T_i$ into time periods is $\binom{v(T_i) + p(T_i) - 1}{v(T_i)}$. That is, because each subtree can have its nodes assigned without considering the assignment of other subtrees, the total number of assignments of internal nodes to time periods is the product of the number of assignments over all subtrees. Therefore $X(s)$ has cardinality $\prod_i \binom{v(T_i) + p(T_i) - 1}{v(T_i)}$, where the product traverses $N + M + 3$ nodes: the $N + M$ non-hybridizing side nodes as well as the two hybridizing nodes and the hybrid node.

3. For each assignment of internal nodes to time periods, we count the number of orderings of those internal nodes within the time periods. For each assignment, we list the numbers of nodes assigned to each of the $N + M + 1$ time periods. We construct a matrix for assignment $x$, $A_x \in \mathbb{Z}^{(N+M+3) \times (N+M+1)}$, where entry $A_{x(i,j)}$ is the number of events from subtree $i$ that are placed in time period $j$ and $\sum_{j=1}^{N+M+1} A_{x(i,j)} = v(T_i)$. The number of labeled histories for the specific assignment is then equal to

$$\prod_{j=1}^{N+M+1} \binom{\sum_{i=1}^{N+M+3} A_{x(i,j)}}{A_{x(1,j)},\, A_{x(2,j)},\, \ldots,\, A_{x(N+M+3,j)}}.$$

4. We combine steps 1–3 to obtain the number of labeled histories for the gall whose top node is $v$. In particular, we now have the total number of labeled histories for each specific arrangement of the side nodes and fixed set of labeled histories for the subtrees. The number of labeled histories for the subtree defined by the gall is then the sum of the number of labeled histories across each arrangement of the side nodes on the gall multiplied by the numbers of labeled histories of the subtrees. In other words, the number of labeled histories for the gall rooted at $v$ is

$$\left[ \prod_{k \in \mathcal{T}} L_H(T_k) \right] \times \sum_{s \in S_v} \sum_{x \in X(s)} \prod_{j=1}^{N+M+1} \binom{\sum_{i=1}^{N+M+3} A_{x(i,j)}}{A_{x(1,j)},\, A_{x(2,j)},\, \ldots,\, A_{x(N+M+3,j)}}, \tag{3}$$

where $\mathcal{T} = \{T_{\ell 1}, T_{\ell 2}, \ldots, T_{\ell N}, T_{r1}, T_{r2}, \ldots, T_{rM}, T_L, T_R, T_C\}$.

We recursively enumerate the labeled histories of a galled tree, applying the steps beginning from the tree root and proceeding to the leaves through each top node of a gall.

Fig. 4 Removing a gall does not decrease the number of labeled histories. A A tree with one gall. B A tree obtained from A via a transformation that removes the gall; we remove the hybrid node and one of the hybridizing side nodes, choosing the right hybridizing side node arbitrarily here. We then add two edges, between the left hybridizing side node and the child of the hybrid node, and between the parent and remaining child of the right hybridizing side node (blue). Each labeled history for A has a corresponding labeled history in B

Small galled trees

We exhaustively count the labeled histories for all galled trees with six or fewer leaves. For each unlabeled galled tree with six or fewer leaves, Tables 1, 2, 3, 4, and 5 report the numbers of labeled histories associated with an arbitrary labeling of the galled tree; a summary appears in Table 6.

Table 1 Number of labeled histories for galled trees with at most 4 leaves (the "Galled tree" column of drawings is not reproduced in this text)

Number of leaves   Number of galls   Number of labeled histories
1                  0                 1
2                  0                 1
3                  0                 1
3                  1                 1
4                  0                 1
4                  1                 1
4                  0                 2
4                  1                 1
4                  1                 1
4                  1                 1

Galled trees with different numbers of galls appear in different colors (0, black; 1, orange; 2, purple). For each unlabeled galled tree shown, an arbitrary labeling of the leaves is assumed, and the number of labeled histories associated with that arbitrary labeling is shown

Table 2 Number of labeled histories for galled trees with 5 leaves (the "Galled tree" column of drawings is not reproduced in this text)

Number of leaves   Number of galls   Number of labeled histories
5                  0                 1
5                  1                 1
5                  0                 2
5                  1                 1
5                  1                 1
5                  1                 1
5                  0                 3
5                  1                 3
5                  1                 1
5                  2                 1
5                  1                 2
5                  1                 2
5                  1                 1

Enumeration of small galled trees

First, we enumerate all unlabeled galled trees with six or fewer leaves. This enumeration proceeds by first listing all trees with no galls. The number of such trees follows the Wedderburn-Etherington numbers,

$$U_1 = 1,$$
$$U_n = \sum_{k=1}^{n/2 - 1} U_k U_{n-k} + \frac{U_{n/2}\,(U_{n/2} + 1)}{2}, \quad \text{even } n \ge 2,$$
$$U_n = \sum_{k=1}^{(n-1)/2} U_k U_{n-k}, \quad \text{odd } n \ge 3.$$

The number of unlabeled trees with $n$ leaves is obtained by combining all possible pairs of subtrees, one with $k$ leaves, $1 \le k \le \lfloor n/2 \rfloor$, and the other with $n - k$ leaves.

To enumerate all galled trees with $n$ leaves, we follow a similar procedure of combining smaller galled trees to form a galled tree of a fixed number of leaves.
We con - 5 2 1 sider two cases: either the root node is the top node of a gall or it is not. If the root node is not the top node of 5 1 2 a gall, then we recursively form galled trees in the same way as in the case of no galls, by combining pairs of 5 1 1 smaller galled trees. For the other case, if the root node is a top node of a 5 1 1 gall, then we consider all galls that are possible at the 5 1 1 top of the tree. For a galled tree with n leaves, a gall has a minimum of 3 subtrees: two for the hybridizing nodes 5 1 1 and one for the hybrid node. The maximum number of subtrees emanating from the gall is n, corresponding to 5 1 1 the case in which there are n − 3 non-hybridizing side nodes, each of which has a leaf node for its associated subtree. The non-hybridizing side nodes can be placed into the left and right sides of the gall in each of multiple ways. where each part represents a specific one of the subtrees. Without loss of generality, suppose that the number Each subtree of each composition is a smaller galled tree. of non-hybridizing side nodes on the left side is always For n = n , we proceed by allowing each combination ℓ r greater than or equal to the number on the right, n ≥ n . ℓ r of smaller galled trees of the n + n + 3 sizes specified. ℓ r The number of subtrees emanating from the gall then In the case with n = n , we must be careful not to dou- ℓ r equals n + n + 3 (here we exclude the hybridizing side ℓ r ble-count. Write a vector c representing the composition nodes from n and n ). We enumerate the ways to par- ℓ r that counts leaves in the n + n + 3 subtrees. The com - ℓ r tition n leaves into n + n + 3 labeled categories—the ℓ r position is ordered from “left” to “right,” starting from the number of compositions of n into n + n + 3 parts, ℓ r M athur and Rosenberg Algorithms for Molecular Biology (2023) 18:1 Page 9 of 13 most ancestral left side node, proceeding from ancestor labeled histories. Consider a galled tree T. 
We delete the to descendant to the left hybridizing side node, then the hybrid node and the right hybridizing side node. We then hybrid node and the right hybridizing side node, and pro- add an edge that joins the left hybridizing side node and ceeding from descendant to ancestor to the most ances- the child of the hybrid node, and another edge that joins tral right side node. the parent and child of the right hybridizing side node A composition can be “palindromic” in the sense that (Fig. 4). The resulting galled tree T has the same number it is invariant with respect to inversion of the order of its of leaves as T. Further, each labeled history for T contin- terms; for example (3, 2, 4, 2, 3) is a palindromic com- ues to have an associated labeled history for T —the coa- position of 14, whereas (2, 2, 3, 3, 4) is non-palindromic. lescence of the left hybridizing side node, hybrid node, For a non-palindromic composition c, let c be the com- and right hybridizing side node in T is now indexed only ′ ′ position obtained by inverting the order of its terms: by the left hybridizing side node in T . Hence T has at (4, 3, 3, 2, 2) for (2, 2, 3, 3, 4), for example. We consider least as many labeled histories as T, and indeed might two cases: have more, as the nodes in the subtree of the former right (i) For each pair of non-palindromic compositions of hybridizing side node of T can now move above the for- n + n + 3 , (c, c ) , we only consider one of the two. mer left hybridizing node, and hence are less constrained ℓ r (ii) For each palindromic composition c, we enumerate in T . the set of all possible lists of n + 1 subtrees for the As a corollary of this argument, given a fixed number left side nodes, including the left hybridizing side of leaves, in the set of galled trees that possess the larg- node, in some specified order. 
We choose two lists est number of labeled histories—a set that contains at in this set, allowing replacement, one for the sub- least one and potentially more than one element—at least trees of the left side nodes, and one for the subtrees one element is a tree with no galls. Other consequences of the right side nodes (proceeding backward from include: (i) for galled trees with n leaves, the galled tree the end of the composition). If the two lists are dif- that maximizes the number of labeled histories among ferent, then we always use for the left side nodes galled trees with g galls, g ≥ 1 , has no more labeled his- the list that appears earlier in the order. To com- tories than the galled tree that maximizes the number plete the enumeration, we combine all possible lists of labeled histories among galled trees with g − 1 galls. of subtrees for the left and right side nodes with all Further, (ii) no galled tree with n leaves has more labeled possible subtrees for the hybrid node. histories than the labeled topology (with no galls) that maximizes the number of labeled histories. Note that for a tree with n leaves, the maximal number n−1 of galls is ⌊ ⌋ . To verify this claim, start with a galled 2 Discussion tree with a single gall and three leaves—the minimum We have devised a method for enumerating the labeled number of leaves for a galled tree, as the hybridizing histories for rooted binary labeled galled trees. The and hybrid nodes must each have at least one descend- method generalizes the classic enumeration of labeled ant. Each subsequent gall adds at least two leaves, as a histories for labeled topologies, extending it to a simple gall can replace at most one existing leaf. Therefore, the family of phylogenetic networks. We have applied our minimum number of leaves for a galled tree with g galls is new algorithm to enumerate labeled histories both in n−1 2g + 1 , so that n ≥ 2g + 1 , or g ≤⌊ ⌋. 
2 an illustrative example and exhaustively for small galled trees with at most six leaves. In this latter analysis, we Labeled histories for small galled trees have found that for a fixed number of leaves, adding galls Examining Tables 1, 2, 3, 4, and 5, we can observe the generally reduces the number of labeled histories. pattern that for a fixed number of leaves, for trees with Labeled histories have long been a focus of studies in no galls, the number of labeled histories increases with phylogenetics [6, 11], appearing often in calculations increasing tree balance. Caterpillar trees, in which there that describe the probability that random trees pro- exists an internal node descended from all other internal duce specified shapes under evolutionary models [18, nodes, possess only one labeled history. 24, 26]. Recent studies have been expanding the sense In general, trees with more galls tend to have fewer in which labeled histories are considered. For example, labeled histories than trees with fewer galls. Indeed, we King & Rosenberg [14] examined a concept of labeled can always remove a gall while retaining the same num- histories in which simultaneous binary mergers of lin- ber of leaves and retaining or increasing the number of eages are permitted. Bienvenu et al. 
[1] explored the Mathur and Rosenberg Algorithms for Molecular Biology (2023) 18:1 Page 10 of 13 Table 3 Number of labeled histories for galled trees with 6 Table 4 Number of labeled histories for galled trees with 6 leaves (first part) leaves (second part) Number of Number of Galled tree Number Number of Number of Galled tree Number leaves galls of labeled leaves galls of labeled histories histories 6 0 1 6 0 4 6 1 1 6 1 4 6 0 2 6 0 8 6 1 1 6 1 4 6 1 1 6 1 4 6 1 4 6 1 1 6 0 3 6 0 6 6 1 3 6 1 6 6 1 1 6 2 6 6 2 1 6 1 1 6 1 2 6 2 1 6 1 2 6 1 2 6 1 1 6 2 1 6 2 1 6 2 1 6 2 1 6 1 2 6 1 1 6 1 3 6 1 1 6 2 3 6 1 3 6 1 1 6 1 1 6 2 3 6 1 1 6 1 3 6 2 3 possibility of providing labeled histories to phyloge- 6 1 6 netic networks, specifically tree–child networks. They emphasized that ranked tree-child networks, which 6 1 1 impose a temporal structure on tree-child networks, have biological relevance because chronological pro- 6 2 1 cesses are, by definition, rankable. Bienvenu et al. [1 ] suggested the problem of enumerating labeled histo- ries for a tree–child network, noting the difficulty that M athur and Rosenberg Algorithms for Molecular Biology (2023) 18:1 Page 11 of 13 Table 5 Number of labeled histories for galled trees with 6 Table 5 (continued) leaves (third part) Number of Number of galls Galled tree Number leaves of labeled Number of Number of galls Galled tree Number histories leaves of labeled histories 6 1 2 6 1 2 6 2 1 6 2 1 networks do not possess the recursive properties of 6 2 1 trees. We have found that for the galled trees, a subset of the tree–child networks, we can continue to use a 6 1 3 tree-like recursive approach to enumerate labeled his- tories. To our knowledge, our calculation provides the 6 2 3 first enumeration of labeled histories beyond trees to a class of phylogenetic networks. 6 1 3 Our enumeration facilitates the understanding of 6 1 3 factors that affect the number of labeled histories for galled trees. 
We found that for a fixed number of leaves, 6 1 3 increasing the number of galls does not increase, and often decreases, the number of labeled histories. We have 6 1 only considered small numbers of leaves, and as the num- 6 2 1 ber of leaves increases, it will be of interest to explore the effect on the number of labeled histories of gall 6 1 2 locations—for example, with a top node located or not located at the tree root, or with multiple galls descended 6 1 2 from one another or not descended from one another. For labeled topologies with a fixed number of leaves, the 6 1 2 topology with the maximum number of labeled histories 6 2 2 has a high level of “balance” [10], and permitting galls does not change the identity of the galled tree with the 6 1 2 maximal number of labeled histories. Future work, how- ever, can examine more generally the effect of balance on 6 1 2 the number of labeled histories for galled trees. An important aspect of our analysis is that the sense 6 2 2 in which we consider galled trees has an explicit tempo- 6 1 2 ral ordering, in which each gall possesses two hybridiz- ing nodes and a hybrid node that are contemporaneous. 6 1 2 With their explicit potential to be temporally ordered, the rooted galled trees here and galled trees in other stud- 6 1 2 ies are not generally precisely identical, as the tempo- 6 1 2 ral requirement we have imposed is a case of the recent approach of Bienvenu et al. [1] and has not yet been 6 1 2 frequently assumed. We have provided an enumeration algorithm for the galled trees we consider; the counts 6 1 2 of 1, 1, 2, 6, 20, and 72 for the numbers of rooted unla- beled galled trees for 1 to 6 leaves (Table 6) differ from 6 1 2 counts and formulas reported in related enumerative 6 1 2 studies [2, 4, 7, 8, 21]. 
In studies of labelings for galled trees and phylogenetic networks more generally, care is 6 1 2 Mathur and Rosenberg Algorithms for Molecular Biology (2023) 18:1 Page 12 of 13 Table 6 Features of galled trees with 6 or fewer leaves Number of leaves Number of galled Number of galled trees with g galls Maximum number of labeled histories trees among galled trees with g galls g = 0 g = 1 g = 2 g = 0 g = 1 g = 2 1 1 1 0 0 1 – – 2 1 1 0 0 1 – – 3 2 1 1 0 1 1 – 4 6 2 4 0 2 1 – 5 20 3 15 2 3 3 1 6 72 6 48 18 8 6 6 The features are extracted from Tables 1, 2, 3, 4, and 5 needed in recognizing the precise set of objects under for other classes of phylogenetic networks could rely on consideration. creatively finding such recursive properties. The enumeration of labeled histories is more compu - Acknowledgements tationally challenging for galled trees than for trees with We thank Marc Feldman and a reviewer for comments on a draft of the no galls. Whereas the evaluation of the number of labeled manuscript histories for trees with no galls can use a simple nonre- Author contributions cursive formula (eq. 2), the algorithmic enumeration of SM and NAR performed the research and wrote the paper. All authors read labeled histories for galled trees requires a number of and approved the final manuscript. steps that, for some trees, increases at least exponentially Funding in the number of leaves. In enumerating labeled histo- We acknowledge support from National Institutes of Health grant R01 ries, the first step for each gall is to sum over all orderings GM131404 and National Science Foundation grant NSF BCS-2116322. of the left side nodes in relation to the right side nodes. Availibility of data and materials Consider a family of trees T with n = 4k + 1 leaves for Not applicable. k ≥ 1 . 
Supp ose T has a root gall with k left side nodes and k right side nodes, each with two descendant leaves, Declarations for a total of 4k leaves descended from the side nodes; the Ethical approval and consent to participate last leaf is descended from the hybrid node. The num - Not applicable. ber of orderings of left and right side nodes over which 2k we must sum in eq. 3 is , a quantity that increases Competing interests √ k √ k n The authors declare that they have no competing interests. exponentially, as 4 / πk , or ( 2) 2/[π(n − 1)] . Con- sidering galled trees more generally, computation time increases with the number of side nodes in galls, the Received: 13 September 2022 Accepted: 27 January 2023 number of leaves descended from those side nodes, and the number of galls descended from one another along a path from the root to the leaves. For a fixed number of References leaves n, maximizing any of these three quantities occurs 1. Bienvenu F, Lambert A, Steel M. Combinatorial and stochastic properties by reducing the other two, so that the configuration of of ranked tree-child networks. Random Struct Alg. 2021;60:653–89. galls and leaves that maximizes computation time for the 2. Bouvel M, Gambette P, Mansouri M. Counting phylogenetic networks of level 1 and 2. J Math Biol. 2020;81:1357–95. enumeration—as well as the complexity of the computa- 3. Brown JKM. Probabilities of evolutionary trees. Syst Biol. 1994;43:78–91. tion itself—remains unknown. 4. Cardona G, Zhang L. Counting and enumerating tree-child networks and The potential to embed galled trees in a recursive their subclasses. J Comp System Sci. 2020;114:84–104. 5. Degnan JH, Rosenberg NA. Discordance of species trees with their most framework is central to our solution for enumerat- likely gene trees. PLoS Genet. 2006;2:762–8. ing labeled histories for labeled galled trees. The solu - 6. Edwards AWF. 
Estimation of the branch points of a branching diffusion tion enables us to treat galls similarly to internal nodes process. J Roy Statist Soc Ser B. 1970;32:155–74. 7. Fuchs M, Yu G-R, Zhang L. Asymptotic enumeration and distributional in standard phylogenetic trees by defining subtrees that properties of galled networks. J Comb Theory Ser A. 2022;189: 105599. descend from nodes of a gall. It is possible that a more 8. Gunawan AD, Rathin J, Zhang L. Counting and enumerating galled general solution to the enumeration of labeled histories networks. Discr. Appl. Math. 2020;283:644–54. 9. Gusfield D. ReCombinatorics. Cambridge, MA: MIT Press; 2014. M athur and Rosenberg Algorithms for Molecular Biology (2023) 18:1 Page 13 of 13 10. Hammersley JM, Grimmett GR. Maximal solutions of the generalized subadditive inequality. In: Harding EF, Kendall DG, editors. Stochastic Geometry. London: Wiley; 1974. p. 270–85. 11. Harding EF. The probabilities of rooted tree-shapes generated by random bifurcation. Adv Appl Prob. 1971;3:44–77. 12. Huson DH, Rupp R, Scornavacca C. Phylogenetic Networks: Concepts, Algorithms and Applications. Cambridge: Cambridge University Press; 13. Kim J, Rosenberg NA, Palacios JA. Distance metrics for ranked evolution- ary trees. Proc Natl Acad Sci USA. 2020;117:28876–86. 14. King MC, Rosenberg NA. On a mathematical connection between single- elimination sports tournaments and evolutionary trees. bioRxiv 2022, https:// doi. org/ 10. 1101/ 2022. 08. 09. 503313. 15. Knuth DE. The Art of Computer Programming, vol. 3. 2nd ed. Reading, MA: Addison-Wesley; 1998. 16. Mehta RS, Bryant D, Rosenberg NA. The probability of monophyly of a sample of gene lineages on a species tree. Proc Natl Acad Sci USA. 2016;113:8002–9. 17. Mehta RS, Rosenberg NA. The probability of reciprocal monophyly of gene lineages in three and four species. Theor Pop Biol. 2019;129:133–47. 18. Mehta RS, Steel M. 
Rosenberg NA The probability of joint monophyly of samples of gene lineages for all species in an arbitrary species tree. J Comput Biol. 2022;29:679–703. 19. Rosenberg NA. The shapes of neutral gene genealogies in two species: probabilities of monophyly, paraphyly, and polyphyly in a coalescent model. Evolution. 2003;57:1465–77. 20. Rosenberg NA. The mean and variance of the numbers of r -pronged nodes and r -caterpillars in Yule-generated genealogical trees. Ann Comb. 2006;10:129–46. 21. Semple C, Steel M. Unicyclic networks: compatibility and enumeration. IEEE/ACM Trans Comput Biol Bioinform. 2006;3:84–91. 22. Song YS. A concise necessary and sufficient condition for the existence of a galled-tree. IEEE/ACM Trans Comput Biol Bioinform. 2006;3:186–91. 23. Song YS. Properties of subtree-prune-and-regraft operations on totally- ordered phylogenetic trees. Ann Comb. 2006;10:147–63. 24. Steel M. Phylogeny: Discrete and Random Processes in Evolution. Phila- delphia: Society for Industrial and Applied Mathematics; 2016. 25. Steel M, McKenzie A. Properties of phylogenetic trees generated by Yule- type speciation models. Math Biosci. 2001;170:91–112. 26. Wiehe T. Counting, Grafting and Evolving Binary Trees. In: Baake E, Wakolbinger A, editors. Probabilistic structures in evolution. Zurich: EMS Publishing House; 2021. p. 427–50. Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in pub- lished maps and institutional affiliations. Re Read ady y to to submit y submit your our re researc search h ? Choose BMC and benefit fr ? Choose BMC and benefit from om: : fast, convenient online submission thorough peer review by experienced researchers in your field rapid publication on acceptance support for research data, including large and complex data types • gold Open Access which fosters wider collaboration and increased citations maximum visibility for your research: over 100M website views per year At BMC, research is always in progress. 
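The counting steps of the general computation can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function names `assignments_per_subtree` and `orderings_for_assignment` are hypothetical, and the (v, p) pairs are read directly off the binomial coefficients quoted in the worked example. It reproduces the 30 cases and the total of 23,783,760 reported in the text.

```python
from math import comb, factorial, prod


def assignments_per_subtree(v, p):
    """Ways to place the v internal nodes of one subtree into p available
    time periods (the binomial coefficient of step 2)."""
    return comb(v + p - 1, v)


def orderings_for_assignment(A):
    """Number of orderings for one assignment matrix A (step 3).

    A[i][j] is the number of events from subtree i placed in time period j.
    The result is the product, over time periods, of a multinomial coefficient.
    """
    total = 1
    for column in zip(*A):  # iterate over time periods
        coeff = factorial(sum(column))
        for a in column:
            coeff //= factorial(a)
        total *= coeff
    return total


# (v, p) pairs taken from the binomials in the worked example (assumed reading).
example = [(2, 3), (2, 1), (1, 1), (3, 1), (4, 2)]
cases = prod(assignments_per_subtree(v, p) for v, p in example)

# Labeled histories of the subtrees, and the two side-node arrangements.
subtree_histories = prod([2, 1, 1, 1, 3])
total = subtree_histories * (2_162_160 + 1_801_800)

print(cases, subtree_histories, total)
```

The two per-arrangement sums (2,162,160 and 1,801,800) are taken from the text rather than recomputed, since the full tree of the example is not specified in this section.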
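Several ingredients of the enumeration of small galled trees are easy to sketch in code. The Python below is an illustration under stated assumptions, not the authors' procedure: the names `wedderburn_etherington`, `compositions`, `is_palindromic`, and `max_galls` are hypothetical helpers. It evaluates the Wedderburn-Etherington recursion for trees with no galls, lists compositions of n leaves into a fixed number of subtrees, tests the palindromic property, and evaluates the bound on the number of galls.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def wedderburn_etherington(n):
    """Number of unlabeled rooted binary trees (no galls) with n leaves."""
    if n == 1:
        return 1
    if n % 2 == 0:
        half = n // 2
        unequal = sum(wedderburn_etherington(k) * wedderburn_etherington(n - k)
                      for k in range(1, half))
        u_half = wedderburn_etherington(half)
        return unequal + u_half * (u_half + 1) // 2
    return sum(wedderburn_etherington(k) * wedderburn_etherington(n - k)
               for k in range(1, (n - 1) // 2 + 1))


def compositions(n, parts):
    """Yield all ordered tuples of `parts` positive integers summing to n."""
    if parts == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - parts + 2):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest


def is_palindromic(c):
    """True if the composition is unchanged when its terms are reversed."""
    return tuple(c) == tuple(reversed(c))


def max_galls(n):
    """Upper bound floor((n - 1) / 2) on the number of galls for n >= 3 leaves."""
    return (n - 1) // 2


no_gall_counts = [wedderburn_etherington(n) for n in range(1, 7)]
palindromes = [c for c in compositions(5, 3) if is_palindromic(c)]
print(no_gall_counts, palindromes, [max_galls(n) for n in range(3, 7)])
```

The first six values of the recursion agree with the g = 0 column of Table 6, and the two example compositions from the text behave as described.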
Algorithms for Molecular Biology – Springer Journals
Published: Feb 13, 2023
Keywords: Galled trees; Labeled histories; Mathematical phylogenetics; Phylogenetic networks