N. Sloane (2003)
The On-Line Encyclopedia of Integer Sequences. Electron. J. Comb., 1
(1992)
Matroid applications
Federico Ardila (2019)
CAT(0) geometry, robots, and society. ArXiv, abs/1912.10007
Ezra Miller, Megan Owen, S. Provan (2012)
Polyhedral computational geometry for averaging metric phylogenetic trees. Adv. Appl. Math., 68
R. Stanley (1994)
A Survey of Eulerian Posets
É. Forest (2016)
The Logarithm of a Map
A. Gavryushkin, A. Drummond (2014)
The space of ultrametric phylogenetic trees. Journal of theoretical biology, 403
M. Bridson, A. Haefliger (1999)
Metric Spaces of Non-Positive Curvature
M. Bacák (2012)
Computing Medians and Means in Hadamard Spaces. SIAM J. Optim., 24
H. Chan, J. Jansson, T. Lam, S. Yiu (2005)
Reconstructing an Ultrametric Galled Phylogenetic Network from a Distance Matrix. Journal of bioinformatics and computational biology, 4(4)
M. Bordewich, Nihan Tokac (2016)
An algorithm for reconstructing ultrametric tree-child networks from inter-taxa distances. Discret. Appl. Math., 213
Satyan Devadoss, Samantha Petti (2016)
A Space of Phylogenetic Networks. SIAM J. Appl. Algebra Geom., 1
T. Nye (2014)
An Algorithm for Constructing Principal Geodesics in Phylogenetic Treespace. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11
A. Houcine (2006)
On hyperbolic groups, 9
Federico Ardila, Caroline Klivans (2003)
The Bergman complex of a matroid and phylogenetic trees. J. Comb. Theory, Ser. B, 96
Satyan Devadoss, Cassandra Durell, S. Forcey (2019)
Split Network Polytopes and Network Spaces. arXiv: Combinatorics
M. Baroni, C. Semple, M. Steel (2006)
Hybrids in real time. Systematic biology, 55(1)
M. Steel (2016)
Phylogeny: Discrete and Random Processes in Evolution
K. Huber, V. Moulton, Taoyang Wu (2016)
Transforming phylogenetic networks: Moving beyond tree space. Journal of theoretical biology, 404
D. Robinson, L. Foulds (1979)
Comparison of weighted labelled trees
K. Huber, V. Moulton, A. Spillner (2021)
Phylogenetic consensus networks: Computing a consensus of 1-nested phylogenetic networks
Kenneth Rosen, D. Shier, Wayne Goddard (2017)
Partially Ordered Sets
L. Billera, Susan Holmes, K. Vogtmann (2001)
Geometry of the Space of Phylogenetic Trees. Adv. Appl. Math., 27
Momoko Hayamizu, K. Huber, V. Moulton, Yukihiro Murakami (2019)
Recognizing and realizing cactus metrics. ArXiv, abs/1908.01524
M. Bordewich, S. Linz, C. Semple (2017)
Lost in space? Generalising subtree prune and regraft to spaces of phylogenetic networks. Journal of theoretical biology, 423
N. Amenta, Matthew Godwin, Nicolay Postarnakevich, K. John (2007)
Approximating geodesic tree distance. Inf. Process. Lett., 103
D. Barden, Huiling Le (2017)
The logarithm map, its limits and Fréchet means in orthant spaces. Proceedings of the London Mathematical Society, 117
F. Rosselló, G. Valiente (2009)
All that Glisters is not Galled. Mathematical biosciences, 221(1)
Philippe Gambette, L. Iersel, Mark Jones, Manuel Lafond, F. Pardi, Céline Scornavacca (2017)
Rearrangement moves on rooted phylogenetic networks. PLoS Computational Biology, 13
Marc Hellmuth, David Schaller, P. Stadler (2021)
Compatibility of Partitions, Hierarchies, and Split Systems. ArXiv, abs/2104.14146
M. Mittal, Shailendra Singh, Dolly Sharma (2021)
Phylogenetics. Bioinformatics and RNA
F. Bienvenu, A. Lambert, M. Steel (2020)
Combinatorial and stochastic properties of ranked tree‐child networks. Random Structures & Algorithms, 60
F Ardila-Mantilla (2020)
CAT(0) geometry, robots, and society. Notices of the AMS, 67
Remie Janssen, Mark Jones, P. Erdős, Leo Iersel, Céline Scornavacca (2017)
Exploring the Tiers of Rooted Phylogenetic Network Space Using Tail Moves. Bulletin of Mathematical Biology, 80
A. Willis (2016)
Confidence Sets for Phylogenetic Trees. Journal of the American Statistical Association, 114
Alessandra Caraceni, Michael Fuchs, Guanglong Yu (2021)
Bijections for ranked tree-child networks. Discret. Math., 345
V. Chepoi (2000)
Graphs of Some CAT(0) Complexes. Adv. Appl. Math., 24
T. Nye, Xiaoxian Tang, G. Weyenberg, R. Yoshida (2016)
Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees. Biometrika, 104
Marc Hellmuth, David Schaller, P. Stadler (2021)
Compatibility of partitions with trees, hierarchies, and split systems. Discret. Appl. Math., 314
Ann. Comb. © 2023 The Author(s)
Annals of Combinatorics
https://doi.org/10.1007/s00026-023-00656-0

The Space of Equidistant Phylogenetic Cactuses

Katharina T. Huber, Vincent Moulton, Megan Owen, Andreas Spillner and Katherine St. John

Abstract. An equidistant X-cactus is a type of rooted, arc-weighted, directed acyclic graph with leaf set X that is used in biology to represent the evolutionary history of a set X of species. In this paper, we introduce and investigate the space of equidistant X-cactuses. This space contains, as a subset, the space of ultrametric trees on X that was introduced by Gavryushkin and Drummond. We show that equidistant-cactus space is a CAT(0)-metric space, which implies, for example, that there are unique geodesic paths between points. As a key step to proving this, we present a combinatorial result concerning ranked rooted X-cactuses. In particular, we show that such graphs can be encoded in terms of a pairwise compatibility condition arising from a poset of collections of pairs of subsets of X that satisfy certain set-theoretic properties. As a corollary, we also obtain an encoding of ranked, rooted X-trees in terms of partitions of X, which provides an alternative proof that the space of ultrametric trees on X is CAT(0). We expect that our results will provide the basis for novel ways to perform statistical analyses on collections of equidistant X-cactuses, as well as new directions for defining and understanding spaces of more general, arc-weighted phylogenetic networks.

Mathematics Subject Classification. 05C90, 06A06, 52B70, 92D15.

Keywords. Phylogenetic network, Network space, Combinatorial encoding, CAT(0)-metric space.

1. Introduction

Currently, there is great interest in developing theory and techniques to understand and construct (rooted) phylogenetic networks.
Generally speaking, for a set of species, such a network consists of a rooted, directed acyclic graph and a bijective map from the species to the set of sinks of the graph (in case the graph is a tree, the network is called a (rooted) phylogenetic tree). Phylogenetic networks are important as they can be used to represent the evolutionary history of species that cross with one another (through evolutionary processes such as hybridization and recombination). To date, much of the research on phylogenetic networks has focused on understanding the structure of special types of networks and ways to build them (see [34] for a recent overview of the area). More recently, however, as the theory for phylogenetic networks has developed, there has been growing interest in understanding how to equip collections of phylogenetic networks with suitable metrics, giving rise to so-called network spaces. As has been demonstrated for the intensively studied spaces of phylogenetic trees (cf. e.g. [8, 18], and the review [32]), or tree-spaces, this point of view is valuable as it provides insights into statistical approaches to analyze and systematically compare networks.

Network spaces essentially come in two types: discrete and continuous. In discrete spaces, the elements of the space are distinct, non-isomorphic networks, and a metric is commonly given by defining the distance between two networks to be the length of a minimal sequence of local network operations that converts one network into the other. In continuous spaces, the arcs in the networks have non-negative, real-valued lengths and one network can be converted into the other by shrinking or lengthening arcs in a continuous manner. To date, nearly all results on network spaces have concerned discrete spaces (see, for example, [9, 17, 24]; for related results on discrete spaces of unrooted networks see e.g. [23]).
Indeed, to the best of our knowledge, very few results have been presented on continuous network spaces except for the recently introduced spaces of (unrooted) circular split networks [16] (strictly speaking, these spaces should probably be thought of as “spaces of circular split collections”). This is probably in part because the study of phylogenetic networks with arc lengths is somewhat less developed than the study of those without.

In this paper, we introduce a new continuous space of phylogenetic networks that can be regarded as a generalization of the τ-space of ultrametric trees that was introduced in [18]. For a set X of species, our network space N(X) consists of equidistant X-cactuses (see Fig. 1a for an example of such a network). A rooted X-cactus is essentially a rooted phylogenetic network in which no two distinct cycles in the underlying graph have an arc in common. Note that if all vertices of a rooted X-cactus have indegree at most 1, the network is just a rooted phylogenetic X-tree. The extensively studied class of (rooted) level-1 networks (see e.g. [29]) also provides examples of rooted X-cactuses. If a non-negative real-valued length is assigned to each arc of a rooted phylogenetic network N, then N is called equidistant if, for any fixed vertex v of N, all directed paths from v to any sink of N have the same length. Algorithms for constructing equidistant phylogenetic networks have been studied in, e.g., [10] and [13].

Figure 1. a An X-cactus for X = {a, b, c, d, e, f} with root ρ that is equidistant since every directed path from ρ to a sink has the same length, namely 13. All incoming arcs at vertices with indegree 2 have length 0 and are drawn horizontally. b The rooted X-cactus obtained by lengthening the incoming arc and shrinking the outgoing arcs at vertex v by 1. c The rooted X-cactus obtained by continuing the lengthening and shrinking of the arcs at vertex v until both outgoing arcs have length 0, contracting the cycle below v completely.

Following one of the common approaches used to construct tree-spaces, we define equidistant-cactus space N(X) in terms of an orthant space (see e.g. [25]). Basically, an orthant space is a collection of real orthants that are glued together along their boundaries and that is equipped with the metric induced by using the Euclidean metric within each orthant. That is, the distance between two points in the same orthant is the Euclidean distance between these points, and the distance between two points in different orthants is the length of a shortest path, or geodesic path, between these points. The length of such a path is computed by summing the Euclidean lengths of the restrictions of the path to each orthant. In particular, each pair of points in N(X) represents two equidistant X-cactuses, and moving along a geodesic path between the points continuously converts one X-cactus into the other by shrinking and lengthening arcs (see Fig. 1b, c), which may also result in a change of the length of the paths from the root to the sinks.

Note that the points of τ-space correspond bijectively to equidistant X-trees and that it can be constructed by gluing together orthants indexed by ranked phylogenetic trees. We take a similar approach to define N(X), indexing orthants instead by ranked X-cactuses, in which a ranking of the vertices that respects the direction of the arcs in the rooted X-cactus is given. We remark that ranked phylogenetic networks have been recently introduced and that research has focused on counting and enumerating certain classes of such networks (see e.g. [7, 12] and the references therein).
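To make the orthant-space metric described above concrete, the following minimal Python sketch computes the Euclidean distance between two points that are assumed to lie in a common orthant, together with the length of the two-segment path through the origin. Since all orthants share the origin, this path always exists and its length is an upper bound on the geodesic distance. Representing a point as a dictionary from axis labels to non-negative coordinates, the axis labels and the function names are all assumptions made for illustration; the sketch does not compute geodesics between points in different orthants.

```python
import math

def norm(point):
    """Euclidean norm of a point given as {axis label: non-negative coordinate}."""
    return math.sqrt(sum(value ** 2 for value in point.values()))

def same_orthant_distance(p, q):
    """Euclidean distance; valid only if p and q lie in a common orthant (assumed, not checked)."""
    axes = set(p) | set(q)
    return math.sqrt(sum((p.get(a, 0.0) - q.get(a, 0.0)) ** 2 for a in axes))

def through_origin_length(p, q):
    """Length of the segmented path p -> origin -> q, an upper bound on the geodesic distance."""
    return norm(p) + norm(q)

# Hypothetical axis labels "A" and "B".
p = {"A": 3.0}
q = {"B": 4.0}
print(same_orthant_distance(p, q))   # applies if the axes A and B span a common orthant
print(through_origin_length(p, q))   # length of the detour through the origin
```

If the two axes do not span a common orthant, only the second quantity is meaningful, and in that situation it is the length of the only kind of segmented path available between the two points.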
A critical aspect that influenced our construction of N(X) was that, as has been shown for τ-space [18], we wanted it to be a CAT(0)-metric space. Being CAT(0) is an important geometrical property that has been exploited in various applications within phylogenetics and beyond (see e.g. [3, 14]). A space being CAT(0) immediately implies that there is a unique geodesic path between any two points, a property that underpins many useful computations that can be performed for tree- and orthant-spaces. More specifically, approximations of the median as well as of the Fréchet mean and variance can be computed in complete CAT(0)-metric spaces, which include CAT(0)-orthant spaces [4, 25]; a central limit theorem holds for CAT(0)-orthant spaces [5]; and methods for computing confidence sets [37] and an analogue of partial principal component analysis [26, 27] can be directly extended from the unrooted tree space presented in [8] to CAT(0)-orthant spaces. Most of this paper is devoted to proving a crucial combinatorial result concerning rooted X-cactuses (Theorem 2) which implies, via a classical result of Gromov for orthant spaces, that N(X) is CAT(0). In passing, we remark that the space of networks described in [16] is not a CAT(0)-metric space.

The rest of this paper is structured as follows. In Sect. 2, we formally define rooted X-cactuses as well as some related concepts. In Sect. 3, we then introduce rankings of rooted X-cactuses and equidistant X-cactuses, which are both defined in terms of so-called time-stamp functions. As well as characterizing when a rooted X-cactus admits a ranking of its vertices that is consistent with the direction of its arcs, we make an important observation concerning ranked X-cactuses (Lemma 2), which implies that the maximal chains in a certain poset mentioned in the next paragraph all have the same length, i.e. |X| − 1. In Sect. 4, we use the simpler case of equidistant X-trees to outline our approach for the construction of a network space that is CAT(0), including a new proof that τ-space is CAT(0).

In Sect. 5, we describe how ranked X-cactuses give rise to set pair systems as defined in [22] and present the properties that characterize set pair systems that arise from ranked X-cactuses. We also define a binary relation on general set pair systems, and in Sect. 6, we establish that this relation yields a bounded graded poset on the set pair systems that arise from ranked X-cactuses. In Sect. 7, we establish our main combinatorial result (Theorem 2), namely that chains in this poset encode ranked X-cactuses. In simpler terms, this can be regarded as a “pairwise compatibility” result for set pair systems, which is analogous to the well-known Splits Equivalence Theorem for unrooted phylogenetic trees (see e.g. [30, Theorem 3.1.4]). Using our encoding for ranked X-cactuses, in Sect. 8, we construct the space N(X) of equidistant X-cactuses and show that it is a CAT(0)-metric space. We conclude in Sect. 9 by mentioning some directions for future work.

2. Preliminaries

In this section, we define rooted X-cactuses and some related concepts that we use later. We begin by recalling some standard concepts from graph theory. A directed graph N = (V, A) consists of a finite non-empty set V and a subset A ⊆ V × V. The elements of V and A are referred to as vertices and arcs of N, respectively. A directed graph N is acyclic if there is no directed cycle in N.
Moreover, a directed acyclic graph (DAG) N is rooted if there exists a vertex ρ ∈ V with indegree 0, called the root of N, such that for every u ∈ V there is a directed path from ρ to u. In a rooted DAG, a leaf is a vertex with outdegree 0, an internal vertex is a vertex with outdegree at least 1, a tree vertex is a vertex with indegree at most 1 and a reticulation vertex is a vertex with indegree at least 2. Note that, by definition, the root of a rooted DAG is a tree vertex. Moreover, in a rooted DAG N, we call a vertex v a child of a vertex u and, similarly, u a parent of v if (u, v) is an arc of N. The set of children of a vertex u is denoted by ch(u). A reticulation cycle {P, P′} in a rooted DAG consists of two distinct directed paths P and P′ such that P and P′ have the same start vertex and the same end vertex but no other vertices in common.

Let X be a finite non-empty set. A rooted X-cactus 𝒩 = (N, ϕ) is a rooted DAG N = (V, A) together with a map ϕ : X → V such that
(RC1) all vertices of N have indegree at most 2,
(RC2) no two distinct reticulation cycles in N have an arc in common, and
(RC3) the image ϕ(X) contains all leaves of N and all tree vertices of N with outdegree 1.

In Fig. 2a, we give an example of a rooted X-cactus. We remark that if |X| = 1 a rooted X-cactus consists of a single vertex only. For better readability, we will often refer to the vertices and arcs of N as the vertices and arcs of 𝒩. A rooted X-cactus 𝒩 is phylogenetic if ϕ is a bijection between X and the set of leaves of 𝒩. (A phylogenetic X-cactus is also known as a rooted 2-hybrid, 1-nested phylogenetic network [29], but for simplicity, we prefer to call it a rooted X-cactus since if the root and directions are ignored we obtain an unrooted X-cactus [20].) Note that a rooted phylogenetic X-cactus may contain leaves that are reticulation vertices. A rooted X-cactus is binary if it is phylogenetic, all leaves of 𝒩 are tree vertices, the root has outdegree 2 and every other internal vertex has either indegree 1 and outdegree 2 or indegree 2 and outdegree 1. A rooted X-cactus 𝒩 is compressed if ϕ(X) also contains all reticulation vertices with outdegree 1 (see [34, p. 251] for the concept of compression in more general phylogenetic networks). Rooted, compressed, phylogenetic X-cactuses as defined here correspond to 1-nested phylogenetic networks as defined in [22]. Note that a rooted, binary X-cactus that contains at least one reticulation vertex cannot be compressed.

A rooted X-cactus without any reticulation vertices is called a rooted X-tree. Note that rooted X-trees as defined here are in one-to-one correspondence with the rooted X-trees as defined in [30] where the root is required to have outdegree 1.

In Sect. 7, we will need to associate with every rooted X-cactus 𝒩 = ((V, A), ϕ) a rooted, phylogenetic X-cactus $\hat{\mathcal{N}} = ((\hat{V}, \hat{A}), \hat{\varphi})$ as follows: For every x ∈ X such that ϕ(x) is not a leaf of 𝒩 or such that there exists some y ∈ X − {x} with ϕ(y) = ϕ(x), we add a new vertex u to V, add the arc (ϕ(x), u) to A, and put $\hat{\varphi}(x) = u$. For all other x ∈ X we put $\hat{\varphi}(x) = \varphi(x)$. The resulting sets of vertices and arcs are denoted by $\hat{V}$ and $\hat{A}$, respectively (see Fig. 2b). In addition, we associate with the resulting rooted, phylogenetic X-cactus $\hat{\mathcal{N}}$ the rooted, compressed, phylogenetic X-cactus $\mathcal{N}^* = ((V^*, A^*), \varphi^*)$ obtained by contracting all arcs (u, v) where u has outdegree 1 (see Fig. 2c).

Figure 2. a A rooted X-cactus 𝒩 for X = {a, b, c, ..., j}. b The rooted, phylogenetic X-cactus $\hat{\mathcal{N}}$. c The rooted, compressed, phylogenetic X-cactus $\mathcal{N}^*$.

3. Rankings, Time-Stamp Functions and Equidistant X-Cactuses

In this section, we consider rankings of the vertices of rooted X-cactuses, which are an important part of defining equidistant-cactus space.
It is convenient to start with the more general concept of time-stamp functions, which also naturally leads to the definition of equidistant X-cactuses. A time-stamp function on the vertices in a rooted X-cactus 𝒩 = ((V, A), ϕ) is a map $t : V \to \mathbb{R}_{\ge 0}$ such that
(TS1) t(v) = 0 for all v ∈ ϕ(X),
(TS2) t(u) > t(v) for all arcs (u, v) of 𝒩 with v not a reticulation vertex, and
(TS3) t(v) = t(p₁) = t(p₂) for all reticulation vertices v of 𝒩 and its two parents p₁ and p₂.

An example of a time-stamp function on the vertices of a rooted X-cactus is given in Fig. 3. Integer-valued time-stamp functions are also known as temporal labelings (see e.g. [6]). We call a rooted X-cactus 𝒩 temporal if there exists a time-stamp function on the vertices of 𝒩. Note that not every rooted X-cactus is temporal (for example, the rooted X-cactus in Fig. 2a is not temporal because ϕ(X) contains an internal vertex that is not a parent of a reticulation vertex). The following lemma characterizes rooted X-cactuses that are temporal (see also [6, Theorem 3] for a characterization that applies to general rooted phylogenetic networks).

Figure 3. A rooted X-cactus 𝒩 on X = {a, b, c, d, e} with a time-stamp function t on its vertices. For all vertices v the value t(v) is given by the real number to the left of the horizontal line through v. In addition, for each arc of 𝒩, the length of the arc induced by t is given.

Lemma 1. A rooted X-cactus 𝒩 = ((V, A), ϕ) is temporal if and only if for all vertices u ∈ V the following properties hold:
(a) If u ∈ ϕ(X) then either u is a leaf or a parent of a reticulation vertex that is a leaf.
(b) If u has outdegree at least 2 then u is not the parent of a reticulation vertex that is a leaf.
(c) If u is the parent of a reticulation vertex v in a reticulation cycle {P, P′} then neither of the directed paths P, P′ consists of the single arc (u, v).
Proof. First assume that 𝒩 is temporal. Consider a time-stamp function t on the vertices of 𝒩. Assuming that 𝒩 contains a vertex u that violates one of (a)–(c) immediately yields a contradiction because then t would violate at least one of (TS1)–(TS3).

Now assume that (a)–(c) hold for all vertices of 𝒩. We construct a time-stamp function t on the vertices of 𝒩 by first putting t(v) = 0 for all v ∈ ϕ(X). In view of (a) and (b), this does not violate (TS1)–(TS3). Next, consider an internal vertex u that is not a reticulation vertex and also not the parent of a reticulation vertex. Assume that all children w of u have been assigned time-stamps t(w). Then we put
$$t(u) = 1 + \max_{w \in ch(u)} t(w).$$
Since 𝒩 is acyclic this does not violate (TS1)–(TS3). Finally, consider an internal vertex u that is a reticulation vertex. Let p₁ and p₂ denote the two parents of u and assume that all vertices w in
$$M = (ch(u) \cup ch(p_1) \cup ch(p_2)) - \{u\}$$
have been assigned time-stamps t(w). Then, we put
$$t(u) = t(p_1) = t(p_2) = 1 + \max_{w \in M} t(w).$$
Since 𝒩 is acyclic and in view of (c) this does not violate (TS1)–(TS3). Thus, our inductive construction yields a map $t : V \to \mathbb{R}_{\ge 0}$ for which (TS1)–(TS3) hold.

Figure 4. a A ranking of size 4 of a rooted X-cactus with X = {a, b, c, ..., j}. Vertices of the same rank are drawn on the same horizontal line. b A ranking of a rooted, binary X-cactus with X = {a, b, c, d, e}. The ranking has size 4, which is the maximum size over all rooted, temporal X-cactuses with |X| = 5.

As indicated in Fig. 3, a time-stamp function t on the vertices of a rooted X-cactus 𝒩 = ((V, A), ϕ) induces non-negative lengths on the arcs of 𝒩 by putting the length of arc (u, v) to be t(u) − t(v). With these arc lengths, all directed paths from a fixed vertex u to a vertex w ∈ ϕ(X) have the same length, namely t(u).
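The conditions (TS1)–(TS3) and the induced arc lengths t(u) − t(v) can be checked mechanically on a small example. The following Python sketch is only an illustration under stated assumptions: the seven-vertex cactus, its vertex names and the two helper functions are hypothetical and are not taken from the figures of the paper.

```python
def is_time_stamp(arcs, labelled, t):
    """Check (TS1)-(TS3) for a map t on the vertices of a rooted X-cactus.

    arcs: list of (parent, child) pairs; labelled: the image phi(X).
    """
    parents = {}
    for u, v in arcs:
        parents.setdefault(v, []).append(u)
    if any(value < 0 for value in t.values()):
        return False                      # t must take non-negative values
    if any(t[v] != 0 for v in labelled):
        return False                      # (TS1)
    for u, v in arcs:
        if len(parents[v]) < 2 and not t[u] > t[v]:
            return False                  # (TS2): v is not a reticulation vertex
    for v, ps in parents.items():
        if len(ps) == 2 and not (t[v] == t[ps[0]] == t[ps[1]]):
            return False                  # (TS3): v is a reticulation vertex
    return True

def path_lengths(arcs, t, start, labelled):
    """Set of lengths of all directed paths from start to vertices in phi(X)."""
    children = {}
    for u, v in arcs:
        children.setdefault(u, []).append(v)
    found = set()

    def walk(u, length):
        if u in labelled:
            found.add(length)
        for w in children.get(u, []):
            walk(w, length + (t[u] - t[w]))   # arc (u, w) has length t(u) - t(w)

    walk(start, 0.0)
    return found

# Hypothetical cactus: root rho, reticulation vertex v with parents p1 and p2.
arcs = [("rho", "p1"), ("rho", "p2"), ("p1", "a"), ("p1", "v"),
        ("p2", "v"), ("p2", "c"), ("v", "b")]
labelled = {"a", "b", "c"}
t = {"rho": 2.0, "p1": 1.0, "p2": 1.0, "v": 1.0, "a": 0.0, "b": 0.0, "c": 0.0}

print(is_time_stamp(arcs, labelled, t))
print(path_lengths(arcs, t, "rho", labelled))
```

In this example every directed path from a vertex u to a labelled vertex collects the same total length t(u), because the arc lengths along any such path telescope to t(u) − 0.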
In view of this equidistance property, we call an ordered pair (𝒩, t) consisting of a rooted, temporal X-cactus 𝒩 and a time-stamp function t on the vertices of 𝒩 an equidistant X-cactus. Thus, an equidistant X-cactus can be thought of as a rooted, temporal X-cactus with specific arc lengths assigned, whereas a rooted, temporal X-cactus does not have any specific arc lengths assigned.

We conclude this section by shedding some more light on the combinatorial structure of rooted, temporal X-cactuses. The size σ(t) of a time-stamp function t on the vertices of a rooted, temporal X-cactus 𝒩 = ((V, A), ϕ) is |t(V)| − 1. A ranking of a rooted, temporal X-cactus 𝒩 = ((V, A), ϕ) is a time-stamp function r on the vertices of 𝒩 with r(V) = {0, 1, 2, ..., σ(r)}. See Fig. 4a for an example. Note that rankings as defined here are a particular type of temporal labeling and are more general than the rankings considered in [7]. The value r(v) assigned to vertex v by the ranking r will also be referred to as the rank of vertex v if the ranking referred to is clear from the context. A ranked X-cactus (𝒩, r) consists of a rooted, temporal X-cactus 𝒩 and a ranking r of the vertices of 𝒩. The following lemma gives tight bounds on the size of rankings of rooted, temporal X-cactuses (see Fig. 4b for an example). For its proof, we will use the fact that any rooted binary X-cactus can be transformed into a rooted binary X-tree by deleting, for every reticulation vertex v, one of the arcs (p, v) from a parent p of v to v and then suppressing the two internal vertices v and p.

Lemma 2. Let (𝒩, r) be a ranked X-cactus. Then, we have 0 ≤ σ(r) ≤ |X| − 1. Moreover,
(a) σ(r) = 0 if and only if 𝒩 consists of a single vertex.
(b) σ(r) = |X| − 1 if and only if 𝒩 is a rooted, binary X-cactus and r(u) ≠ r(v) for all distinct vertices u and v unless u and v are both leaves of 𝒩, u is a parent of a reticulation vertex v, or u and v are parents of the same reticulation vertex.

Proof.
By definition, σ(r) ≥ 0. Moreover, if the size of the ranking r is precisely 0 then 𝒩 must consist of a single leaf v with r(v) = 0 and all elements of X are mapped by ϕ to v.

To establish the upper bound, let i and k denote the number of internal and reticulation vertices, respectively, of the ranked X-cactus (𝒩, r). By definition, σ(r) ≤ i − 2k. Note that, for fixed X, this expression can only be maximum if 𝒩 is a rooted, binary X-cactus, because otherwise we can always increase i without increasing k. Hence, it suffices to show that for all rooted, binary X-cactuses we have i − 2k = |X| − 1. Since, as described above, we can transform any such X-cactus into a rooted binary X-tree, we immediately obtain this equation as a consequence of the well-known fact that a rooted binary X-tree has |X| − 1 internal vertices (see e.g. [30, Sec. 2.1]).

4. Equidistant X-Trees and τ-Space

In this section, we shall briefly recall the concept of an orthant space (see e.g. [25, Sec. 6]) and related concepts. To illustrate the basic idea for constructing our orthant space of equidistant cactuses, we also consider the simpler case of equidistant trees (often called ultrametric trees) and explain how the τ-space of ultrametric trees mentioned in the introduction arises as an orthant space. This also yields an alternative proof to the one presented in [18] for the fact that τ-space is a CAT(0)-metric space.

4.1. Orthant Spaces

An ordered pair (M, ℱ) consisting of a finite non-empty set M and a family ℱ of non-empty subsets of M is called an abstract simplicial complex if A ∈ ℱ implies that all non-empty subsets of A are also contained in ℱ. An abstract simplicial complex is a flag complex if, for all non-empty subsets A ⊆ M such that all two-element subsets of A are contained in ℱ, we have A ∈ ℱ. For every map $\omega : M \to \mathbb{R}_{\ge 0}$, we put supp(ω) = {x ∈ M : ω(x) > 0}. The orthant space associated with the abstract simplicial complex (M, ℱ) is
$$\mathcal{M}_{(M,\mathcal{F})} = \{\omega \in \mathbb{R}^{M}_{\ge 0} : \mathrm{supp}(\omega) \in \mathcal{F} \cup \{\emptyset\}\}.$$

A metric D on a non-empty set B is a map $D : B \times B \to \mathbb{R}_{\ge 0}$ such that
• D(x, y) = 0 if and only if x = y,
• D(x, y) = D(y, x), and
• D(x, z) ≤ D(x, y) + D(y, z)
hold for all x, y, z ∈ B. The ordered pair (B, D) is called a metric space and the elements of B are called the points of the metric space.

A metric $D_{(M,\mathcal{F})}$ on the orthant space $\mathcal{M}_{(M,\mathcal{F})}$ associated with the abstract simplicial complex (M, ℱ) can be constructed as follows. For every A ∈ ℱ, the set
$$O(A) = \{\omega \in \mathcal{M}_{(M,\mathcal{F})} : \mathrm{supp}(\omega) \subseteq A\}$$
is called an orthant of $\mathcal{M}_{(M,\mathcal{F})}$. For all $\omega, \omega' \in \mathcal{M}_{(M,\mathcal{F})}$ such that there exists an orthant O of $\mathcal{M}_{(M,\mathcal{F})}$ with {ω, ω′} ⊆ O we put
$$D_{(M,\mathcal{F})}(\omega, \omega') = \sqrt{\sum_{x \in M} (\omega(x) - \omega'(x))^2}.$$
Then, for all $\omega, \omega' \in \mathcal{M}_{(M,\mathcal{F})}$ such that there is no orthant O of $\mathcal{M}_{(M,\mathcal{F})}$ that contains both ω and ω′ we consider finite segmented paths from ω to ω′. These are sequences $\omega_0, \omega_1, \omega_2, \ldots, \omega_k$ of elements in $\mathcal{M}_{(M,\mathcal{F})}$ such that $\omega_0 = \omega$, $\omega_k = \omega'$ and, for all i ∈ {1, 2, ..., k}, there exists some orthant $O_i$ of $\mathcal{M}_{(M,\mathcal{F})}$ that contains both $\omega_{i-1}$ and $\omega_i$. The length of such a segmented path is
$$\sum_{i=1}^{k} D_{(M,\mathcal{F})}(\omega_{i-1}, \omega_i).$$
Note that at least one such segmented path always exists in view of the fact that all orthants of $\mathcal{M}_{(M,\mathcal{F})}$ contain the point ω with supp(ω) = ∅, called the origin of $\mathcal{M}_{(M,\mathcal{F})}$. We define $D_{(M,\mathcal{F})}(\omega, \omega')$ to be the infimum of the length of all segmented paths from ω to ω′. It is known (see [25, Sec. 6]) that this construction yields a metric space $(\mathcal{M}_{(M,\mathcal{F})}, D_{(M,\mathcal{F})})$.

Next, we describe a useful property that the metric space $(\mathcal{M}_{(M,\mathcal{F})}, D_{(M,\mathcal{F})})$ may have. A geodesic path between the points p and q in a metric space (B, D) is a map γ : [0, ℓ] → B, for some ℓ ≥ 0, with γ(0) = p, γ(ℓ) = q and D(γ(t₁), γ(t₂)) = |t₁ − t₂| for all t₁, t₂ ∈ [0, ℓ]. A metric space (B, D) is geodesic if there exists a geodesic path between p and q for all p, q ∈ B. A geodesic metric space (B, D) is a CAT(0)-metric space if and only if (see e.g. [11, p. 163])
$$(D(p, q))^2 + (D(p, r))^2 \ge 2 (D(m, p))^2 + (D(q, r))^2 / 2$$
holds for all p, q, r ∈ B and all m ∈ B with D(q, m) = D(r, m) = D(q, r)/2. CAT(0)-metric spaces arise in many applications (see e.g. [3]). They have the important property that geodesic paths are unique [11, Proposition 1.4, p. 160]. It follows from a result in [19] that the orthant space $(\mathcal{M}_{(M,\mathcal{F})}, D_{(M,\mathcal{F})})$ is a CAT(0)-metric space if and only if ℱ is a flag complex (see also [25, Proposition 6.14]). Furthermore, geodesic paths can be computed in polynomial time in CAT(0)-orthant spaces [25, Corollary 6.19].

4.2. τ-Space Revisited

To describe how the τ-space of ultrametric trees arises as an orthant space, we start with a suitably defined abstract simplicial complex. A partition of X is a set 𝒫 of non-empty and pairwise disjoint subsets of X with $X = \bigcup_{A \in \mathcal{P}} A$. We denote the set of all partitions of X by B(X) and define a binary relation ⪯ on B(X) by putting 𝒫₁ ⪯ 𝒫₂ if for all A₁ ∈ 𝒫₁ there exists some A₂ ∈ 𝒫₂ with A₁ ⊆ A₂. Intuitively, this means that the partition 𝒫₁ refines the partition 𝒫₂. It is well-known that ⪯ is a partial ordering. Note that the partial ordering ⪯ is induced by the partial ordering ⊆ on the subsets of X.

Figure 5. a A ranked X-tree with X = {a, b, c, d}. Any cut along one of the dotted horizontal lines yields a partition of X (for example, the dotted line labeled with 1 yields the partition {{a, b}, {c}, {d}}). b An equidistant X-tree with X = {a, b, c}.

Every ranked X-tree with a ranking of size σ gives rise to a sequence
$$\mathcal{P}_0 \preceq \mathcal{P}_1 \preceq \cdots \preceq \mathcal{P}_\sigma = \{X\}$$
of partitions of X. In Fig. 5a, we depict a rooted X-tree with a ranking of size σ = 3 that gives rise to the sequence
{{a}, {b}, {c}, {d}} ⪯ {{a, b}, {c}, {d}} ⪯ {{a, b}, {c, d}} ⪯ {{a, b, c, d}}
(see also Sect. 5.1 where we formally define how the partitions arise more generally for ranked X-cactuses).
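The refinement relation on partitions and the pairwise comparability of the partitions in such a sequence are easy to test computationally. The following Python sketch, in which the helper names partition, refines and is_chain are assumptions introduced purely for illustration, checks that the four partitions of X = {a, b, c, d} in the sequence above are pairwise comparable, and that a further partition (a hypothetical example, not taken from the paper) is not comparable with one of them.

```python
def partition(*blocks):
    """Build a partition as a frozenset of frozensets, e.g. partition("ab", "c")."""
    return frozenset(frozenset(block) for block in blocks)

def refines(p1, p2):
    """True if every block of p1 is contained in some block of p2."""
    return all(any(a1 <= a2 for a2 in p2) for a1 in p1)

def is_chain(partitions):
    """True if the given partitions are pairwise comparable under refinement."""
    return all(refines(p, q) or refines(q, p)
               for i, p in enumerate(partitions)
               for q in partitions[i + 1:])

# The sequence of partitions of X = {a, b, c, d} given in the text for Fig. 5a.
sequence = [
    partition("a", "b", "c", "d"),
    partition("ab", "c", "d"),
    partition("ab", "cd"),
    partition("abcd"),
]
print(is_chain(sequence))

# A hypothetical extra partition that is not comparable with {{a, b}, {c}, {d}}.
other = partition("ac", "b", "d")
print(refines(other, sequence[1]), refines(sequence[1], other))
```

Storing partitions as frozensets of frozensets is a design choice made here so that partitions can themselves be collected into sets, for instance when enumerating families of pairwise comparable partitions.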
The crucial fact is that the sequence of partitions arising from a ranked X-tree encodes that tree. More formally, as we shall prove as a consequence of our results for general ranked X-cactuses in Corollary 2, we have:

Theorem 1. There is a one-to-one correspondence between (isomorphism classes of) ranked X-trees and subsets of B(X) that contain {X} and that consist of partitions of X which are pairwise comparable with respect to the partial ordering ⪯.

To obtain τ-space as an orthant space, we consider the abstract simplicial complex (B°(X), ℱ(⪯)) with B°(X) = B(X) − {X} and ℱ(⪯) containing all non-empty subsets of B°(X) whose elements are pairwise comparable with respect to ⪯. It follows immediately that (B°(X), ℱ(⪯)) is a flag complex. Note that, more generally, we can associate an abstract simplicial complex that is a flag complex to any partial ordering in an analogous way; for this reason such a complex is known as an order complex (see e.g. [36, p. 248]).

In Fig. 6, we illustrate the orthant space $\mathcal{M}_{(B^\circ(X),\mathcal{F}(\preceq))}$ of equidistant X-trees for X = {a, b, c} (see Fig. 7 for an analogous drawing of the resulting orthant space of equidistant X-cactuses). Note that, by construction, the coordinates of a point in any orthant are obtained as differences between consecutive time stamps in the equidistant X-tree that corresponds to the point. The equidistant X-tree in Fig. 5b, for example, corresponds to the point (ω₁, ω₂, ω₃, ω₄) = (0.8, 1.3, 0, 0). More generally, it follows by Theorem 1 that the elements in $\mathcal{M}_{(B^\circ(X),\mathcal{F}(\preceq))}$ are in one-to-one correspondence with equidistant X-trees. Moreover, since (B°(X), ℱ(⪯)) is a flag complex, it follows, as mentioned in Sect. 4.1, that the resulting metric space $(\mathcal{M}_{(B^\circ(X),\mathcal{F}(\preceq))}, D_{(B^\circ(X),\mathcal{F}(\preceq))})$ is CAT(0). We remark that, by construction, $\mathcal{M}_{(B^\circ(X),\mathcal{F}(\preceq))}$ is precisely τ-space, and so we obtain an alternative proof to the one presented in [18] that τ-space is a CAT(0)-metric space.

Figure 6. The orthant space $\mathcal{M}_{(B^\circ(X),\mathcal{F}(\preceq))}$ for X = {a, b, c}. By construction, each axis represents a partition of X distinct from {X}. The axes labeled ω₁ and ω₂, for example, represent the partitions {{a}, {b}, {c}} and {{a, b}, {c}}, respectively. The three two-dimensional orthants are drawn shaded. All points in the interior of these two-dimensional orthants correspond to the same isomorphism class of binary ranked X-trees. The rankings for them are not shown because they are unique. Points on the axes correspond to non-binary ranked X-trees. The origin corresponds to the ranked X-tree that consists of a single vertex.

Before proceeding, we note that in [21] the problem of when a partition of X is compatible with a rooted phylogenetic X-tree is studied. This includes, as a special case, the situation where the vertices of the tree can be ranked in such a way that the partition is among those associated with the resulting ranked X-tree. In addition, in [2] a space called the Bergman fan of the matroid of the complete graph with vertex set X is studied. This space is a polyhedral fan and its points are also in one-to-one correspondence with equidistant X-trees. Although not an orthant space, its cones are in one-to-one correspondence with the orthants of $\mathcal{M}_{(B^\circ(X),\mathcal{F}(\preceq))}$.

5. An Encoding for Ranked X-Cactuses

To help the reader navigate the remaining sections of this paper, we now briefly summarize how we shall construct the equidistant-cactus space N(X) by applying an analogue of the process described in Sect. 4.2. We shall begin by introducing the concept of a polestar system on the set X, which is a collection of ordered pairs of subsets of X, or set pair system for short, with certain properties. As we shall see in Sect. 5.2, polestar systems can be associated to ranked X-cactuses in a similar way to how partitions can be associated to ranked X-trees.
We shall also define a binary relation $\preceq$ on general set pair systems, and, in Sect. 6, we will show that $\preceq$ yields a partial ordering on the set $P(X)$ of polestar systems on $X$. In Sect. 7, we then prove an analogue of Theorem 1, namely, we show that ranked $X$-cactuses are in one-to-one correspondence with subsets of $P(X)$ that contain the maximum element relative to the ordering $\preceq$ and that are pairwise comparable with respect to $\preceq$. In other words, we obtain an encoding of ranked $X$-cactuses in terms of certain collections of polestar systems. In Sect. 8, we conclude by constructing the network space $N(X)$ as the orthant space associated to the order complex of the poset $(P(X), \preceq)$.

5.1. Set Pair Systems

Before introducing polestar systems, we recall the concept of a set pair system introduced in [22]. To this end, we say that a vertex $u$ in a rooted DAG $\mathcal{N}$ is a descendant of a vertex $v$ if there exists a directed path from the root of $\mathcal{N}$ to $u$ that contains $v$. A descendant $u$ of $v$ is a strict descendant if every directed path from the root to $u$ contains $v$. Otherwise $u$ is called a non-strict descendant of $v$. Now, given a rooted $X$-cactus $\mathcal{N} = ((V, A), \varphi)$ and a vertex $u \in V$, let $C(u)$ be the set of those $x \in X$ with $\varphi(x)$ a descendant of $u$, $S(u)$ the set of those $x \in X$ with $\varphi(x)$ a strict descendant of $u$ and $H(u)$ the set of those $x \in X$ with $\varphi(x)$ a non-strict descendant of $u$ in $\mathcal{N}$. For every vertex $u$ of $\mathcal{N}$, we call $(S(u), H(u))$ the set pair associated to $u$ and put $\mathcal{S}(\mathcal{N}) = \{(S(u), H(u)) : u \in V\}$.

For later reference, we state some immediate consequences of the definition of the set pairs in $\mathcal{S}(\mathcal{N})$ for a rooted $X$-cactus $\mathcal{N}$ (see also [22] where these properties have been considered in the context of the slightly more restrictive 1-nested phylogenetic networks):

(SH1) For all vertices $u$ of $\mathcal{N}$, we have $S(u) \cap H(u) = \emptyset$, $S(u) \cup H(u) = C(u)$ and $S(u)$ is always non-empty while $H(u)$ may be empty.

(SH2) If $(S(u), H(u)) = (S(v), H(v))$ for two distinct vertices $u$ and $v$ of $\mathcal{N}$ then one of these vertices, say $u$, is a reticulation vertex with outdegree 1 and $v$ is the single child of $u$. Note that this situation cannot occur if $\mathcal{N}$ is compressed.

(SH3) Let $C$ be the set of vertices in a reticulation cycle of $\mathcal{N}$ where $u$ and $v$ are the common start and end vertex, respectively, of the two directed paths that form the reticulation cycle. Then we have $H(w) = S(v)$ if $w \in C - \{u, v\}$ and, for all other vertices $w'$ of $\mathcal{N}$, we have $H(w') \neq S(v)$.

Now, given a ranked $X$-cactus $(\mathcal{N} = ((V, A), \varphi), r)$ we collect, for every $i \in \{0, 1, 2, \ldots, \sigma(r)\}$, in $\mathcal{S}_i(\mathcal{N})$ first those set pairs from $\mathcal{S}(\mathcal{N})$ that correspond to vertices of rank at most $i$ and whose parents (if any) have rank strictly larger than $i$. We then add some further set pairs that essentially help to keep track of the fact that some of the vertices involved are in a reticulation cycle. More formally, we define $V_i$ to be the set that consists of all vertices $u \in V$ with $r(u) \leq i$ and $r(p) > i$ for all parents $p$ of $u$. Note that, in view of (TS3), $V_i$ does not contain any reticulation vertices. Thus, all $u \in V_i$ have at most one parent. Then, we put

$$\mathcal{S}_i(\mathcal{N}) = \{(S(u), H(u)) : u \in V_i\} \cup \{(H(u), \emptyset) : u \in V_i,\ H(u) \neq \emptyset\}.$$

Note that we always have $\mathcal{S}_{\sigma(r)}(\mathcal{N}) = \{(X, \emptyset)\}$. For the rooted $X$-cactus $\mathcal{N}$ in Fig. 4a, for example, we obtain:

$\mathcal{S}_4(\mathcal{N}) = \{(\{a, b, c, d, e, f, g, h, i, j\}, \emptyset)\}$

$\mathcal{S}_3(\mathcal{N}) = \{(\{a, b, c, d, e\}, \emptyset), (\{f, g, h, i, j\}, \emptyset)\}$

$\mathcal{S}_2(\mathcal{N}) = \{(\{a\}, \{b, c, d\}), (\{b, c, d\}, \emptyset), (\{e\}, \{b, c, d\}), (\{f, g, h\}, \emptyset), (\{i\}, \emptyset), (\{j\}, \emptyset)\}$

$\mathcal{S}_1(\mathcal{N}) = \{(\{a\}, \emptyset), (\{b, c, d\}, \emptyset), (\{e\}, \emptyset), (\{f, g, h\}, \emptyset), (\{i\}, \emptyset), (\{j\}, \emptyset)\}$

$\mathcal{S}_0(\mathcal{N}) = \{(\{x\}, \emptyset) : x \in \{a, b, c, d, e, g, i, j\}\} \cup \{(\{f\}, \{g\}), (\{h\}, \{g\})\}$

A collection of ordered pairs $(S, H)$ of subsets of $X$ such that $S \neq \emptyset$ and $S \cap H = \emptyset$ is called a set pair system on $X$. Note that, by construction, the sets $\mathcal{S}(\mathcal{N})$ and $\mathcal{S}_i(\mathcal{N})$, $0 \leq i \leq \sigma(r)$, associated with a ranked $X$-cactus $(\mathcal{N}, r)$ are non-empty set pair systems.
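The sets $C(u)$, $S(u)$ and $H(u)$ can be computed directly from the definitions of descendant and strict descendant by enumerating directed paths from the root. The Python sketch below does this for a small hypothetical DAG with a single reticulation cycle; this example is not one of the paper's figures, and we merely assume that it satisfies the definition of a rooted $X$-cactus given earlier. The function and vertex names are illustrative only, and the path enumeration is meant for small examples rather than as an efficient algorithm.

```python
def root_paths(children, root):
    """Map each vertex to the list of all directed root-to-vertex paths."""
    paths = {}
    stack = [(root,)]
    while stack:
        path = stack.pop()
        paths.setdefault(path[-1], []).append(path)
        for child in children.get(path[-1], ()):
            stack.append(path + (child,))
    return paths


def set_pairs(children, root, phi):
    """Compute the set pair (S(u), H(u)) for every vertex u of the DAG."""
    paths = root_paths(children, root)
    pairs = {}
    for u in paths:
        # C(u): labels x such that SOME root path to phi[x] contains u.
        c_u = {x for x, w in phi.items() if any(u in p for p in paths[w])}
        # S(u): labels x such that EVERY root path to phi[x] contains u.
        s_u = {x for x in c_u if all(u in p for p in paths[phi[x]])}
        # H(u) = C(u) - S(u), the labels of non-strict descendants.
        pairs[u] = (frozenset(s_u), frozenset(c_u - s_u))
    return pairs


# Hypothetical example: root rho, a reticulation cycle rho -> p1 -> h and
# rho -> p2 -> h, with leaves va, vc and reticulation leaf h; X = {a, b, c}.
children = {"rho": ["p1", "p2"], "p1": ["va", "h"], "p2": ["vc", "h"]}
phi = {"a": "va", "b": "h", "c": "vc"}

pairs = set_pairs(children, "rho", phi)
for vertex in ("rho", "p1", "p2", "h"):
    s_u, h_u = pairs[vertex]
    print(vertex, sorted(s_u), sorted(h_u))
```

In this example the two vertices `p1` and `p2` on the sides of the reticulation cycle receive the set pairs $(\{a\}, \{b\})$ and $(\{c\}, \{b\})$, and the reticulation vertex `h` receives $(\{b\}, \emptyset)$, so that $H(p_1) = H(p_2) = S(h)$ in line with (SH3).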
It is shown in [22] that, for any set pair system $\mathcal{S}$ on $X$, we obtain a partial ordering $\leq$ on the set pairs in $\mathcal{S}$ by putting $(S_1, H_1) \leq (S_2, H_2)$ if either $(S_1, H_1) = (S_2, H_2)$ or $(S_1, H_1) \neq (S_2, H_2)$ and one of the following holds:

• $S_1 \cup H_1 \subseteq S_2$
• $S_1 \cup H_1 \subseteq H_2$
• $S_1 \subsetneq S_2$ and $H_1 = H_2 \neq \emptyset$

We write $(S_1, H_1) < (S_2, H_2)$ if $(S_1, H_1) \leq (S_2, H_2)$ and the set pairs $(S_1, H_1)$ and $(S_2, H_2)$ are distinct. The partial ordering $\leq$ on set pairs was defined in such a way that we have $(S(u), H(u)) \leq (S(v), H(v))$ for two vertices $u$ and $v$ in a rooted $X$-cactus if and only if $u$ is a descendant of $v$ (see the proof of Theorem 5 in [22]).

We use the partial ordering $\leq$ on set pairs to define a binary relation $\preceq$ on set pair systems. More precisely, for set pair systems $\mathcal{S}_1$ and $\mathcal{S}_2$ on $X$ we put $\mathcal{S}_1 \preceq \mathcal{S}_2$ if

(SP1) for all $(S_1, H_1) \in \mathcal{S}_1$, there exists some $(S_2, H_2) \in \mathcal{S}_2$ with $(S_1, H_1) \leq (S_2, H_2)$, and

(SP2) for all $(S_2, H_2) \in \mathcal{S}_2$ with $H_2 \neq \emptyset$, if there exists some $(S_1, H_1) \in \mathcal{S}_1$ with $H_1 = H_2$, then there exists such a $(S_1, H_1)$ with $(S_1, H_1) \leq (S_2, H_2)$.

Again, we write $\mathcal{S}_1 \prec \mathcal{S}_2$ if $\mathcal{S}_1 \preceq \mathcal{S}_2$ and $\mathcal{S}_1 \neq \mathcal{S}_2$. We remark that (SP1) captures the basic idea from Sect. 4.2 that the partial ordering $\leq$ on set pairs induces a suitable binary relation $\preceq$ on set pair systems (in analogy to how the partial ordering $\subseteq$ induced the binary relation $\preceq$ on partitions). (SP2) is an additional technical requirement that will be crucial in our encoding of ranked $X$-cactuses. The relation $\preceq$ is, in general, not a partial ordering on the set pair systems on a fixed set $X$, because it need be neither antisymmetric nor transitive. For the set pair systems associated with a ranked $X$-cactus, however, the following holds.

Lemma 3. Let $(\mathcal{N}, r)$ be a ranked $X$-cactus. Then we have $\mathcal{S}_i(\mathcal{N}) \prec \mathcal{S}_j(\mathcal{N})$ for all $0 \leq i < j \leq \sigma(r)$ and $\mathcal{S}_{\sigma(r)}(\mathcal{N}) = \{(X, \emptyset)\}$.

Proof.
As noted earlier in this section, $\mathcal{S}_{\sigma(r)}(\mathcal{N}) = \{(X, \emptyset)\}$ follows immediately from the definition of the set pair system $\mathcal{S}_{\sigma(r)}(\mathcal{N})$. Consider $0 \leq i < j \leq \sigma(r)$. We first show that $\mathcal{S}_i(\mathcal{N}) \preceq \mathcal{S}_j(\mathcal{N})$. To this end, consider $(S, H) \in \mathcal{S}_i(\mathcal{N})$. By definition of $\mathcal{S}_i(\mathcal{N})$, there must exist a vertex $v$ in $\mathcal{N}$ with $r(v) \leq i$, $r(p) > i$ for all parents $p$ of $v$, and either $(S, H) = (S(v), H(v))$ or $(S, H) = (H(v), \emptyset)$. Consider a directed path from the root of $\mathcal{N}$ to $v$. On this path, there must exist a vertex $u$ with $r(u) \leq j$ and $r(p) > j$ for all parents $p$ of $u$. This implies that $(S(u), H(u)) \in \mathcal{S}_j(\mathcal{N})$. Moreover, in view of the fact that $u$ lies on a directed path from the root of $\mathcal{N}$ to $v$, we must have $(S, H) \leq (S(v), H(v)) \leq (S(u), H(u))$, as required by (SP1).

To establish that also (SP2) is satisfied for $\mathcal{S}_i(\mathcal{N})$ and $\mathcal{S}_j(\mathcal{N})$, consider $(S, H) \in \mathcal{S}_j(\mathcal{N})$ with $H \neq \emptyset$. By definition of $\mathcal{S}_j(\mathcal{N})$, there must exist a vertex $u$ in $\mathcal{N}$ with $(S, H) = (S(u), H(u))$, $r(u) \leq j$ and $r(p) > j$ for all parents $p$ of $u$. Now, if there exists some $(S', H') \in \mathcal{S}_i(\mathcal{N})$ with $H' = H$, then there exists some vertex $v$ in $\mathcal{N}$ with $(S', H') = (S', H) = (S(v), H(v))$, $r(v) \leq i$ and $r(p) > i$ for all parents $p$ of $v$. This implies that $u$ and $v$ must be vertices in the same reticulation cycle of $\mathcal{N}$. Moreover, we can choose $v$ such that $v$ is a descendant of $u$, implying that $(S', H') = (S(v), H(v)) \leq (S(u), H(u)) = (S, H)$, as required.

It remains to show that $\mathcal{S}_i(\mathcal{N}) \neq \mathcal{S}_j(\mathcal{N})$. By the definition of a ranked $X$-cactus, there must exist a vertex $u \in V$ with $r(u) = j$. Without loss of generality, we may assume that $u$ is not a reticulation vertex. If $(S(u), H(u)) \notin \mathcal{S}_i(\mathcal{N})$ we are done. Therefore, assume for a contradiction that $(S(u), H(u)) \in \mathcal{S}_i(\mathcal{N})$. In view of $i < j$ we have $u \notin V_i$. Thus, there exists some $v \neq u$ in $V_i$ such that either (i) $H(v) \neq \emptyset$ and $(S(u), H(u)) = (H(v), \emptyset)$ or (ii) $(S(u), H(u)) = (S(v), H(v))$. If Case (i) holds then, in view of (SH3), $v$ must be a vertex in a reticulation cycle with end vertex $u'$ and $(S(u'), H(u')) = (H(v), \emptyset) = (S(u), H(u))$.
Since $u$ is not a reticulation vertex, it follows, by (SH2), that $u$ is the single child of $u'$. Consequently, $i = r(v) > r(u) = j$, a contradiction. Similarly, if Case (ii) holds then, again by (SH2), it follows that $u$ is a reticulation vertex and $v$ is the single child of $u$, a contradiction.

5.2. Polestar Systems

A set pair system $\mathcal{S}$ on $X$ is partition-like if

(PL1) $\mathcal{P}(\mathcal{S}) = \{S : (S, H) \in \mathcal{S}\}$ is a partition of $X$,

(PL2) for all $(S, H), (S', H') \in \mathcal{S}$ with $(S, H) \neq (S', H')$ we have $S \neq S'$, and

(PL3) for all $(S, H) \in \mathcal{S}$ with $H \neq \emptyset$ we have $(H, \emptyset) \in \mathcal{S}$ and there exists precisely one $(S', H') \in \mathcal{S}$ with $(S', H') \neq (S, H)$ and $H' = H$.

A partition-like set pair system is called a polestar system, for short. In addition, we define $\mathcal{H}(\mathcal{S}) = \{H : (S, H) \in \mathcal{S},\ H \neq \emptyset\}$. Note that (PL2) implies that $|\mathcal{S}| = |\mathcal{P}(\mathcal{S})|$.

Lemma 4. Let $(\mathcal{N}, r)$ be a ranked $X$-cactus. Then, $\mathcal{S}_i(\mathcal{N})$ is a polestar system for all $0 \leq i \leq \sigma(r)$.

Proof. Fix some $i \in \{0, 1, \ldots, \sigma(r)\}$ and consider two distinct vertices $u_1, u_2 \in V_i$. Put $(S_k, H_k) = (S(u_k), H(u_k))$, $k \in \{1, 2\}$. Recall from the definition of the set $V_i$ that both $u_1$ and $u_2$ have rank at most $i$ while the ranks of their parents are strictly larger than $i$. Thus, up to switching the roles of $u_1$ and $u_2$, one of the following must hold:

• Neither of $u_1$ and $u_2$ is a descendant of the other and there is no reticulation cycle in $\mathcal{N}$ that contains both $u_1$ and $u_2$. Consequently, $(S(u_1) \cup H(u_1)) \cap (S(u_2) \cup H(u_2)) = \emptyset$. Thus, the sets $S(u_1)$, $H(u_1)$, $S(u_2)$ and $H(u_2)$ are pairwise disjoint.
• Both $u_1$ and $u_2$ are contained in the same reticulation cycle in $\mathcal{N}$ but neither is a descendant of the other. Consequently, $H(u_1) = H(u_2) = H \neq \emptyset$ and the sets $S(u_1)$, $S(u_2)$ and $H$ are pairwise disjoint.

It follows from this case analysis that (PL1) and (PL2) hold for $\mathcal{S}_i(\mathcal{N})$. To see that also (PL3) holds, consider a set pair $(S, H) \in \mathcal{S}_i(\mathcal{N})$ with $H \neq \emptyset$.
By the definition of $\mathcal{S}_i(\mathcal{N})$, there must exist a vertex $u$ in $\mathcal{N}$ with $(S(u), H(u)) = (S, H)$ such that $r(u) \leq i$ and $r(p) > i$ for all parents $p$ of $u$. In view of $H(u) = H \neq \emptyset$, vertex $u$ must be contained in a reticulation cycle $C$ but cannot be the common start or the common end vertex of the two directed paths that form $C$. Note that $C$ contains a unique vertex $v \neq u$ with $r(v) \leq i$ and $r(p) > i$ for all parents $p$ of $v$. Moreover, $v$ cannot be the common start or the common end vertex of the two directed paths that form $C$. Since $u$ and $v$ are both contained in $C$, we have $H(u) = H(v) = H$. Moreover, by (SH3), there are no other vertices $w$ in $\mathcal{N}$ with $H(w) = H$, $r(w) \leq i$ and $r(p) > i$ for all parents $p$ of $w$. Finally, by construction, we also have $(H, \emptyset) = (H(u), \emptyset) \in \mathcal{S}_i(\mathcal{N})$.

We denote by $P(X)$ the set of polestar systems on the set $X$. Note that, even for the set pair systems in $P(X)$, (SP1) in the definition of the binary relation $\preceq$ does not imply (SP2), as can be seen from the set pair systems

$$\mathcal{S}_1 = \{(\{a\}, \{b\}), (\{b\}, \emptyset), (\{c\}, \{b\}), (\{d\}, \emptyset)\} \quad\text{and}\quad \mathcal{S}_2 = \{(\{a, c\}, \{b\}), (\{b\}, \emptyset), (\{d\}, \{b\})\}$$

on $X = \{a, b, c, d\}$ which satisfy (PL1)–(PL3) and (SP1) but not (SP2).

We conclude this section with two technical lemmas stating some properties of the relations $\leq$ and $\preceq$ that will be used in Sects. 6 and 7. In particular, Lemma 5 establishes that, up to a specific exception, distinct set pairs within a single polestar system are incomparable with respect to the partial ordering $\leq$ and the binary relations $\leq$ and $\preceq$ are consistent. In our encoding of ranked $X$-cactuses, this exception corresponds to the set pairs associated with reticulation vertices.

Lemma 5. Let $\mathcal{S}_1, \mathcal{S}_2 \in P(X)$ with $\mathcal{S}_1 \preceq \mathcal{S}_2$. Then, for all $(S_1, H_1) \in \mathcal{S}_1$ and $(S_2, H_2) \in \mathcal{S}_2$, $(S_2, H_2) < (S_1, H_1)$ implies $(S_2, H_2) \in \mathcal{S}_1$, $H_2 = \emptyset$ and $H_1 = S_2$.

Proof. First, consider the case $\mathcal{S}_1 = \mathcal{S}_2 = \mathcal{S}$. Let $(S_1, H_1), (S_2, H_2) \in \mathcal{S}$ with $(S_2, H_2) < (S_1, H_1)$. Assume for a contradiction that $H_2 \neq \emptyset$.
Then, in view of (PL1)–(PL3), none of $S_2 \cup H_2 \subseteq S_1$, $S_2 \cup H_2 \subseteq H_1$ and $S_2 \subsetneq S_1$ can hold, in contradiction to $(S_2, H_2) < (S_1, H_1)$. Thus, we must have $H_2 = \emptyset$. Consequently, $S_2 \subseteq H_1$, and, therefore, $S_2 = H_1$, as required.

Next, consider the case $\mathcal{S}_1 \prec \mathcal{S}_2$. Let $(S_1, H_1) \in \mathcal{S}_1$ and $(S_2, H_2) \in \mathcal{S}_2$ with $(S_2, H_2) < (S_1, H_1)$. In view of $\mathcal{S}_1 \prec \mathcal{S}_2$, there must exist some $(S_2', H_2') \in \mathcal{S}_2$ with $(S_1, H_1) \leq (S_2', H_2')$. By the transitivity of $\leq$, we obtain $(S_2, H_2) < (S_2', H_2')$. In view of the first case considered in this proof, this implies $S_2 = H_2'$ and $H_2 = \emptyset$. Thus, by the definition of a set pair, we have $S_2' \cap S_2 = \emptyset$. Moreover, $(S_2, H_2) < (S_1, H_1) \leq (S_2', H_2')$ simplifies to $(S_2, \emptyset) < (S_1, H_1) \leq (S_2', S_2)$. In view of the definition of $\leq$, the latter can only hold if $S_2 = H_1 \neq \emptyset$. By (PL3), this implies $(H_1, \emptyset) = (S_2, H_2) \in \mathcal{S}_1$, as required.

Lemma 6. Let $\mathcal{S}_1, \mathcal{S}_2 \in P(X)$ with $\mathcal{S}_1 \prec \mathcal{S}_2$. Then,

$$1 \leq |\mathcal{P}(\mathcal{S}_2)| - |\mathcal{H}(\mathcal{S}_2)| < |\mathcal{P}(\mathcal{S}_1)| - |\mathcal{H}(\mathcal{S}_1)| \leq |X|.$$

If $(|\mathcal{P}(\mathcal{S}_1)| - |\mathcal{H}(\mathcal{S}_1)|) - (|\mathcal{P}(\mathcal{S}_2)| - |\mathcal{H}(\mathcal{S}_2)|) \geq 2$ then there exists $\mathcal{S}_3 \in P(X)$ with $\mathcal{S}_1 \prec \mathcal{S}_3 \prec \mathcal{S}_2$.

Proof. In view of (PL1), we have $1 \leq |\mathcal{P}(\mathcal{S})| \leq |X|$ for all $\mathcal{S} \in P(X)$. Moreover, in view of (PL3), we have $|\mathcal{P}(\mathcal{S})| \geq 3|\mathcal{H}(\mathcal{S})|$. This implies $1 \leq |\mathcal{P}(\mathcal{S})| - |\mathcal{H}(\mathcal{S})| \leq |X|$.

Next consider $\mathcal{S}_1, \mathcal{S}_2 \in P(X)$ with $\mathcal{S}_1 \prec \mathcal{S}_2$. We first show that, for all $S \in \mathcal{P}(\mathcal{S}_1)$, there exists a unique $S' \in \mathcal{P}(\mathcal{S}_2)$ with $S \subseteq S'$. In view of (PL2), there exists a unique set pair $(S_1, H_1) \in \mathcal{S}_1$ with $S_1 = S$ and, in view of $\mathcal{S}_1 \prec \mathcal{S}_2$, there must exist a set pair $(S_2, H_2) \in \mathcal{S}_2$ with $(S_1, H_1) \leq (S_2, H_2)$. Therefore, by the definition of $\leq$, one of the following must hold:

• $S_1 \cup H_1 \subseteq S_2$. Then, we put $S' = S_2$.
• $S_1 \cup H_1 \subseteq H_2$. This implies $H_2 \neq \emptyset$ and thus, by (PL3), $H_2 \in \mathcal{P}(\mathcal{S}_2)$. We put $S' = H_2$.
• $S_1 \subsetneq S_2$ and $H_1 = H_2 \neq \emptyset$. Then, we put $S' = S_2$.

In each case, we have $S \subseteq S'$ for some $S' \in \mathcal{P}(\mathcal{S}_2)$ and, in view of (PL1), $S'$ is unique, as claimed.
This implies that we obtain a map $q : \mathcal{S}_1 \to \mathcal{S}_2$ by assigning to each $(S, H) \in \mathcal{S}_1$ the unique $(S', H') \in \mathcal{S}_2$ with $S \subseteq S'$. In particular, we have $|\mathcal{P}(\mathcal{S}_1)| \geq |\mathcal{P}(\mathcal{S}_2)|$.

To establish $|\mathcal{P}(\mathcal{S}_2)| - |\mathcal{H}(\mathcal{S}_2)| < |\mathcal{P}(\mathcal{S}_1)| - |\mathcal{H}(\mathcal{S}_1)|$, put $k = |\mathcal{P}(\mathcal{S}_1)| - |\mathcal{P}(\mathcal{S}_2)|$. Let $\ell_1$ denote the number of $H \in \mathcal{H}(\mathcal{S}_1)$ with $H \notin \mathcal{H}(\mathcal{S}_2)$. Note that, in view of (PL3), for each such $H$, there exist precisely two set pairs $(S_1, H_1), (S_2, H_2) \in \mathcal{S}_1$ with $H_1 = H_2 = H$ and, in view of $\mathcal{S}_1 \prec \mathcal{S}_2$, there must exist some $S' \in \mathcal{P}(\mathcal{S}_2)$ with $S_1 \cup S_2 \cup H \subseteq S'$. This implies $k \geq 2\ell_1$. Thus, letting $\ell_2$ denote the number of $H \in \mathcal{H}(\mathcal{S}_2)$ with $H \notin \mathcal{H}(\mathcal{S}_1)$, we have

$$\begin{aligned}
|\mathcal{P}(\mathcal{S}_2)| - |\mathcal{H}(\mathcal{S}_2)| &= |\mathcal{P}(\mathcal{S}_2)| - |\mathcal{H}(\mathcal{S}_1)| + \ell_1 - \ell_2 \\
&= |\mathcal{P}(\mathcal{S}_1)| - |\mathcal{H}(\mathcal{S}_1)| + \ell_1 - \ell_2 - k \\
&\leq |\mathcal{P}(\mathcal{S}_1)| - |\mathcal{H}(\mathcal{S}_1)| - \ell_1 - \ell_2.
\end{aligned}$$

Thus, if $\ell_1 + \ell_2 > 0$, we immediately have $|\mathcal{P}(\mathcal{S}_2)| - |\mathcal{H}(\mathcal{S}_2)| < |\mathcal{P}(\mathcal{S}_1)| - |\mathcal{H}(\mathcal{S}_1)|$. If $\ell_1 + \ell_2 = 0$ we have $\mathcal{H}(\mathcal{S}_1) = \mathcal{H}(\mathcal{S}_2)$. This implies, in view of $\mathcal{S}_1 \prec \mathcal{S}_2$, that we cannot have $\mathcal{P}(\mathcal{S}_1) = \mathcal{P}(\mathcal{S}_2)$, that is, we must have $k > 0$ and, thus, we also obtain $|\mathcal{P}(\mathcal{S}_2)| - |\mathcal{H}(\mathcal{S}_2)| < |\mathcal{P}(\mathcal{S}_1)| - |\mathcal{H}(\mathcal{S}_1)|$, as required.

Now assume that $(|\mathcal{P}(\mathcal{S}_1)| - |\mathcal{H}(\mathcal{S}_1)|) - (|\mathcal{P}(\mathcal{S}_2)| - |\mathcal{H}(\mathcal{S}_2)|) \geq 2$. First consider the case that there exist two distinct $(S_1, H_1), (S_2, H_2) \in \mathcal{S}_2$ with $|q^{-1}(S_i, H_i)| \geq 2$, $i \in \{1, 2\}$. Then we put

$$\mathcal{S}_3 = (\mathcal{S}_1 - q^{-1}(S_1, H_1)) \cup \{(S_1, H_1)\}.$$

Next consider the case that there exists $(S', H') \in \mathcal{S}_2$ with $|q^{-1}(S', H')| \geq 3$ and $H = \emptyset$ for all $(S, H) \in q^{-1}(S', H')$. Then we select two distinct $(S_1, \emptyset), (S_2, \emptyset) \in q^{-1}(S', H')$ and put

$$\mathcal{S}_3 = (\mathcal{S}_1 - \{(S_1, \emptyset), (S_2, \emptyset)\}) \cup \{(S_1 \cup S_2, \emptyset)\}.$$

The remaining case to consider is that there exists a set pair $(S', H') \in \mathcal{S}_2$ such that $|q^{-1}(S', H')| \geq 4$ and there are three distinct $(S_1, H_1), (S_2, H_2), (S_3, H_3) \in q^{-1}(S', H')$ with $H_1 = \emptyset$ and $H_2 = H_3 = S_1$. Then, we put

$$\mathcal{S}_3 = (\mathcal{S}_1 - \{(S_1, H_1), (S_2, H_2), (S_3, H_3)\}) \cup \{(S_1 \cup S_2 \cup S_3, \emptyset)\}.$$

In each case, by construction, we immediately have $\mathcal{S}_1 \prec \mathcal{S}_3 \prec \mathcal{S}_2$.

6. The Poset $(P(X), \preceq)$

In this section, we prove that $\preceq$ is a partial ordering on $P(X)$. We also give a formula for counting the number of elements in the resulting poset $(P(X), \preceq)$.

We first recall some standard poset concepts (see e.g. [35]). A (finite) poset $(M, R)$ consists of a finite non-empty set $M$ and a binary relation $R \subseteq M \times M$ on $M$ that is reflexive, transitive and antisymmetric. An element $m \in M$ is minimum (maximum) if $(m, a) \in R$ ($(a, m) \in R$) holds for all $a \in M$. A poset is bounded if it has a minimum and a maximum element and these elements are then necessarily unique. Two elements $a, b \in M$ are comparable if $(a, b) \in R$ or $(b, a) \in R$. A chain $C$ is a non-empty subset of $M$ of pairwise comparable elements. The length of a chain $C$ is $|C| - 1$. A chain is maximal if it is not contained in some strictly longer chain. A poset is graded if every maximal chain has the same length. The height function $h$ of a graded poset $(M, R)$ assigns to every element $a \in M$ the length $h(a)$ of a longest chain $C$ with $(b, a) \in R$ for all $b \in C$. (This is usually called the rank function of the graded poset. We use height function instead to avoid confusion with the rankings of rooted $X$-cactuses.)

Proposition 1. $(P(X), \preceq)$ is a bounded graded poset with minimum element $\{(\{x\}, \emptyset) : x \in X\}$ and maximum element $\{(X, \emptyset)\}$. The height function of this poset is $h : P(X) \to \{0, 1, \ldots, |X| - 1\}$ with $h(\mathcal{S}) = |X| - |\mathcal{P}(\mathcal{S})| + |\mathcal{H}(\mathcal{S})|$.

Proof. We first show that $(P(X), \preceq)$ is a poset. It follows immediately from the definition of the binary relation $\preceq$ that it is reflexive. Moreover, in view of Lemma 6, we cannot have two distinct $\mathcal{S}_1, \mathcal{S}_2 \in P(X)$ with $\mathcal{S}_1 \preceq \mathcal{S}_2$ and $\mathcal{S}_2 \preceq \mathcal{S}_1$, implying that $\preceq$ is also antisymmetric.

It remains to show that $\preceq$ is transitive. Consider set pair systems $\mathcal{S}_1$, $\mathcal{S}_2$ and $\mathcal{S}_3$ with $\mathcal{S}_1 \preceq \mathcal{S}_2 \preceq \mathcal{S}_3$. Then, in view of (SP1), for all $(S_1, H_1) \in \mathcal{S}_1$, there exists some $(S_2, H_2) \in \mathcal{S}_2$ with $(S_1, H_1) \leq (S_2, H_2)$ and, again in view of
(SP1), there also exists some $(S_3, H_3) \in \mathcal{S}_3$ with $(S_2, H_2) \leq (S_3, H_3)$. By the transitivity of $\leq$, we obtain $(S_1, H_1) \leq (S_3, H_3)$, as required.

Next consider some $(S_3, H_3) \in \mathcal{S}_3$ with $H_3 \neq \emptyset$. First assume that there exists some $(S_2, H_2) \in \mathcal{S}_2$ with $H_2 = H_3$. Then, by (SP2), there also exists $(S_2', H_2') \in \mathcal{S}_2$ with $H_2' = H_3$ and $(S_2', H_2') \leq (S_3, H_3)$. Now, if there exists some $(S_1, H_1) \in \mathcal{S}_1$ with $H_1 = H_2' = H_3$, then, by (SP2), there also exists such a set pair in $\mathcal{S}_1$ with $(S_1, H_1) \leq (S_2', H_2') \leq (S_3, H_3)$. Hence, by the transitivity of $\leq$, we have $(S_1, H_1) \leq (S_3, H_3)$, as required.

Next assume that there exists no $(S_2, H_2) \in \mathcal{S}_2$ with $H_2 = H_3$. It suffices to show that this implies that there exists no $(S_1, H_1) \in \mathcal{S}_1$ with $H_1 = H_3$. So, assume for a contradiction that there exists some $(S_1, H_1) \in \mathcal{S}_1$ with $H_1 = H_3 \neq \emptyset$. Put $H = H_1$. In view of $\mathcal{S}_1 \preceq \mathcal{S}_2$, there must exist some $(S_2, H_2) \in \mathcal{S}_2$ with $(S_1, H) \leq (S_2, H_2)$. Note that $H \neq H_2$ combined with the definition of $\leq$ implies $S_1 \cup H \subseteq S_2$ or $S_1 \cup H \subseteq H_2$. Moreover, in view of $\mathcal{S}_2 \preceq \mathcal{S}_3$, there must exist some $(S_3', H_3') \in \mathcal{S}_3$ with $(S_2, H_2) \leq (S_3', H_3')$. This implies that $S_1 \cup H \subseteq S_3'$ or $S_1 \cup H \subseteq H_3'$. But then, $H \subsetneq S_3'$ or $H \subsetneq H_3'$ must hold in contradiction to (PL1). Thus, $\mathcal{S}_1 \preceq \mathcal{S}_3$ holds, establishing that $\preceq$ is transitive and, thus, $(P(X), \preceq)$ is a poset.

Next, we show that $\{(\{x\}, \emptyset) : x \in X\}$ and $\{(X, \emptyset)\}$ are the minimum and maximum element, respectively, in $(P(X), \preceq)$. Clearly, $\{(\{x\}, \emptyset) : x \in X\}$ and $\{(X, \emptyset)\}$ are both polestar systems and, thus, elements of $P(X)$. Consider any $\mathcal{S} \in P(X)$. Then, for all $(S, H) \in \mathcal{S}$, we have $S \cup H \subseteq X$, implying $(S, H) \leq (X, \emptyset)$ and, thus, $\mathcal{S} \preceq \{(X, \emptyset)\}$. Similarly, in view of (PL1), for all $x \in X$, there must exist some $(S, H) \in \mathcal{S}$ with $x \in S$, implying that $(\{x\}, \emptyset) \leq (S, H)$. Thus, $\{(\{x\}, \emptyset) : x \in X\} \preceq \mathcal{S}$. It follows that $(P(X), \preceq)$ is a bounded poset.
That $(P(X), \preceq)$ is a graded poset with height function $h$ is now an immediate consequence of Lemma 6 in view of $h(\{(\{x\}, \emptyset) : x \in X\}) = 0$ and $h(\{(X, \emptyset)\}) = |X| - 1$.

The next corollary describes the relationship between $(P(X), \preceq)$ and the poset $(B(X), \preceq)$ of partitions of $X$. Two posets $(M_1, R_1)$ and $(M_2, R_2)$ are isomorphic if there exists a bijective map $f : M_1 \to M_2$ such that, for all $a, b \in M_1$, $(a, b) \in R_1$ if and only if $(f(a), f(b)) \in R_2$.

Corollary 1. The restriction of the poset $(P(X), \preceq)$ to those $\mathcal{S} \in P(X)$ with $\mathcal{H}(\mathcal{S}) = \emptyset$ is isomorphic to the poset $(B(X), \preceq)$ of partitions of $X$.

Proof. We map any $\mathcal{S} \in P(X)$ with $\mathcal{H}(\mathcal{S}) = \emptyset$ to the partition $\mathcal{P}(\mathcal{S}) \in B(X)$. This map is bijective. Moreover, for $\mathcal{S}_1, \mathcal{S}_2 \in P(X)$ with $\mathcal{H}(\mathcal{S}_1) = \mathcal{H}(\mathcal{S}_2) = \emptyset$ we have $\mathcal{S}_1 \preceq \mathcal{S}_2$ if and only if for all $A_1 \in \mathcal{P}(\mathcal{S}_1)$ there exists some $A_2 \in \mathcal{P}(\mathcal{S}_2)$ with $A_1 \subseteq A_2$, as required.

In the remaining part of this section, we give a formula for the number $\lambda_n = |P(X)|$ of polestar systems on a set $X$ with $n \geq 1$ elements. The values of $\lambda_n$ for $n = 1, 2, \ldots, 8$ are 1, 2, 8, 45, 277, 1853, 14065, 122118. For $k \in \{1, 2, \ldots, n\}$, we denote by $\alpha_{n,k}$ the Stirling number of the second kind, that is, the number of partitions of $X$ into $k$ subsets. In addition, for $\ell \in \{0, 1, \ldots, \lfloor k/3 \rfloor\}$, we denote by $\beta_{k,\ell}$ the number of partitions of a set with $k$ elements into $\ell$ subsets with three elements and $k - 3\ell$ subsets with one element. It is known [31] that

$$\beta_{k,\ell} = \frac{k!}{6^{\ell} \cdot \ell! \cdot (k - 3\ell)!}.$$

Proposition 2. For all $n \geq 1$ we have

$$\lambda_n = \sum_{k=1}^{n} \left( \alpha_{n,k} \cdot \sum_{\ell=0}^{\lfloor k/3 \rfloor} \beta_{k,\ell} \cdot 3^{\ell} \right). \qquad (1)$$

Proof. Let $X$ be a set with $n \geq 1$ elements. Consider $\mathcal{S} \in P(X)$ and put $k = |\mathcal{P}(\mathcal{S})|$. By the definition of a polestar system, $\mathcal{S}$ arises from $\mathcal{P}(\mathcal{S})$ by forming, for some $\ell \in \{0, 1, \ldots, \lfloor k/3 \rfloor\}$, a partition $\Pi(\mathcal{P}(\mathcal{S}))$ of $\mathcal{P}(\mathcal{S})$ into $\ell$ subsets with three elements and $k - 3\ell$ subsets with one element. Each 1-element set $\{S\} \in \Pi(\mathcal{P}(\mathcal{S}))$ yields the set pair $(S, \emptyset)$. For each 3-element set $\{S_1, S_2, S_3\} \in \Pi(\mathcal{P}(\mathcal{S}))$ we select $i \in \{1, 2, 3\}$ and obtain the three set pairs $(S_i, \emptyset)$, $(S_j, S_i)$, $j \in \{1, 2, 3\} - \{i\}$.
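Formula (1) can be checked numerically. The Python sketch below evaluates it using the standard recurrence for Stirling numbers of the second kind and, as an independent cross-check for small $n$, counts polestar systems by brute force, testing the conditions (PL1)–(PL3) on every assignment of a (possibly empty) second component to the blocks of every partition. The function names and the encoding of a polestar system as a tuple `h` of block indices are assumptions made only for this illustration.

```python
from itertools import product
from math import factorial


def stirling2(n, k):
    """Stirling number of the second kind, via S(i,j) = j*S(i-1,j) + S(i-1,j-1)."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def beta(k, l):
    """Number of partitions of a k-set into l triples and k - 3l singletons."""
    return factorial(k) // (6 ** l * factorial(l) * factorial(k - 3 * l))


def polestar_count(n):
    """Evaluate Formula (1) for a set with n elements."""
    return sum(
        stirling2(n, k) * sum(beta(k, l) * 3 ** l for l in range(k // 3 + 1))
        for k in range(1, n + 1)
    )


def set_partitions(items):
    """Yield all partitions of the list items as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def brute_force_count(n):
    """Count polestar systems on an n-set by testing (PL3) on all assignments.

    h[i] is the index of the block used as H for block i, or None if H is empty;
    (PL1) and (PL2) hold automatically because the first components are the blocks.
    """
    count = 0
    for part in set_partitions(list(range(n))):
        k = len(part)
        for h in product([None] + list(range(k)), repeat=k):
            valid = all(
                h[i] is None
                or (
                    h[i] != i
                    and h[h[i]] is None  # (H, empty) must belong to the system
                    and sum(1 for j in range(k) if h[j] == h[i]) == 2
                )
                for i in range(k)
            )
            count += valid
    return count


print([polestar_count(n) for n in range(1, 9)])
print([brute_force_count(n) for n in range(1, 6)])
```

The brute-force count is exponential in $n$ and is only intended as a sanity check for very small sets; the closed formula is what should be used beyond that.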
Formula (1) directly reflects the process described above for obtaining a polestar system from a fixed partition of $X$ into $k$ subsets. In view of the fact that every partition of $X$ yields a different collection of polestar systems on $X$, we form the outer sum over the values of $k$. The inner sum then accounts for the number of polestar systems that arise from any fixed partition of $X$ into $k$ subsets.

7. Encoding Ranked X-Cactuses

In this section, we show in Theorem 2 that we can encode (isomorphism classes of) ranked $X$-cactuses in terms of the chains in the poset $(P(X), \preceq)$. We begin by giving a precise statement of this result. We call two equidistant $X$-cactuses $(\mathcal{N}_1 = ((V_1, A_1), \varphi_1), t_1)$ and $(\mathcal{N}_2 = ((V_2, A_2), \varphi_2), t_2)$ isomorphic if there exists a DAG-isomorphism $f : V_1 \to V_2$ such that

(IC1) $f(\varphi_1(x)) = \varphi_2(x)$ for all $x \in X$ and

(IC2) $t_1(v) = t_2(f(v))$ for all $v \in V_1$.

Note that this definition includes isomorphisms between ranked $X$-cactuses as a special case. For rooted $X$-cactuses without a time-stamp function to be isomorphic, condition (IC2) is not required. We now state the aforementioned result.

Theorem 2. There is a one-to-one correspondence between chains in the poset $(P(X), \preceq)$ that contain the maximum element $\{(X, \emptyset)\}$ and (isomorphism classes of) ranked $X$-cactuses. The length of the chain equals the size of the ranking of the corresponding ranked $X$-cactus. Maximal chains correspond to binary ranked $X$-cactuses with rankings of size $|X| - 1$.

To prove this theorem, note that by Lemmas 3 and 4, every ranked $X$-cactus corresponds to a chain $C$ in $(P(X), \preceq)$ with $\{(X, \emptyset)\} \in C$. Moreover, by Lemma 2, we have $|C| \leq |X| - 1$ for such a chain with equality holding if and only if the ranked $X$-cactus is binary. Thus, to prove Theorem 2, it suffices to show that for all chains $C$ in $(P(X), \preceq)$ with $\{(X, \emptyset)\} \in C$ there exists, up to isomorphism, a unique ranked $X$-cactus $(\mathcal{N}, r)$ with $C = \{\mathcal{S}_i(\mathcal{N}) : 0 \leq i \leq \sigma(r)\}$.
This follows immediately from Lemmas 7 and 8 below, and will be done in two steps. First, for any chain $C \subseteq P(X)$ with $\{(X, \emptyset)\} \in C$, we form the set pair system $\mathcal{S}(C) = \bigcup_{\mathcal{S} \in C} \mathcal{S}$ consisting of all set pairs that occur in the polestar systems in $C$ and construct a suitable rooted, compressed, phylogenetic $X$-cactus $\mathcal{N}(C)$ (see Lemma 7). Second, we perform some technical modifications on $\mathcal{N}(C)$, if necessary, to obtain $\mathcal{N}$ and then construct a suitable ranking $r$ (see Lemma 8).

Lemma 7. For all chains $C$ in $(P(X), \preceq)$ with $\{(X, \emptyset)\} \in C$ there exists, up to isomorphism, a unique rooted, compressed, phylogenetic $X$-cactus $\mathcal{N}(C)$ with $\mathcal{S}(\mathcal{N}(C)) = \mathcal{S}(C) \cup \{(\{x\}, \emptyset) : x \in X\}$.

Proof. Put $\mathcal{S} = \mathcal{S}(C) \cup \{(\{x\}, \emptyset) : x \in X\}$. We show below that $\mathcal{S}$ satisfies certain properties (NC1)–(NC5). We do this to then apply [22, Theorem 5], which states that if a set pair system $\mathcal{S}'$ on $X$ has these properties there exists, up to isomorphism, a unique rooted, compressed, phylogenetic $X$-cactus $\mathcal{N}(\mathcal{S}')$ with $\mathcal{S}' = \mathcal{S}(\mathcal{N}(\mathcal{S}'))$, as required. In the following, we first state each of the properties (NC1)–(NC5) and then verify that $\mathcal{S}$ has this property.

(NC1) $(X, \emptyset) \in \mathcal{S}$: This is clearly the case.

(NC2) $(\{x\}, \emptyset) \in \mathcal{S}$, for all $x \in X$: By construction of $\mathcal{S}$, this is the case.

(NC3) For every $(S, H) \in \mathcal{S}$ with $H \neq \emptyset$, we have $(H, \emptyset) \in \mathcal{S}$: Consider any $(S, H) \in \mathcal{S}$ with $H \neq \emptyset$. Then, by construction, there must exist some $\mathcal{S}' \in C$ with $(S, H) \in \mathcal{S}'$. In view of (PL3) we must have $(H, \emptyset) \in \mathcal{S}'$. Thus, by the definition of $\mathcal{S}$, it follows that $(H, \emptyset) \in \mathcal{S}$, as required.

(NC4) For any two distinct $(S_1, H_1), (S_2, H_2) \in \mathcal{S}$ one of (i) $(S_1, H_1) < (S_2, H_2)$, (ii) $(S_2, H_2) < (S_1, H_1)$, (iii) $(S_1 \cup H_1) \cap (S_2 \cup H_2) = \emptyset$, or (iv) $S_1 \cap S_2 = \emptyset$ and $H_1 = H_2 \neq \emptyset$ holds: Consider $(S_1, H_1), (S_2, H_2) \in \mathcal{S}$ with $(S_1, H_1) \neq (S_2, H_2)$. By construction, there must exist $\mathcal{S}_1, \mathcal{S}_2 \in C$ with $(S_1, H_1) \in \mathcal{S}_1$ and $(S_2, H_2) \in \mathcal{S}_2$. Without loss of generality, we may assume that $\mathcal{S}_1 \preceq \mathcal{S}_2$.

First we consider the case $\mathcal{S}_1 = \mathcal{S}_2$.
Then, in view of (PL1) and (PL2), we have $S_1 \cap S_2 = \emptyset$. Thus, if $H_1 = H_2 \neq \emptyset$, we are done. Otherwise, in view of (PL3) and (PL1), we must have $H_1 \cap H_2 = \emptyset$ and, thus, $(S_1 \cup H_1) \cap (S_2 \cup H_2) = \emptyset$, as required.

Next consider the case $\mathcal{S}_1 \prec \mathcal{S}_2$. Then there must exist some $(S, H) \in \mathcal{S}_2$ with $(S_1, H_1) \leq (S, H)$. If $(S, H) = (S_2, H_2)$ we immediately have $(S_1, H_1) \leq (S_2, H_2)$ and are done. Therefore, assume $(S, H) \neq (S_2, H_2)$. In view of (PL2), this implies $S \cap S_2 = \emptyset$. Thus, by the definition of $\leq$ one of the following must hold:

• $S_1 \cup H_1 \subseteq S$: Then, by the definition of set pairs, $(S_1 \cup H_1) \cap H = \emptyset$ and, in view of $S \cap S_2 = \emptyset$, also $(S_1 \cup H_1) \cap S_2 = \emptyset$. Thus, if $S \cap H_2 = \emptyset$ we have $(S_1 \cup H_1) \cap (S_2 \cup H_2) = \emptyset$. Therefore, assume that $S = H_2$. Then, we have $S_1 \cup H_1 \subseteq H_2$ implying $(S_1, H_1) < (S_2, H_2)$.
• $S_1 \cup H_1 \subseteq H$: Then, if $H = H_2$ or $H = S_2$, we immediately have $(S_1, H_1) < (S_2, H_2)$. Otherwise, we must have $H \cap H_2 = \emptyset$ and $H \cap S_2 = \emptyset$ and, thus, $(S_1 \cup H_1) \cap (S_2 \cup H_2) = \emptyset$.
• $S_1 \subsetneq S$ and $H_1 = H \neq \emptyset$: First note that this implies $S \cap H_2 = \emptyset$ because otherwise we would have $(S, \emptyset) \in \mathcal{S}_2$ in view of (PL3), which is impossible in view of $(S, H) \in \mathcal{S}_2$ and (PL2). Also note that if $H = S_2$ we must have $(S_2, H_2) = (H, \emptyset)$ in view of (PL2), implying that $(S_1, H_1) < (S_2, H_2)$. Finally, if $H \cap S_2 = \emptyset$, we obtain $(S_1 \cup H_1) \cap (S_2 \cup H_2) = \emptyset$.

This establishes that $\mathcal{S}$ satisfies (NC4).

(NC5) There are no three distinct $(S_1, H_1), (S_2, H_2), (S_3, H_3) \in \mathcal{S}$ with $H_1 = H_2 = H_3 \neq \emptyset$, $S_1 \cap S_2 = \emptyset$ and either $S_1 \cup S_2 \subseteq S_3$ or $(S_1 \cup S_2) \cap S_3 = \emptyset$: Consider $\mathcal{S}_1, \mathcal{S}_2, \mathcal{S}_3 \in C$ with $\mathcal{S}_1 \preceq \mathcal{S}_2 \preceq \mathcal{S}_3$. Assume that there exist set pairs $(S_1, H), (S_1', H) \in \mathcal{S}_1$, $(S_2, H), (S_2', H) \in \mathcal{S}_2$ and $(S_3, H), (S_3', H) \in \mathcal{S}_3$ with $H \neq \emptyset$. By (PL3), there are precisely these two set pairs contained in each of $\mathcal{S}_1$, $\mathcal{S}_2$ and $\mathcal{S}_3$ for the fixed set $H$.
In view of (SP2), we may assume without loss of generality that $(S_1, H) \leq (S_2, H) \leq (S_3, H)$ and $(S_1', H) \leq (S_2', H) \leq (S_3', H)$, implying that we have $S_1 \subseteq S_2 \subseteq S_3$ and $S_1' \subseteq S_2' \subseteq S_3'$. But then, it is impossible to select three distinct set pairs $(S_1'', H), (S_2'', H), (S_3'', H)$ from among

$$(S_1, H),\ (S_1', H),\ (S_2, H),\ (S_2', H),\ (S_3, H),\ (S_3', H)$$

with $S_1'' \cap S_2'' = \emptyset$ and either $S_1'' \cup S_2'' \subseteq S_3''$ or $(S_1'' \cup S_2'') \cap S_3'' = \emptyset$. This establishes that $\mathcal{S}$ satisfies (NC5).

Recall from Sect. 2 that every rooted $X$-cactus $\mathcal{N}$ has an associated rooted, compressed, phylogenetic $X$-cactus.

Lemma 8. For all chains $C$ in $(P(X), \preceq)$ with $\{(X, \emptyset)\} \in C$, there exists, up to isomorphism, a unique ranked $X$-cactus $(\mathcal{N}, r)$ such that the rooted, compressed, phylogenetic $X$-cactus associated with $\mathcal{N}$ is $\mathcal{N}(C)$ and $C = \{\mathcal{S}_i(\mathcal{N}) : 0 \leq i \leq \sigma(r)\}$.

Proof. Let $\mathcal{N}(C) = ((V^*, A^*), \varphi^*)$ be the rooted, compressed, phylogenetic $X$-cactus that exists by Lemma 7. To obtain a rooted $X$-cactus $\mathcal{N} = ((V, A), \varphi)$ with $\mathcal{S}(\mathcal{N}) = \mathcal{S}(C)$, we take $\mathcal{N}(C)$ and modify it. The first modification applies to all $x \in X$ with $(\{x\}, \emptyset) \notin \mathcal{S}(C)$ and corresponds to reversing the addition of leaves that was illustrated in Fig. 2b. For each such $x$, we contract the arc $(u, v) \in A^*$ with $v = \varphi^*(x)$ and put $\varphi(x) = u$. The second modification applies to all set pairs $(S, \emptyset) \in \mathcal{S}(C)$ such that $(S, \emptyset) \in \mathcal{S}_1 \cap \mathcal{S}_2$ for $\mathcal{S}_1, \mathcal{S}_2 \in C$ with $\mathcal{S}_1 \prec \mathcal{S}_2$, $S \notin \mathcal{H}(\mathcal{S}_1)$ and $S \in \mathcal{H}(\mathcal{S}_2)$. This implies, in view of the definition of the polestar systems $\mathcal{S}_i(\mathcal{N})$, $0 \leq i \leq |X| - 1$, that we need to modify $\mathcal{N}(C)$ to ensure that $\mathcal{N}$ contains two distinct vertices $u$ and $v$ with $(S(u), H(u)) = (S(v), H(v)) = (S, \emptyset)$. This corresponds to reversing the compression that was illustrated in Fig. 2c. Thus, in view of (SH2), for each such set pair $(S, \emptyset)$, we locate the vertex $u \in V^*$ with $(S(u), H(u)) = (S, \emptyset)$ and then expand the vertex $u$ into an arc $(u, v)$ such that the outgoing arcs of $u$ become the outgoing arcs of $v$ and, for all $x \in X$ with $u = \varphi^*(x)$, we put $\varphi(x) = v$.
Note that the resulting rooted X-cactus N need no longer be phylogenetic or compressed and that for all ranked X-cactuses (N ,r ) with C = {S (N ): 0 ≤ i ≤ σ(r) } we necessarily have that N is isomorphic to N in view of the fact that N and N must be isomorphic by Lemma 7. Thus, it remains to show that there exists a unique ranking r of the vertices of N to obtain a ranked X-cactus (N ,r) with C = {S (N):0 ≤ i ≤ σ(r)}.Let c denote the length of C and consider the sequence S ≺S ≺ ··· ≺ 0 1 S of the polestar systems in C. The value r(u) for a vertex u ∈ V that is not a reticulation vertex is defined by considering the set pair (S(u),H(u)) and putting r(u) to be the smallest index 0 ≤ i ≤ c with (S(u),H(u)) ∈S . Note that this is the only available choice for the rank of u. The value r(u)ofa reticulation vertex u is defined to be equal to the rank of the parents of u, which, since N is a rooted X-cactus, cannot be reticulation vertices and have been assigned a rank already. Next, we show that the map r : V →{0, 1,... ,c} defined above is a rank- ing of the vertices of N . First note that the value r(u) of a reticulation vertex u is well-defined. Indeed, in view of (SH3), we must have r(p )= r(p ) for the two 1 2 parents p and p of u, that is, the set pairs (S(p ),H(p )) and (S(p ),H(p )) 1 2 1 1 2 2 with H(p )= H(p )= H are both contained in the polestar system S with 1 2 i the smallest index i such that H ∈H(S ). This establishes (TS3). To establish (TS1), consider any x ∈ X. By (PL1) there exists a unique set pair (S, H) ∈S with x ∈ S. Then, by Lemma 5, it suffices to consider the following two cases: • There is precisely one (S ,H ) ∈S(C) with (S ,H ) < (S, H). Then, we must have (S ,H ) ∈S , H = ∅ and H = S . This implies that there exists a reticulation vertex u in N that is a leaf with (S(u),H(u)) = (S ,H ) and that u is the single child of a vertex p with (S(p),H(p)) = (S, H). Since x ∈ S and S ∩ S = ∅,wehave ϕ(x)= p. 
By construction, we have $r(p) = 0$, as required.
• There is no $(S', H') \in \mathcal{S}(C)$ with $(S', H') < (S, H)$. Then, there exists a leaf $u$ of $\mathcal{N}$ with $(S(u),H(u)) = (S, H)$ and we must have $\varphi(x) = u$. Again, by construction, we have $r(u) = 0$, as required.

Now, we turn to (TS2). Consider an arc $(u, v)$ of $\mathcal{N}$ such that $v$ is not a reticulation vertex. As mentioned in Sect. 5.1, since $v$ is a descendant of $u$, we have $(S(v),H(v)) \leq (S(u),H(u))$. If $(S(v),H(v)) = (S(u),H(u))$ then, by the construction of $\mathcal{N}$ from $\underline{\mathcal{N}}(C)$, $u$ is a reticulation vertex whose single child is $v$ and there exist $0 \leq i < j \leq c$ with $r(v) = i$ and $r(u) = r(p_1) = r(p_2) = j$, where $p_1$ and $p_2$ are the two parents of $u$. Similarly, in view of Lemma 5, if $(S(v),H(v)) < (S(u),H(u))$ there also exist $0 \leq i < j \leq c$ with $r(v) = i$ and $r(u) = j$. This establishes (TS2).

The last property required for the map $r$ to be a ranking is that, for all $j \in \{0, 1, \dots, c\}$, there exists a vertex $u$ of $\mathcal{N}$ with $r(u) = j$. (TS1) implies that this is the case for $j = 0$. Therefore, consider $j \geq 1$. Then, in view of Lemma 6, there exists some $(S, H) \in \mathcal{S}_j$ with $(S, H) \notin \mathcal{S}_i$ for all $i < j$. Let $u$ be a vertex of $\mathcal{N}$ with $(S(u),H(u)) = (S, H)$. If $u$ is not a reticulation vertex, we have $r(u) = j$. If $u$ is a reticulation vertex, we have $r(u) = r(p_1) = r(p_2) = j$ for the two parents $p_1$ and $p_2$ of $u$ since, by (PL3), $(S(p_1),H(p_1))$ and $(S(p_2),H(p_2))$ are also both contained in $\mathcal{S}_j$ but not in $\mathcal{S}_i$ for all $i < j$.

To finish the proof of the lemma, we show that $\mathcal{S}_j = \mathcal{S}_j(\mathcal{N})$ for all $j \in \{0, 1, \dots, c\}$. We clearly have $\mathcal{S}_c = \{(X, \emptyset)\} = \mathcal{S}_c(\mathcal{N})$. Consider $j < c$. In view of Lemma 4, (PL1) and (PL2) it suffices to show that, for all $u \in V_j$, we have $(S(u),H(u)) \in \mathcal{S}_j$. Let $p$ be the unique parent of $u$. By the definition of $V_j$ given in Sect. 5.1, we have $r(u) = i \leq j$ and $r(p) = k > j$. In particular, we have $(S(u),H(u)) \in \mathcal{S}_i$ and $(S(p),H(p)) \in \mathcal{S}_k$. In view of $\mathcal{S}_i \preceq \mathcal{S}_j \prec \mathcal{S}_k$ there must exist $(S', H') \in \mathcal{S}_j$ with $(S(u),H(u)) \leq (S', H')$ and also some $(S'', H'') \in \mathcal{S}_k$ with $(S', H') \leq (S'', H'')$.
Since $p$ is the parent of $u$ we have $(S(u),H(u)) \leq (S(p),H(p))$ and, since all set pairs in $\mathcal{S}(C)$ correspond to at least one vertex of $\mathcal{N}$, we must necessarily have $(S'', H'') = (S(p),H(p))$. It follows that either $(S(u),H(u)) = (S', H') = (S(p),H(p))$ or $(S(u),H(u)) = (S', H') < (S(p),H(p))$ holds, implying $(S(u),H(u)) \in \mathcal{S}_j$, as required.

As an immediate consequence of Theorem 2 we obtain Theorem 1, which we restate in the following corollary using poset terminology.

Corollary 2. There is a one-to-one correspondence between chains in the graded poset $(B(X), \leq)$ that contain $\{X\}$ and isomorphism classes of ranked X-trees.

Proof. In view of the fact that a rooted X-cactus $\mathcal{N}$ is a rooted X-tree if and only if the associated set pair system $\mathcal{S}(\mathcal{N})$ does not contain a set pair $(S, H)$ with $H \neq \emptyset$, it follows by Theorem 2 that ranked X-trees correspond to chains $C$ in the poset $(P(X), \preceq)$ with $\{(X, \emptyset)\} \in C$ and $H(\mathcal{S}) = \emptyset$ for all $\mathcal{S} \in C$. This implies, by Corollary 1, that ranked X-trees correspond to chains in the poset $(B(X), \leq)$ that contain the partition $\{X\}$.

8. The Space of Equidistant X-Cactuses

We now define equidistant-cactus space, N(X), and show that it is a CAT(0)-metric space. The construction of N(X) follows the outline presented at the start of Sect. 5. More specifically, we put $P^{\circ}(X) = P(X) - \{\{(X, \emptyset)\}\}$ and let $F(\preceq)$ denote the set of chains in the subposet $(P^{\circ}(X), \preceq)$ of the poset $(P(X), \preceq)$. We then define N(X) to be the orthant space of the order complex of $(P^{\circ}(X), \preceq)$. Figure 7 gives an example of the structure of N(X) for $X = \{a, b, c\}$.

Theorem 3. The orthant space $N(X) = (M_{(P^{\circ}(X),F(\preceq))}, D_{(P^{\circ}(X),F(\preceq))})$ is a CAT(0)-metric space whose points are in one-to-one correspondence with isomorphism classes of equidistant X-cactuses.

Proof.
As an immediate consequence of the definition of a chain as a set of pairwise comparable elements in a poset, we have that $(P^{\circ}(X), F(\preceq))$ is a flag complex (cf. Section 4.1). Hence, $(M_{(P^{\circ}(X),F(\preceq))}, D_{(P^{\circ}(X),F(\preceq))})$ is a CAT(0)-metric space.

Figure 7. The structure of N(X) for $X = \{a, b, c\}$. The six two-dimensional orthants are drawn shaded. Each of these two-dimensional orthants corresponds to an isomorphism class of binary ranked X-cactuses. Each axis corresponds to the indicated polestar system on X. Axis labels: {({a}, ∅), ({b}, ∅), ({c}, ∅)}; {({a, b}, ∅), ({c}, ∅)}; {({a}, {c}), ({b}, {c}), ({c}, ∅)}; {({b, c}, ∅), ({a}, ∅)}; {({a}, {b}), ({c}, {b}), ({b}, ∅)}; {({a, c}, ∅), ({b}, ∅)}; {({b}, {a}), ({c}, {a}), ({a}, ∅)}.

It remains to show that the points of $M_{(P^{\circ}(X),F(\preceq))}$ are in one-to-one correspondence with isomorphism classes of equidistant X-cactuses. Every $\omega \in M_{(P^{\circ}(X),F(\preceq))}$ corresponds, up to isomorphism, to a unique equidistant X-cactus $(\mathcal{N}, t)$ as follows. Put $\sigma = |\mathrm{supp}(\omega)|$ and $C = \mathrm{supp}(\omega) \cup \{\{(X, \emptyset)\}\}$. Note that $C$ is a chain in the poset $(P(X), \preceq)$. Consider the sequence

$$\mathcal{S}_0 \prec \mathcal{S}_1 \prec \mathcal{S}_2 \prec \cdots \prec \mathcal{S}_\sigma = \{(X, \emptyset)\}$$

of the set pair systems in $C$. By Theorem 2, there exists, up to isomorphism, a unique ranked X-cactus $(\mathcal{N} = ((V, A), \varphi), r)$ with $\sigma(r) = \sigma$ and $\mathcal{S}_i = \mathcal{S}_i(\mathcal{N})$ for all $i \in \{0, 1, 2, \dots, \sigma\}$. The time-stamp function $t$ on the vertices of $\mathcal{N}$ is then defined by putting

$$t(v) = \begin{cases} 0 & \text{if } r(v) = 0, \\ \sum_{i=0}^{r(v)-1} \omega(\mathcal{S}_i) & \text{if } r(v) > 0 \end{cases}$$

for all $v \in V$. Note that every $\omega' \in M_{(P^{\circ}(X),F(\preceq))}$ with $\omega' \neq \omega$ and $\mathrm{supp}(\omega') = \mathrm{supp}(\omega)$ yields the same ranked X-cactus $(\mathcal{N}, r)$ but a time-stamp function $t' \neq t$ on the vertices of $\mathcal{N}$. Also note that every equidistant X-cactus $(\mathcal{N}, t)$ arises from some $\omega \in M_{(P^{\circ}(X),F(\preceq))}$ as described above.

To illustrate the proof of Theorem 3, consider the equidistant X-cactus $(\mathcal{N}, t)$ on $X = \{a, b, c, d, e\}$ in Fig. 3, which arises from the point $\omega \in$
$M_{(P^{\circ}(X),F(\preceq))}$ with $\mathrm{supp}(\omega) = \{\mathcal{S}_0, \mathcal{S}_1, \mathcal{S}_2, \mathcal{S}_3\}$, where

$\mathcal{S}_3 = \{(\{a\}, \{b\}), (\{b\}, \emptyset), (\{c, d, e\}, \{b\})\}$
$\mathcal{S}_2 = \{(\{a\}, \{b\}), (\{b\}, \emptyset), (\{c\}, \{b\}), (\{d, e\}, \emptyset)\}$
$\mathcal{S}_1 = \{(\{a\}, \{b\}), (\{b\}, \emptyset), (\{c\}, \{b\}), (\{d\}, \emptyset), (\{e\}, \emptyset)\}$
$\mathcal{S}_0 = \{(\{a\}, \emptyset), (\{b\}, \emptyset), (\{c\}, \emptyset), (\{d\}, \emptyset), (\{e\}, \emptyset)\}$,

and $\omega(\mathcal{S}_0) = 0.8$, $\omega(\mathcal{S}_1) = 0.4$, $\omega(\mathcal{S}_2) = 1.2$, $\omega(\mathcal{S}_3) = 0.6$.

In general, as equidistant-cactus space is high-dimensional, for $|X| \geq 4$ its structure is not easy to visualize. However, to get some insights it can be useful to consider the so-called link of the origin

$$L_{(P^{\circ}(X),F(\preceq))} = \Big\{ \omega \in M_{(P^{\circ}(X),F(\preceq))} : \sum_{\mathcal{S} \in P^{\circ}(X)} \omega(\mathcal{S}) = 1 \Big\},$$

a geometric realization of the abstract simplicial complex $(P^{\circ}(X), F(\preceq))$. Since $(P^{\circ}(X), F(\preceq))$ is a flag complex, the structure of $L_{(P^{\circ}(X),F(\preceq))}$ is completely determined by the graph with vertex set $P^{\circ}(X)$ in which two distinct vertices are connected by an edge if and only if they are comparable by $\preceq$. In Fig. 8, we present the link of the origin of N(X) for $|X| = 4$. Note that, for this case, we have $|P^{\circ}(X)| = 44$ and that there are 14 vertices that correspond to rooted X-trees. The shaded vertices in Fig. 8 together with the oval vertex induce a subgraph that is isomorphic to the graph corresponding to the link of the origin of τ-space (i.e. $M_{(B^{\circ}(X),F(\leq))}$), which is isomorphic to a subdivision of the Petersen graph (see also [18, Fig. 3]).

We conclude this section with a corollary of Theorem 3 that describes a relationship between τ-space and equidistant-cactus space.

Corollary 3. The orthants of $M_{(B^{\circ}(X),F(\leq))}$ are in one-to-one correspondence with the orthants $O(A)$ of $M_{(P^{\circ}(X),F(\preceq))}$ for those $A \in F(\preceq)$ with $H(\mathcal{S}) = \emptyset$ for all $\mathcal{S} \in A$.

Proof. By definition, the orthants of $M_{(B^{\circ}(X),F(\leq))}$ are in one-to-one correspondence with chains in $(B^{\circ}(X), \leq)$. By Corollary 1 and the definition of $F(\preceq)$, such chains are in one-to-one correspondence with chains $C$ in $(P^{\circ}(X), \preceq)$ for which $H(\mathcal{S}) = \emptyset$ for all $\mathcal{S} \in C$.
Again by definition, the latter chains are in one-to-one correspondence with the orthants $O(A)$ of $M_{(P^{\circ}(X),F(\preceq))}$ for those $A \in F(\preceq)$ with $H(\mathcal{S}) = \emptyset$ for all $\mathcal{S} \in A$.

We remark that the characterization of geodesic paths in CAT(0)-orthant spaces in [25, Corollary 6.19] holds for equidistant-cactus space N(X). This implies that, for any two points in N(X) that correspond to equidistant X-trees, all points on the unique geodesic path between these two points also correspond to equidistant X-trees. In other words, $(M_{(B^{\circ}(X),F(\leq))}, D_{(P^{\circ}(X),F(\preceq))})$ is a convex subspace of $N(X) = (M_{(P^{\circ}(X),F(\preceq))}, D_{(P^{\circ}(X),F(\preceq))})$.

Figure 8. The graph that determines the structure of the link of the origin $L_{(P^{\circ}(X),F(\preceq))}$ for $X = \{a, b, c, d\}$. The oval vertex is adjacent to all other vertices. The ranked X-cactus displayed for each vertex corresponds to the chain $\{\mathcal{S}, \{(X, \emptyset)\}\}$ for each $\mathcal{S} \in P^{\circ}(X)$.

9. Conclusion

We have introduced the space N(X) of equidistant X-cactuses. By deriving an encoding for ranked X-cactuses, we obtained N(X) as an orthant space and proved that it is a CAT(0)-metric space. Thus, we can compute the distance in N(X) between any two equidistant X-cactuses and the unique geodesic path between them in polynomial time [25], compute approximations of the Fréchet mean and variance as well as of the median of a set of equidistant X-cactuses [4, 25], and a central limit theorem holds [5]. There are several directions for future research and open questions including:
• It would be interesting to count the number ν of isomorphism classes of binary ranked X-cactuses with rankings of size $|X| - 1$.
In view of Theorem 2, this is equivalent to counting the number of maximal chains in the graded poset $(P(X), \preceq)$. Counting chains in certain types of posets is a well-studied problem (see e.g. [33]). The values of ν for n = 1, 2, 3, 4 are 1, 1, 6, 72.
• It is known that the link of the origin of phylogenetic tree space as defined in [8] has the homotopy type of the wedge of spheres. It would be interesting to work out the homotopy type of the link of the origin of N(X), and also what other properties it might enjoy (for example, is it Cohen-Macaulay as with the tree-space defined in [8]?).
• As was pointed out in [15], there is a connection between the space of circular split collections defined in [16] and a certain type of unrooted phylogenetic networks called level-1 networks. Since these unrooted level-1 networks can be regarded as unrooted X-cactuses, it would be interesting to investigate if there are some connections between N(X) and the space of circular split collections.
• It would be interesting to define and understand the geometry of spaces of more complicated phylogenetic networks with arc lengths. Two obvious candidates for such an investigation are rooted level-2 networks and tree-child, time consistent networks (see [34, Chapter 10] for definitions). Moreover, one could try to relax the requirement that the phylogenetic networks are equidistant.
• How does the distance between equidistant X-cactuses in N(X) compare to other distance measures between phylogenetic networks? For example, it was shown in [1] that the weighted Robinson–Foulds distance between phylogenetic trees [28] is a 2-approximation of the distance between phylogenetic trees in the tree space defined in [8].

Acknowledgements

KTH and VM thank the Department of Mathematics at City College of New York (City University of New York) and the American Museum of Natural History for their hospitality. MO is partially supported by the US National Science Foundation (DMS 1847271).
This work was supported by a grant from the Simons Foundation (#355824, Megan Owen). KAS thanks the Simons Foundation (#316124) and the US National Science Foundation (#1461094) for research and travel support. We thank the anonymous referee and handling editor for their helpful comments.

Data Availability This manuscript has no associated data.

Declarations

Conflict of Interest On behalf of all authors, the corresponding author states that there is no conflict of interest.

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

[1] N. Amenta, M. Godwin, N. Postarnakevich, and K. St. John. Approximating geodesic tree distance. Information Processing Letters, 103(2):61–65, 2007.
[2] F. Ardila and C. Klivans. The Bergman complex of a matroid and phylogenetic trees. Journal of Combinatorial Theory, Series B, 96(1):38–49, 2006.
[3] F. Ardila-Mantilla. CAT(0) geometry, robots, and society. Notices of the AMS, 67:977–987, 2020.
[4] M. Bačák. Computing medians and means in Hadamard spaces.
SIAM Journal on Optimization, 24(3):1542–1566, 2014.
[5] D. Barden and H. Le. The logarithm map, its limits and Fréchet means in orthant spaces. Proceedings of the London Mathematical Society, 117(4):751–789, 2018.
[6] M. Baroni, C. Semple, and M. Steel. Hybrids in real time. Systematic Biology, 55(1):46–56, 2006.
[7] F. Bienvenu, A. Lambert, and M. Steel. Combinatorial and stochastic properties of ranked tree-child networks. Random Structures & Algorithms, 60(4):653–689,
[8] L. Billera, S. Holmes, and K. Vogtmann. Geometry of the space of phylogenetic trees. Advances in Applied Mathematics, 27(4):733–767, 2001.
[9] M. Bordewich, S. Linz, and C. Semple. Lost in space? Generalising subtree prune and regraft to spaces of phylogenetic networks. Journal of Theoretical Biology, 423:1–12, 2017.
[10] M. Bordewich and N. Tokac. An algorithm for reconstructing ultrametric tree-child networks from inter-taxa distances. Discrete Applied Mathematics, 213:47–59, 2016.
[11] M. Bridson and A. Haefliger. Metric spaces of non-positive curvature. Springer,
[12] A. Caraceni, M. Fuchs, and G.-R. Yu. Bijections for ranked tree-child networks. Discrete Mathematics, 345(9), 2022.
[13] H.-L. Chan, J. Jansson, T.-W. Lam, and S.-M. Yiu. Reconstructing an ultrametric galled phylogenetic network from a distance matrix. Journal of Bioinformatics and Computational Biology, 4(4):807–832, 2006.
[14] V. Chepoi. Graphs of some CAT(0) complexes. Advances in Applied Mathematics, 24(2):125–179, 2000.
[15] S. Devadoss, C. Durell, and S. Forcey. Split network polytopes and network spaces. In Proc. of the 31st Conference on Formal Power Series and Algebraic Combinatorics, 2019.
[16] S. Devadoss and S. Petti. A space of phylogenetic networks. SIAM Journal on Applied Algebra and Geometry, 1(1):683–705, 2017.
[17] P. Gambette, L. van Iersel, M. Jones, M. Lafond, F. Pardi, and C. Scornavacca. Rearrangement moves on rooted phylogenetic networks.
PLoS Computational Biology, 13(8), 2017.
[18] A. Gavryushkin and A. Drummond. The space of ultrametric phylogenetic trees. Journal of Theoretical Biology, 403:197–208, 2016.
[19] M. Gromov. Hyperbolic groups. In Essays in group theory, pages 75–263. Springer, 1987.
[20] M. Hayamizu, K. T. Huber, V. Moulton, and Y. Murakami. Recognizing and realizing cactus metrics. Information Processing Letters, 157, 2020.
[21] M. Hellmuth, D. Schaller, and P. Stadler. Compatibility of partitions with trees, hierarchies, and split systems. Discrete Applied Mathematics, 314:265–283, 2022.
[22] K. T. Huber, V. Moulton, and A. Spillner. Phylogenetic consensus networks: computing a consensus of 1-nested phylogenetic networks. arXiv preprint 2107.09696, 2021.
[23] K. T. Huber, V. Moulton, and T. Wu. Transforming phylogenetic networks: Moving beyond tree space. Journal of Theoretical Biology, 404:30–39, 2016.
[24] R. Janssen, M. Jones, P. Erdős, L. van Iersel, and C. Scornavacca. Exploring the tiers of rooted phylogenetic network space using tail moves. Bulletin of Mathematical Biology, 80(8):2177–2208, 2018.
[25] E. Miller, M. Owen, and J. S. Provan. Polyhedral computational geometry for averaging metric phylogenetic trees. Advances in Applied Mathematics, 68:51–91, 2015.
[26] T. Nye. An algorithm for constructing principal geodesics in phylogenetic treespace. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(2):304–315, 2014.
[27] T. Nye, X. Tang, G. Weyenberg, and R. Yoshida. Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees. Biometrika, 104(4):901–922, 2017.
[28] D. Robinson and L. Foulds. Comparison of weighted labelled trees. In Combinatorial mathematics VI, pages 119–126. Springer, 1979.
[29] F. Rosselló and G. Valiente. All that glisters is not galled. Mathematical Biosciences, 221(1):54–59, 2009.
[30] C. Semple and M. Steel. Phylogenetics.
Oxford University Press, 2003.
[31] N. Sloane. The on-line encyclopedia of integer sequences. https://oeis.org, 2021. Sequence A190865, accessed July 2021.
[32] K. St. John. The shape of phylogenetic treespace. Systematic Biology, 66(1):e83–e94, 2017.
[33] R. Stanley. A survey of Eulerian posets. In Polytopes: Abstract, convex and computational, pages 301–333. Springer, 1994.
[34] M. Steel. Phylogeny: Discrete and random processes in evolution. SIAM, 2016.
[35] W. Trotter. Partially ordered sets. In R. Graham, editor, Handbook of Combinatorics, volume 1, pages 433–480. Elsevier, 1995.
[36] N. White, editor. Matroid applications. Cambridge University Press, 1992.
[37] A. Willis. Confidence sets for phylogenetic trees. Journal of the American Statistical Association, 114(525):235–244, 2019.

Katharina T. Huber and Vincent Moulton
School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
e-mail: v.moulton@uea.ac.uk
Katharina T. Huber
e-mail: k.huber@uea.ac.uk

Megan Owen
Department of Mathematics, Lehman College, CUNY, New York, NY 10468, USA
e-mail: megan.owen@lehman.cuny.edu

Andreas Spillner
Merseburg University of Applied Sciences, 06217 Merseburg, Germany
e-mail: andreas.spillner@hs-merseburg.de

Katherine St. John
Department of Computer Science, Hunter College, CUNY, New York, NY 10065, USA
e-mail: stjohn@hunter.cuny.edu

Received: 10 January 2022. Accepted: 24 May 2023.
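As a supplementary illustration of the time-stamp construction in the proof of Theorem 3, the following Python sketch computes, for each rank k, the time stamp shared by all vertices of rank k: 0 for k = 0, and the sum of the weights ω of the first k polestar systems in the chain otherwise. The function name `time_stamps` and the representation of ω as a plain list indexed by position in the chain are assumptions made for this sketch and do not come from the paper. The weights used are those of the example point ω on X = {a, b, c, d, e} given after the proof.

```python
from itertools import accumulate


def time_stamps(weights):
    """Return [t_0, ..., t_sigma], where t_k is the time stamp of a rank-k vertex.

    `weights[i]` plays the role of omega(S_i) for the i-th polestar system in
    the chain (an assumed list encoding). Rank 0 gets time stamp 0 and rank k
    gets weights[0] + ... + weights[k - 1].
    """
    return [0.0, *accumulate(weights)]


def time_stamp_of(rank, weights):
    """Time stamp t(v) of a single vertex v with r(v) = rank."""
    if not 0 <= rank <= len(weights):
        raise ValueError("rank must lie between 0 and the number of weights")
    return time_stamps(weights)[rank]


if __name__ == "__main__":
    # Example weights omega(S_0), ..., omega(S_3) from the illustration of Theorem 3.
    omega = [0.8, 0.4, 1.2, 0.6]
    for rank, stamp in enumerate(time_stamps(omega)):
        print(f"rank {rank}: t = {stamp:.1f}")
```

Since the time stamps are cumulative floating-point sums, any comparison between them is best done with a tolerance rather than exact equality.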
Annals of Combinatorics – Springer Journals
Published: Mar 1, 2024
Keywords: Phylogenetic network; Network space; Combinatorial encoding; CAT(0)-metric space; 05C90; 06A06; 52B70; 92D15
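The Conclusion asks for the number of maximal chains in the graded poset of polestar systems and reports the values 1, 1, 6, 72 for n = 1, 2, 3, 4. As a point of comparison for the tree case only, the Python sketch below counts maximal chains in the lattice of set partitions of an n-element set. It rests on the assumption that B(X) is ordered by refinement, so that each step of a maximal chain merges exactly two blocks. The helper names are hypothetical, and the code does not model polestar systems or cactuses.

```python
from itertools import combinations
from math import comb, prod


def finest_partition(n):
    """Partition of {0, ..., n-1} into singletons, as a frozenset of frozensets."""
    return frozenset(frozenset([i]) for i in range(n))


def count_maximal_chains(partition):
    """Count chains from `partition` up to the one-block partition.

    Each step merges exactly two blocks, which is the covering relation of the
    refinement order on set partitions (assumed here to be the order on B(X)).
    """
    if len(partition) == 1:
        return 1
    total = 0
    for a, b in combinations(partition, 2):
        merged = (partition - {a, b}) | {a | b}
        total += count_maximal_chains(merged)
    return total


def closed_form(n):
    """Product of C(k, 2) for k = 2..n: a k-block partition has C(k, 2) merges."""
    return prod(comb(k, 2) for k in range(2, n + 1))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_maximal_chains(finest_partition(n)), closed_form(n))
```

The brute-force count and the product formula agree because the number of available merges depends only on the current number of blocks. The same enumeration strategy would apply to the cactus case once the covering relation on polestar systems is encoded, which this sketch does not attempt.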