MEGA11: Molecular Evolutionary Genetics Analysis Version 11

Abstract

The Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and tools of computational molecular evolution. Here, we describe new additions that make MEGA a more comprehensive tool for building timetrees of species, pathogens, and gene families using rapid relaxed-clock methods. Methods for estimating divergence times and confidence intervals are implemented to use probability densities for calibration constraints for node-dating and sequence sampling dates for tip-dating analyses. They are supported by new options for tagging sequences with spatiotemporal sampling information, an expanded interactive Node Calibrations Editor, and an extended Tree Explorer to display timetrees. Also added is a Bayesian method for estimating neutral evolutionary probabilities of alleles in a species using multispecies sequence alignments and a machine learning method to test for the autocorrelation of evolutionary rates in phylogenies. The computer memory requirements for the maximum likelihood analysis are reduced significantly through reprogramming, and the graphical user interface has been made more responsive and interactive for very big data sets. These enhancements will improve the user experience, quality of results, and the pace of biological discovery. Natively compiled graphical user interface and command-line versions of MEGA11 are available for Microsoft Windows, Linux, and macOS from www.megasoftware.net.

Key words: software, phylogenetics, timetrees, tip dating, neutrality.

© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Tamura et al. MEGA11. doi:10.1093/molbev/msab120 MBE
Mol. Biol. Evol. 38(7):3022–3027 doi:10.1093/molbev/msab120 Advance Access publication April 23, 2021
Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by DeepDyve user on 23 June 2022

Introduction

The Molecular Evolutionary Genetics Analysis (MEGA) software has continuously grown to meet the need for sophisticated evolutionary analysis to discover organismal and genome evolutionary patterns and processes. It was first released in 1993 to offer the statistical methods of molecular evolution through an interactive interface on the Microsoft Disk Operating System (MS-DOS) (Kumar et al. 1993). For more than 25 years, MEGA's scope and usefulness have grown through the addition of new methods, tools, and interfaces, resulting in modern integrated software for comparative sequence analysis (Caspermeyer 2018). Initially, MEGA contained distance-based and maximum parsimony methods for molecular phylogenetic analysis (Kumar et al. 1994). Data acquisition and the integration of major approaches for aligning sequences were then introduced to expand MEGA's scope (Kumar et al. 2004). Afterward, maximum likelihood (ML) and Bayesian methods were added for molecular evolutionary analyses (Tamura et al. 2011). MEGA now contains methods for selecting the best-fit substitution model(s), estimating evolutionary distances and divergence times, reconstructing phylogenies, predicting ancestral sequences, testing for selection, and diagnosing disease mutations (Caspermeyer 2018).

With every new version, MEGA has evolved to harness technological innovations and personal desktops' computational power. MEGA's interface evolved from its initial MS-DOS character-based format (Kumar et al. 1993) to a rich graphical user interface (GUI) for the Microsoft Windows operating system (Kumar et al. 2001). It was then redesigned to become activity-driven (Tamura et al. 2011), followed by the incorporation of web technologies to ensure a consistent use-and-feel across Microsoft Windows and Linux operating systems (Kumar et al. 2018) and macOS (Stecher et al. 2020). The MEGA GUI is now fully cross-platform, running natively on Windows, Linux, and macOS.

MEGA's computational core (MEGA-CC) has undergone extensive refactoring, hardening, and expansion over time. It advanced from 16-bit to 32-bit (Kumar et al. 2001), became multithreaded and incorporated multicore parallelization for various calculations (Tamura et al. 2013), and stepped up to 64-bit architecture (Kumar et al. 2016, 2018). MEGA-CC was released for use as a command-line program to address the growing need for batch processing of many data sets and integration into analysis workflows (Kumar et al. 2012; Stecher et al. 2020). With both 32- and 64-bit versions of MEGA currently available for use on the command-line and GUI, MEGA is now a suite of applications that responds to the variety of computing environments currently used by researchers in molecular evolution and phylogenetics. Here, we present key methodological additions and technical improvements in MEGA that comprise version 11.

Methodological Additions

Expansion of Relaxed-Clock Dating Facilities

Rapid relaxed-clock methods for estimating divergence times are becoming popular because they are feasible and efficient for large contemporary sequence alignments (Tao, Tamura, Kumar, et al. 2020). MEGA6 first added methods and tools for constructing evolutionary timetrees by implementing the RelTime method, which does not assume a molecular clock (Tamura et al. 2012, 2013). RelTime is known to perform well and has been used to build timetrees in hundreds of research articles (Tao, Tamura, Kumar, et al. 2020). MEGA11 expands on RelTime dating options by advancing the current implementation and adding new facilities for node-dating and tip-dating needed to build timetrees of pathogens, species, and gene families.

Calibrating the Clock Using Probability Densities on Node-Constraints

Bayesian relaxed-clock methods have long allowed the use of statistical probability distributions that capture prior knowledge (or belief) about the true divergence times in clock calibration constraints on one or more nodes in the phylogeny. Judicious use of these probability densities can make divergence times more accurate and precise (Tao, Tamura, Mello, et al. 2020). Researchers can now use such probability densities for node calibrations in RelTime estimation of divergence times and confidence intervals (CIs). MEGA implements the Tao, Tamura, Mello, et al. (2020) approach, which estimates CIs by simultaneously accounting for variance introduced by the heterogeneity of evolutionary rate among lineages, estimation of sequence divergence using substitution models, and probability densities for node-calibration constraints. This method produces CIs that contain correct times with a high probability, making them much more suitable for biological hypothesis testing than other rapid methods (Tao, Tamura, Kumar, et al. 2020; Tao, Tamura, Mello, et al. 2020).

For RelTime analyses in MEGA11, ML and distance-based approaches can be used to build a timetree for a given phylogeny and multiple sequence alignment. One may also use only a phylogeny with branch lengths, which extends the usefulness of relaxed-clock methods to phylogenies inferred from nonmolecular data or statistical methodologies not available in MEGA. When a phylogeny with branch lengths is used, the CIs will be narrower because the variance associated with branch length estimation cannot be generated without the original data set used to produce the phylogeny and branch lengths. Nevertheless, these CIs will incorporate variance introduced due to rate variation among lineages and clock calibrations' uncertainty.

A calibration density selector has been added to the Node Calibration Editor that provides an option to select normal, lognormal, uniform, or exponential density (fig. 1). The user can also specify a minimum or a maximum time bound on a node. The calibration text file format has been extended to specify density information and use calibration densities in MEGA-CC. The Node Calibration Editor also includes new functionality to specify a fixed evolutionary rate or a known node time to calibrate the molecular clock. Such assumptions are often used by investigators when independent calibration information is unknown (Hipsley and Müller 2014; Tao, Tamura, Kumar, et al. 2020).

FIG. 1. Calibration points for MEGA's RelTime method are chosen in the Node Calibration Editor window (A), accessed via the Timetree Wizard system (see fig. 2A). The Node Calibration Editor displays the phylogeny where individual node calibrations and probability densities can be chosen by clicking the calibration button on the top toolbar for the selected node. A dropdown menu (B) with several calibration density types is displayed. The Node Calibration Editor then prompts the user for required distribution parameters, depending on the distribution selected: normal distribution (mean and standard deviation), lognormal (offset, mean and standard deviation), exponential (offset and decay parameter), uniform (min and max) (C).

Tip-Dating for Sequences with Sampling Times

MEGA now implements a method to estimate timetrees using sampling dates for molecular sequences. Such tip-dating methods are often used to infer the origin and diversification of pathogens that generally evolve fast enough to track the evolutionary change over months and years (Tao, Tamura, Kumar, et al. 2020). Tip-dating methods are also useful for analyzing ancient molecular sequences. MEGA implements a rapid tip-dating method, RelTime with Dated Tips (RTDT), that produces divergence times and CIs (Miura et al. 2018). One may use ML or distance-based approaches for a given phylogeny and multiple sequence alignment for tip-dating, or a phylogeny with branch lengths and tip dates can be given as the input. An enhanced Timetree Wizard system (fig. 2) walks the user through many steps needed to configure tip-dating analyses, such as loading sequence and tree files, specifying the outgroups, adding sequence sample times, and selecting the analysis options. Sequence sampling times can be specified in multiple ways. MEGA will automatically extract them on demand when they are included in the sequence name. Spatiotemporal information can also be presented in the input alignment files as meta tags (see description below) or loaded using specially formatted calibration text files. Once computed, the timetree is displayed in the Tree Explorer, which has been extensively revamped and updated (fig. 3). It now has many more formatting tools, including exporting the timetree, individual divergence times, and CI estimates in a tabular format.

FIG. 2. The Tip Dating Wizard (A) guides the user through the steps required to set up the RTDT analysis. Once a sequence alignment and/or a tree is provided, the user is prompted to specify the outgroup by selecting a node in the Tree Explorer or specifying outgroup taxa by name (not shown). Next, sample times are specified using the Tip Dates Editor (B) with facilities for parsing tip dates (C) encoded in taxa names, importing tip dates from a text file, and manually entering the dates. In the next step, the Analysis Preferences dialog (not shown) is displayed, allowing the user to set analysis options to estimate branch lengths used by RTDT. The estimated timetree is displayed in the Tree Explorer (see fig. 3).

Detecting Autocorrelation of Evolutionary Rates

MEGA now contains a facility for detecting autocorrelation of evolutionary rates among branches, which is important for understanding molecular evolution patterns and useful for choosing a clock rate prior in Bayesian relaxed-clock analyses. MEGA implements the CorrTest method developed using machine learning, which is accurate and computationally efficient (Tao et al. 2019). The CorrTest implementation in MEGA requires a phylogeny with sequence alignment (or branch lengths) and is accessed through an easy-to-use wizard. This test's final output is a CorrScore between 0 and 1 and a P-value, where a high CorrScore and low P-value indicate that branch rates among lineages are likely correlated.

Calculating Neutral Evolutionary Probabilities

According to the neutral theory of molecular evolution, most differences in molecular sequences across species are expected to have little to no impact on fitness (Kimura 1983). Therefore, multispecies sequence alignments have been used to estimate neutral evolutionary probabilities (EP) of observing alternative alleles (amino acid residues or nucleotides) in a species, contingent on the given species timetree (Liu et al. 2016). MEGA implements an advanced option for this Bayesian approach in which the species timetree containing relative times is computed automatically by using RelTime (Patel and Kumar 2019). Alleles with EP less than 0.05 are nonneutral, whereas evolutionarily permissible (neutral) alleles show much higher EPs. Disease-associated amino acid variants in human populations have EP < 0.05 and are rarely found in the population (Liu et al. 2016). Many human adaptive variants in populations also have low EPs, that is, they are nonneutral from an evolutionary perspective, but they show high allele frequencies (Patel et al. 2018). Therefore, one may use EPs to diagnose disease mutations and detect candidate adaptive variants. An EP wizard system walks the user through the steps required to set up the analysis. The first sequence in the alignment is used automatically as the focal taxon of interest (one can rearrange sequences in the Sequence Data Explorer). EP values for all possible bases (4 for nucleotides and 20 for amino acids) at each position in the input sequence alignment are reported in a spreadsheet or text format.

Technological Advances

Although some new user interface elements have already been mentioned above (figs. 1–3), additional technical advances in MEGA11 are as follows.

Expanded Group Designations

MEGA has long supported a "group" tag for sequences and other operational taxonomic units (OTUs). Using the sequence "group" tags, MEGA offered a group-wise exploration of input data, selection of data subsets, and computational analyses (Kumar et al. 2001). Support for two new tags ("population" and "species") was added in MEGA7, with the species tags used to mark duplicate genes in multigene family phylogenies (Kumar et al. 2016). In MEGA11, sequences can now be tagged to provide information on the continent, country, city, year, month, day, and time. This spatiotemporal information can be used in tip-dating analyses.

In MEGA11, we have made a MEGA-wide change to use any meta tag to define groups. For example, if one selects the "Year" meta tag for use as a group, one could estimate average diversity within and between sequences sampled in different years (Distance menu). In the Sequence Data Explorer, one can select/unselect sequences of certain years for phylogenetic analyses. Also, the display of years would be automatically enabled in the Tree Explorer, and sequence clusters would be collapsed by year. Additionally, sequences can be sorted based on years in all the input data and result explorer displays. Therefore, a dynamic designation of groups based on the desired meta tag will make data exploration and analysis more efficient.

Memory Efficient ML Analyses

ML methods are widely used for phylogenetic inference but place high demands on computer memory, which become increasingly burdensome for the larger sequence alignments analyzed today. In MEGA11, we have now completed a long-overdue refactoring of ML calculations by adding a step to identify common site configurations, that is, sites at which all sequences have the same bases as at some other site, to utilize computer memory more efficiently. The memory requirements of ML and maximum parsimony analyses are reduced (approximately) to a fraction m/L of their former size when there are m distinct site configurations in a sequence alignment containing L sites. The memory saving can be substantial for multigene and genome-scale alignments. For example, the memory saving was 660 MB (209 vs. 870 MB) for a sequence alignment of 229 birds with 2,728 sites (Claramunt and Cracraft 2015) and 4.5 GB (2.3 vs. 6.8 GB) for an alignment of 162 mammals with 11,010 sites (Meredith et al. 2011). This memory saving does not have any detrimental impact on phylogenetic estimates and computational times because identical site configurations have the same likelihood value. The total log-likelihood is simply the sum of site-configuration log-likelihoods weighted by their frequencies. However, this upgrade required refactoring many different parts of MEGA's calculation engine, including functions for phylogeny construction and model selection.

Enhanced GUI for Exploring Large Data Sets

Using a large multiple sequence alignment containing 68,000 genomes of 30,000 bases each, we assessed the MEGA GUI's responsiveness during input data file reading, execution of functions in the Sequence Data Explorer, estimation of pairwise distances, and building of distance-based phylogenies. We found the GUI to become intermittently unresponsive for such large data sets, which are now common due to resequencing and population sequencing efforts. Consequently, we have moved all potentially long-running operations out of the main GUI thread to background threads in a major overhaul of the source code. Now, large input data files are read rapidly, and calculations of pairwise distance matrices, selection tests, and phylogeny construction for distance-based methods are performed in a background thread. The Sequence Data Explorer has been reprogrammed to enable more efficient highlighting of variable sites, and navigation of the sequence alignment has been improved. Also added are options to automatically label sites based on attributes, which annotate sites with a one-character label; the desired labeled sites can then be used to subset data for any molecular phylogenetic analysis.

FIG. 3.
MEGA's Tree Explorer (A) is a feature-rich, versatile viewer of phylogenies that provides many interactive exploration and customization facilities. In MEGA11, the new side toolbar of Tree Explorer makes formatting, rearrangement, and tree exploration tools more accessible and intuitive. Instead of a thin toolbar with nameless buttons, we have opted for a wide toolbar with text labels identifying each tool. The toolbar can be moved to either side of the window, and it can be toggled in and out of view. To organize related tools by groups and accommodate limited vertical space, collapsible panels are used. With the new toolbar, formatting tools previously displayed in external dialogs are readily accessible, and formats are applied instantly instead of after the user closes the external dialog. In addition to the updated toolbar, there are now options for auto-collapsing nodes that contain clusters of taxa, either by shared group, by user-specified cluster size, or by branch length difference. For very large trees with many similar sequences, this feature can greatly facilitate the visualization of evolutionary events at a glance. An option has been added to export pairwise patristic distances between taxa to a text file for phylogenies and timetrees. For maximum likelihood and maximum parsimony trees where ancestral sequences are present, an option has been added to navigate through sites where the estimated ancestral state differs between the parent and child nodes of the currently selected branch. The tree information box (B) has been updated for timetrees to show branch- and node-specific information, such as earliest and latest sample times in the currently selected subtree, days elapsed between the divergence time for a selected node and the latest sample time, the nearest and furthest tip from a selected node, clade size and clade taxa, and spatiotemporal information if available.
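
The site-configuration idea described under Memory Efficient ML Analyses can be illustrated with a minimal sketch. This is not MEGA's implementation: the function names are hypothetical, and the per-site likelihood is a two-sequence Jukes-Cantor stand-in chosen only to keep the example short, whereas a real analysis would evaluate each configuration on the full phylogeny. The sketch shows that grouping identical alignment columns and weighting each distinct configuration by its frequency gives the same total log-likelihood as visiting every site, while storing only m of the L columns.

```python
import math
from collections import Counter


def compress_sites(alignment):
    """Count how often each distinct column (site configuration) occurs."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*alignment) yields one tuple of bases per alignment column.
    return Counter(zip(*alignment))


def jc69_pair_site_log_likelihood(pattern, distance=0.1):
    """Illustrative stand-in: Jukes-Cantor log-likelihood of one site
    for exactly two sequences separated by `distance` (assumed value)."""
    x, y = pattern
    decay = math.exp(-4.0 * distance / 3.0)
    p = 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay
    return math.log(0.25 * p)  # 0.25 is the equilibrium base frequency


def log_likelihood_naive(alignment, site_ll):
    """Evaluate every one of the L sites separately."""
    return sum(site_ll(column) for column in zip(*alignment))


def log_likelihood_compressed(alignment, site_ll):
    """Evaluate each of the m distinct configurations once,
    weighted by how many sites share it."""
    patterns = compress_sites(alignment)
    return sum(count * site_ll(p) for p, count in patterns.items())


alignment = ["ACGTACGTAA", "ACGTACGAAA"]  # toy data, not from the paper
patterns = compress_sites(alignment)
m, L = len(patterns), len(alignment[0])
print(f"{m} distinct configurations in {L} sites (m/L = {m / L:.2f})")
```

Because only the distinct configurations and their counts need to be held, the per-site storage in this sketch shrinks roughly in proportion to m/L, which mirrors the scaling stated in the text.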

Conclusions

Version 11 of MEGA adds many methods and tools to keep pace with researchers' growing needs. The addition of evolutionary dating methods in MEGA makes it easier to estimate species and strain divergence times by using more informative node calibrations and sampling times. The new CorrTest and EP calculations will enable a more robust evaluation of assumptions about biological characteristics of molecular data. The reduction in memory needs of ML-based computations will allow users to analyze much larger data sets than before. The refactoring of distance-based methods' calculation to run in threads independent of the main graphical interface and other GUI enhancements greatly improve MEGA usability for very large data sets.

Acknowledgments

We thank our laboratory members and many beta testers for providing invaluable feedback and bug reports. This study was supported in part by research grants from the National Institutes of Health (R35GM139504-01), National Science Foundation (DEB-2034228, DBI-1661218), and Japan Society for the Promotion of Science (JSPS) grants-in-aid for scientific research (DB5) to K.T.

Data Availability

The software and its source code are available from www.megasoftware.net.

References

Caspermeyer J. 2018. MEGA software celebrates silver anniversary. Mol Biol Evol. 35(6):1558–1560.
Claramunt S, Cracraft J. 2015. A new time tree reveals Earth history's imprint on the evolution of modern birds. Sci Adv. 1(11):e1501005.
Hipsley CA, Müller J. 2014. Beyond fossil calibrations: realities of molecular clock practices in evolutionary biology. Front Genet. 5:138.
Kimura M. 1983. The neutral theory of molecular evolution. New York: Cambridge University Press.
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. MEGA X: molecular Evolutionary Genetics Analysis across computing platforms. Mol Biol Evol. 35(6):1547–1549.
Kumar S, Stecher G, Peterson D, Tamura K. 2012. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis. Bioinformatics 28(20):2685–2686.
Kumar S, Stecher G, Tamura K. 2016. MEGA7: molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol. 33(7):1870–1874.
Kumar S, Tamura K, Jakobsen I, Nei M. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17(12):1244–1245.
Kumar S, Tamura K, Nei M. 1993. MEGA: Molecular Evolutionary Genetics Analysis version 1.01. University Park (PA): The Pennsylvania State University.
Kumar S, Tamura K, Nei M. 1994. MEGA—molecular evolutionary genetics analysis software for microcomputers. Comput Appl Biosci. 10(2):189–191.
Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 5(2):150–163.
Liu L, Tamura K, Sanderford MD, Gray V, Kumar S. 2016. A molecular evolutionary reference for the human variome. Mol Biol Evol. 33(1):245–254.
Meredith RW, Janecka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, Goodbla A, Eizirik E, Simao TLL, Stadler T, et al. 2011. Impacts of the cretaceous terrestrial revolution and KPg extinction on mammal diversification. Science 334(6055):521–524.
Miura S, Tamura K, Tao Q, Huuki LA, Pond SLK, Priest J, Deng J, Kumar S. 2018. A new method for inferring timetrees from temporally sampled molecular sequences. PLoS Comput Biol. 16:24.
Patel R, Kumar S. 2019. On estimating evolutionary probabilities of population variants. BMC Evol Biol. 19(1):133 (14 pp.).
Patel R, Scheinfeldt LB, Sanderford MD, Lanham TR, Tamura K, Platt A, Glicksberg BS, Xu K, Dudley JT, Kumar S. 2018. Adaptive landscape of protein variation in human exomes. Mol Biol Evol. 35(8):2015–2025.
Stecher G, Tamura K, Kumar S. 2020. Molecular Evolutionary Genetics Analysis (MEGA) for macOS. Mol Biol Evol. 37(4):1237–1239.
Tamura K, Battistuzzi FU, Billing-Ross P, Murillo O, Filipski A, Kumar S. 2012. Estimating divergence times in large molecular phylogenies. Proc Natl Acad Sci USA. 109(47):19333–19338.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 28(10):2731–2739.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 30(12):2725–2729.
Tao Q, Tamura K, Battistuzzi F, Kumar S. 2019. A machine learning method for detecting autocorrelation of evolutionary rates in large phylogenies. Mol Biol Evol. 36(4):811–824.
Tao Q, Tamura K, Kumar S. 2020. Efficient methods for dating evolutionary divergences. In: Ho SYW, editor. The molecular evolutionary clock. Switzerland: Springer Nature. p. 197–219.
Tao Q, Tamura K, Mello B, Kumar S. 2020. Reliable confidence intervals for RelTime estimates of evolutionary divergence times. Mol Biol Evol. 37(1):280–290.

Molecular Biology and Evolution, Oxford University Press


Publisher
Oxford University Press
Copyright
Copyright © 2022 Society for Molecular Biology and Evolution
ISSN
0737-4038
eISSN
1537-1719
DOI
10.1093/molbev/msab120
This method produces CIs that contain correct times with a high proba- Detecting Autocorrelation of Evolutionary Rates bility, making them much more suitable for biological hy- MEGA now contains a facility for detecting autocorrelation of pothesis testing than other rapid methods (Tao, Tamura, evolutionary rates among branches, which is important for Kumar, et al. 2020; Tao, Tamura, Mello, et al. 2020). understanding molecular evolution patterns and useful as a For RelTime analyses in MEGA11, ML and distance-based clock rate prior in Bayesian relaxed-clock analyses. MEGA approaches canbeusedtobuild atimetreefor agiven phy- implements the CorrTest method developed using machine logeny and multiple sequence alignment. One may also use learning, which is accurate and computationally efficient (Tao only a phylogeny with branch lengths, which extends the et al. 2019). The CorrTest implementation in MEGA requires usefulness of relaxed-clock methods for phylogenies inferred a phylogeny with sequence alignment (or branch lengths) from nonmolecular data or statistical methodologies not and is accessed through an easy-to-use wizard. This test’s final available in MEGA. When a phylogeny with branch lengths output is a CorrScore between 0 and 1 and a P-value, where a is used, the CIs will be narrower because the variance associ- high CorrScore and low P-value indicates that branch rates ated with branch length estimation cannot be generated among lineages are likely correlated. without the original data set used to produce the phylogeny and branch lengths. Nevertheless, these CIs will incorporate Calculating Neutral Evolutionary Probabilities variance introduced due to rate variation among lineages and According to the neutral theory of molecular evolution, most clock calibrations’ uncertainty. differences in molecular sequences across species are expected 3023 Downloaded from https://academic.oup.com/mbe/article/38/7/3022/6248099 by DeepDyve user on 23 June 2022 Tamura et al. 
to have little to no impact on fitness (Kimura 1983). Therefore, multispecies sequence alignments have been used to estimate neutral evolutionary probabilities (EP) of observing alternative alleles (amino acid residues or nucleotides) in a species, contingent on the given species timetree (Liu et al. 2016). MEGA implements an advanced option for this Bayesian approach in which the species timetree containing relative times is computed automatically by using RelTime (Patel and Kumar 2019). Alleles with EP less than 0.05 are nonneutral, whereas evolutionarily permissible (neutral) alleles show much higher EPs. Disease-associated amino acid variants in human populations have EP < 0.05 and are rarely found in the population (Liu et al. 2016). Many human adaptive variants in populations also have low EPs, that is, they are nonneutral from an evolutionary perspective, but they show high allele frequencies (Patel et al. 2018). Therefore, one may use EPs to diagnose disease mutations and detect candidate adaptive variants. An EP wizard system walks the user through the steps required to set up the analysis. The first sequence in the alignment is used automatically as the focal taxon of interest (one can rearrange sequences in the Sequence Data Explorer). EP values for all possible bases (4 for nucleotides and 20 for amino acids) at each position in the input sequence alignment are reported in a spreadsheet or text format.

FIG. 1. Calibration points for MEGA’s RelTime method are chosen in the Node Calibration Editor window (A), accessed via the Timetree Wizard system (see fig. 2A). The Node Calibration Editor displays the phylogeny where individual node calibrations and probability densities can be chosen by clicking the calibration button on the top toolbar for the selected node. A dropdown menu (B) with several calibration density types is displayed. The Node Calibration Editor then prompts the user for required distribution parameters, depending on the distribution selected: normal (mean and standard deviation), lognormal (offset, mean, and standard deviation), exponential (offset and decay parameter), or uniform (min and max) (C).

Technological Advances

Although some new user interface elements have already been mentioned above (figs. 1–3), additional technical advances in MEGA11 are as follows.

Expanded Group Designations

MEGA has long supported a “group” tag for sequences and other operational taxonomic units (OTUs). Using the sequence “group” tags, MEGA offered a group-wise exploration of input data, selection of data subsets, and computational analyses (Kumar et al. 2001). Support for two new tags (“population” and “species”) was added in MEGA7, with the species tags used to mark duplicate genes in multigene family phylogenies (Kumar et al. 2016). In MEGA11, sequences can now be tagged to provide information on the continent, country, city, year, month, day, and time. This spatiotemporal information can be used in tip-dating analyses.

In MEGA11, we have made a MEGA-wide change to use any meta tag to define groups. For example, if one selects the “Year” meta tag for use as a group, they could estimate average diversity within and between sequences sampled in different years (Distance menu). In the Sequence Data Explorer, one can select/unselect sequences of certain years for phylogenetic analyses. Also, the display of years would be automatically enabled in the Tree Explorer, and sequence clusters would be collapsed by year. Additionally, sequences can be sorted based on years in all the input data and result explorer displays. Therefore, a dynamic designation of groups based on the desired meta tag will make data exploration and analysis more efficient.

FIG. 2. The Tip Dating Wizard (A) guides the user through the steps required to set up the RTDT analysis. Once a sequence alignment and/or a tree is provided, the user is prompted to specify the outgroup by selecting a node in the Tree Explorer or specifying outgroup taxa by name (not shown). Next, sample times are specified using the Tip Dates Editor (B) with facilities for parsing tip dates (C) encoded in taxa names, importing tip dates from a text file, and manually entering the dates. In the next step, the Analysis Preferences dialog (not shown) is displayed, allowing the user to set analysis options to estimate branch lengths used by RTDT. The estimated timetree is displayed in the Tree Explorer (see fig. 3).

Memory Efficient ML Analyses

ML methods are widely used for phylogenetic inference but place high demands on computer memory, becoming increasingly burdensome for bigger sequence alignments analyzed these days. In MEGA11, we have now completed a long-overdue refactoring of ML calculations by adding a step to identify common site configurations, that is, sites where all sequences have the same bases as at some other sites, to utilize computer memory more efficiently. The memory requirements of maximum likelihood and maximum parsimony analyses are reduced (approximately) by the factor of m/L when there are m distinct site configurations in a sequence alignment containing L sites. The memory saving can be substantial for multigene and genome-scale alignments. For example, the memory saving was 660 MB (209 vs. 870 MB) for a sequence alignment of 229 birds with 2,728 sites (Claramunt and Cracraft 2015) and 4.5 GB (2.3 vs. 6.8 GB) for an alignment of 162 mammals with 11,010 sites (Meredith et al. 2011). This memory saving does not have any detrimental impact on phylogenetic estimates and computational times because identical site configurations have the same likelihood value. The total log-likelihood is simply the sum of site-configuration log-likelihoods weighted by their frequencies. However, this upgrade required refactoring many different parts of MEGA’s calculation engine, including functions for phylogeny construction and model selection.

Enhanced GUI for Exploring Large Data Sets

Using a large multiple sequence alignment containing 68,000 genomes and 30,000 bases each, we assessed MEGA GUI’s responsiveness during input data file reading, execution of functions in the Sequence Data Explorer, estimation of pairwise distances, and building of distance-based phylogenies. We found the GUI to become intermittently unresponsive for such large data sets, which are now common due to resequencing and population sequencing efforts. Consequently, we have moved all potentially long-running operations out of the main GUI thread to background threads in a major overhaul of the source code. Now, large input data files are read rapidly, and calculations of pairwise distance matrices, selection tests, and phylogeny construction for distance-based methods are performed in a background thread. The Sequence Data Explorer has been reprogrammed to enable more efficient highlighting of variable sites, and navigation of the sequence alignment has been improved. Also added are options to automatically label sites based on attributes, which annotates sites by providing a one-character label and then using desired labeled sites to subset data for any molecular phylogenetic analysis desired.

FIG. 3. MEGA’s Tree Explorer (A) is a feature-rich, versatile viewer of phylogenies that provides many interactive exploration and customization facilities. In MEGA11, the new side toolbar of Tree Explorer makes formatting, rearrangement, and tree exploration tools more accessible and intuitive. Instead of a thin toolbar with nameless buttons, we have opted for a wide toolbar with text labels identifying each tool. The toolbar can be moved to either side of the window, and it can be toggled in and out of view. To organize related tools by groups and accommodate limited vertical space, collapsible panels are used. With the new toolbar, formatting tools previously displayed in external dialogs are readily accessible, and formats are applied instantly instead of after the user closes the external dialog. In addition to the updated toolbar, there are now options for auto-collapsing of nodes containing clusters of taxa belonging to the same group, user-specified cluster size, or by the branch length difference. For very large trees with many similar sequences, this feature can greatly facilitate the visualization of evolutionary events at a glance. An option has been added to export pairwise patristic distances between taxa to a text file for phylogenies and timetrees. For maximum likelihood and maximum parsimony trees where ancestral sequences are present, an option has been added to navigate through sites where the estimated ancestral state differs between the parent and child on the currently selected branch. The tree information box (B) has been updated for timetrees to show branch- and node-specific information, such as earliest and latest sample times in the currently selected subtree, days elapsed between the divergence time for a selected node and the latest sample time, the nearest and furthest tip from a selected node, clade size and clade taxa, and spatiotemporal information if available.
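The site-configuration idea described under Memory Efficient ML Analyses can be illustrated with a minimal Python sketch. It is not MEGA's code: the function names, the toy alignment, and the placeholder per-site score are all assumptions made for illustration, and the score is not a real substitution model. The sketch collapses identical alignment columns into m distinct configurations, reports the fraction m/L, and forms the total log-likelihood as the frequency-weighted sum over configurations.

```python
from collections import Counter
import math


def compress_site_patterns(alignment):
    """Collapse alignment columns into distinct site configurations with counts."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*alignment) yields one tuple of bases per column (site).
    return Counter(zip(*alignment))


def total_log_likelihood(pattern_counts, site_log_likelihood):
    """Sum per-configuration log-likelihoods weighted by their frequencies."""
    return sum(count * site_log_likelihood(pattern)
               for pattern, count in pattern_counts.items())


def toy_site_log_likelihood(pattern):
    # Hypothetical placeholder, NOT a substitution model:
    # invariant columns simply receive a higher score than variable ones.
    return math.log(0.9) if len(set(pattern)) == 1 else math.log(0.1)


# Toy alignment (hypothetical data): 3 sequences, 10 sites.
alignment = [
    "ACGTACGTAA",
    "ACGTACGTAA",
    "ACGAACGAAA",
]

patterns = compress_site_patterns(alignment)
L = len(alignment[0])   # number of sites
m = len(patterns)       # number of distinct site configurations

print(f"L = {L}, m = {m}, m/L = {m / L:.2f}")
print("total log-likelihood:",
      total_log_likelihood(patterns, toy_site_log_likelihood))
```

In this toy case the ten columns collapse to four configurations, so per-configuration quantities need to be stored for only 40% of the sites. The weighted sum equals the site-by-site sum because identical columns receive identical scores.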
Conclusions

Version 11 of MEGA adds many methods and tools to keep pace with researchers’ growing needs. The addition of evolutionary dating methods in MEGA makes it easier to estimate species and strain divergence times by using more informative node calibrations and sampling times. The new CorrTest and EP calculations will enable a more robust evaluation of assumptions about biological characteristics of molecular data. The reduction in memory needs of ML-based computations will allow users to analyze much larger data sets than before. The refactoring of distance-based methods’ calculation to run in threads independent of the main graphical interface and other GUI enhancements greatly improve MEGA usability for very large data sets.

Acknowledgments

We thank our laboratory members and many beta testers for providing invaluable feedback and bug reports. This study was supported in part by research grants from the National Institutes of Health (R35GM139504-01), National Science Foundation (DEB-2034228, DBI-1661218), and Japan Society for the Promotion of Science (JSPS) grants-in-aid for scientific research (DB5) to K.T.

Data Availability

The software and its source code are available from www.megasoftware.net.

References

Caspermeyer J. 2018. MEGA software celebrates silver anniversary. Mol Biol Evol. 35(6):1558–1560.

Claramunt S, Cracraft J. 2015. A new time tree reveals Earth history’s imprint on the evolution of modern birds. Sci Adv. 1(11):e1501005.

Hipsley CA, Müller J. 2014. Beyond fossil calibrations: realities of molecular clock practices in evolutionary biology. Front Genet. 5:138.

Kimura M. 1983. The neutral theory of molecular evolution. New York: Cambridge University Press.

Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. MEGA X: molecular Evolutionary Genetics Analysis across computing platforms. Mol Biol Evol. 35(6):1547–1549.

Kumar S, Stecher G, Peterson D, Tamura K. 2012. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis. Bioinformatics 28(20):2685–2686.

Kumar S, Stecher G, Tamura K. 2016. MEGA7: molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol. 33(7):1870–1874.

Kumar S, Tamura K, Jakobsen I, Nei M. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17(12):1244–1245.

Kumar S, Tamura K, Nei M. 1993. MEGA: Molecular Evolutionary Genetics Analysis version 1.01. University Park (PA): The Pennsylvania State University.

Kumar S, Tamura K, Nei M. 1994. MEGA: molecular evolutionary genetics analysis software for microcomputers. Comput Appl Biosci. 10(2):189–191.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 5(2):150–163.

Liu L, Tamura K, Sanderford MD, Gray V, Kumar S. 2016. A molecular evolutionary reference for the human variome. Mol Biol Evol. 33(1):245–254.

Meredith RW, Janecka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, Goodbla A, Eizirik E, Simao TLL, Stadler T, et al. 2011. Impacts of the cretaceous terrestrial revolution and KPg extinction on mammal diversification. Science 334(6055):521–524.

Miura S, Tamura K, Tao Q, Huuki LA, Pond SLK, Priest J, Deng J, Kumar S. 2018. A new method for inferring timetrees from temporally sampled molecular sequences. PLoS Comput Biol. 16:24.

Patel R, Kumar S. 2019. On estimating evolutionary probabilities of population variants. BMC Evol Biol. 19(1):133 (14 pp.).

Patel R, Scheinfeldt LB, Sanderford MD, Lanham TR, Tamura K, Platt A, Glicksberg BS, Xu K, Dudley JT, Kumar S. 2018. Adaptive landscape of protein variation in human exomes. Mol Biol Evol. 35(8):2015–2025.

Stecher G, Tamura K, Kumar S. 2020. Molecular Evolutionary Genetics Analysis (MEGA) for macOS. Mol Biol Evol. 37(4):1237–1239.

Tamura K, Battistuzzi FU, Billing-Ross P, Murillo O, Filipski A, Kumar S. 2012. Estimating divergence times in large molecular phylogenies. Proc Natl Acad Sci USA. 109(47):19333–19338.

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetic analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 28(10):2731–2739.

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 30(12):2725–2729.

Tao Q, Tamura K, Battistuzzi F, Kumar S. 2019. A machine learning method for detecting autocorrelation of evolutionary rates in large phylogenies. Mol Biol Evol. 36(4):811–824.

Tao Q, Tamura K, Kumar S. 2020. Efficient methods for dating evolutionary divergences. In: Ho SYW, editor. The molecular evolutionary clock. Switzerland: Springer Nature. p. 197–219.

Tao Q, Tamura K, Mello B, Kumar S. 2020. Reliable confidence intervals for RelTime estimates of evolutionary divergence times. Mol Biol Evol. 37(1):280–290.

Journal

Molecular Biology and Evolution, Oxford University Press

Published: Jun 25, 2021
