Reconstructing Phylogenetic Trees from Whole Genomes and Transcriptomes
Principal Investigator:
Alexandros Stamatakis
Affiliation:
Heidelberg Institute for Theoretical Studies
Local Project ID:
pr58te
HPC Platform used:
SuperMUC of LRZ
Date published:
Leveraging the computing capacity of the HPC system SuperMUC, computer scientists conducted large-scale evolutionary analysis projects on birds and insects. Handling input datasets comprising 50-100 transcriptomes (the entirety of all RNA molecules transcribed from a genome) or genomes that represent the species under study requires supercomputers. Computing the plausibility of just one of the trillions and trillions of possible evolutionary scenarios requires several terabytes of main memory and billions of arithmetic operations.
The main challenge in the field of evolutionary biology is data accumulation. In recent years, obtaining genome sequence data for the species whose evolution researchers intend to study has become orders of magnitude cheaper. The cost of generating these data is now dropping faster than the cost of analyzing them on a computer.
Therefore, computer scientists involved in large-scale evolutionary analysis projects had to develop more efficient software for analyzing these data. Handling input datasets of this size, which currently comprise 50-100 transcriptomes (the entirety of all RNA molecules transcribed from a genome) or genomes that represent the species under study, requires supercomputers. Computing the plausibility of just one of the trillions and trillions of possible evolutionary scenarios (evolutionary trees) requires several terabytes of main memory and billions of arithmetic operations.
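To give a feel for how many evolutionary trees are possible, the short Python sketch below counts tree shapes for a given number of species. It assumes the trees are unrooted and strictly bifurcating, for which the standard count is (2n-5)!! for n species; the article itself does not state which kind of tree is counted, and the function name is purely illustrative.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Count distinct unrooted, strictly bifurcating tree topologies.

    Assumption: trees are unrooted and binary, so the count is the
    double factorial (2n-5)!! = 3 * 5 * ... * (2n-5) for n >= 3 taxa.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted binary tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n == 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    # 48 and 144 are the species counts of the bird and insect studies.
    for n in (4, 5, 10, 48, 144):
        count = num_unrooted_binary_trees(n)
        # Report the order of magnitude via the digit count, which is
        # exact for Python's arbitrary-precision integers.
        print(f"{n:>3} species: roughly 10^{len(str(count)) - 1} possible trees")
```

Even under this simple model, the count for 48 species already far exceeds "trillions and trillions", which is why the tree space cannot be searched exhaustively and heuristic search on a supercomputer is used instead.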
To find plausible scenarios for the evolution of birds (using 48 genomes representing 48 species) and insects (using 144 transcriptomes representing 144 species), the multinational research teams used the SuperMUC supercomputer of the LRZ in Garching near Munich.
Left: Diestrammena asynamora; right: Hedychrum nobile.
Copyright: Dr. Oliver Niehuis, ZFMK, Bonn
The researchers were able to show that insects originated at the same time as the earliest land plants about 480 million years ago. The results therefore suggest that insects and plants jointly shaped the earliest terrestrial ecosystems.
In the bird study, the project team was able to date the evolutionary expansion of Neoaves to the mass extinction event 66 million years ago that killed all dinosaurs except some birds.
The insights gained have advanced the basic understanding of how life on earth evolved. In particular, the work on insect evolution will be essential to understanding the millions of insect species that shape the terrestrial environment. Insects support and threaten natural resources at the same time. Insects are thus of outstanding ecological, economic, and medical importance and affect life on earth, from pollinating crops to transferring diseases.
Another key contribution of this project lies in the adaptation of the software for reconstructing evolutionary trees to the SuperMUC system. This was coupled with substantial further improvements of the algorithms by the team of computer scientists involved in the project. In addition, the software, called ExaML, is available free of charge to the entire scientific community, so that researchers around the globe can now conduct such computational analyses. Note that ExaML can be used for reconstructing evolutionary histories for any type of organism, ranging from bacteria and viruses through fungi and plants to mammals.
Dated phylogenetic tree of insect relationships. The tree was inferred through a maximum-likelihood analysis of 413,459 amino acid sites divided into 479 metapartitions. Branch lengths were optimized and node ages estimated from 1,050,000 trees sampled from trees separately generated for 105 partitions that included all taxa (5). All nodes up to orders are labeled with numbers (gray circles). Colored circles indicate bootstrap support (5) (left key). The time line at the bottom of the tree relates the geological origin of insect clades to major geological and biological events. CONDYLO, Condylognatha; PAL, Palaeoptera.
Copyright: Science, 7 November 2014: Vol. 346 no. 6210 pp. 763-767 DOI: 10.1126/science.1257570
Web-Links:
Bird Paper in SCIENCE
Insect Paper in SCIENCE
Research group that developed ExaML and executed the data analyses
ExaML download
H-ITS outreach video
Scientific Contact:
Prof. Dr. Alexandros Stamatakis
Scientific Computing Group (SCO), Heidelberg Institute for Theoretical Studies
HITS gGmbH, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg (Germany)
Email: alexandros.stamatakis@h-its.org