LIFE SCIENCES

Reconstructing Phylogenetic Trees from Whole Genomes and Transcriptomes

Principal Investigator:
Alexandros Stamatakis

Affiliation:
Heidelberg Institute for Theoretical Studies

Local Project ID:
pr58te

HPC Platform used:
SuperMUC of LRZ

Date published:

Leveraging the computing capacity of the SuperMUC HPC system, computer scientists carried out large-scale evolutionary analyses of birds and insects, reconstructing their evolutionary trees from 48 bird genomes and 144 insect transcriptomes.

The main challenge in the field of evolutionary biology is data accumulation. Over recent years, obtaining genome sequence data for the species whose evolution researchers intend to study has become orders of magnitude cheaper. The cost of generating these data is now dropping faster than the cost of analyzing them on a computer.

Therefore, computer scientists involved in large-scale evolutionary analysis projects had to develop more efficient software for analyzing these data. Handling input datasets of the current size, comprising 50-100 transcriptomes (the complete set of RNA molecules transcribed from a genome in a given cell or tissue) or genomes of the species under study, requires supercomputers. Merely computing the plausibility (the likelihood score) of a single one of the trillions upon trillions of possible evolutionary scenarios (evolutionary trees) requires several terabytes of main memory and billions of arithmetic operations.
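To give a sense of the search space: for n species, the number of distinct unrooted, fully resolved (binary) trees is the standard combinatorial result (2n-5)!! = 3·5·7·…·(2n-5). The short Python sketch below evaluates this count exactly; the function name `num_unrooted_trees` is hypothetical and not part of ExaML.

```python
def num_unrooted_trees(n: int) -> int:
    """Count distinct unrooted binary trees on n labeled leaves: (2n-5)!! for n >= 3."""
    if n < 3:
        raise ValueError("at least 3 species are needed for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3 * 5 * ... * (2n - 5); Python ints do not overflow.
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    # Species counts from the bird (48) and insect (144) studies, plus a small case.
    for n in (10, 48, 144):
        digits = len(str(num_unrooted_trees(n)))
        print(f"{n} species: the number of possible trees has {digits} digits")
```

For 10 species the count is already 2,027,025; for the 48-species bird dataset it exceeds 10^24 (trillions of trillions) by many orders of magnitude. This is why tree-reconstruction software can only evaluate a tiny, heuristically chosen fraction of all possible trees.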

To infer plausible scenarios for the evolution of birds (using 48 genomes representing 48 species) and insects (using 144 transcriptomes representing 144 species), the international research teams used the SuperMUC supercomputer at LRZ in Garching near Munich.

The researchers were able to show that insects originated at about the same time as the earliest land plants, roughly 480 million years ago. The results therefore suggest that insects and plants jointly shaped the earliest terrestrial ecosystems.

In the bird study, the project team was able to date the evolutionary expansion of Neoaves (the group comprising most living bird species) to the mass extinction event about 66 million years ago, which wiped out all dinosaurs except the lineage that gave rise to birds.

The insights gained have advanced the basic understanding of how life on Earth evolved. In particular, the work on insect evolution will be essential to understanding the millions of insect species that shape the terrestrial environment. Insects both support and threaten natural resources: from pollinating crops to transmitting diseases, they are of outstanding ecological, economic, and medical importance.

Another key contribution of this project lies in adapting the software for reconstructing evolutionary trees to the SuperMUC system, coupled with substantial further algorithmic improvements by the team of computer scientists involved in the project. In addition, the software, called ExaML, is available free of charge to the entire scientific community, so that researchers around the globe can now conduct such computational analyses themselves. ExaML can be used to reconstruct the evolutionary history of any type of organism, ranging from bacteria and viruses through fungi and plants to mammals.
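At the heart of maximum-likelihood tree reconstruction, the approach ExaML implements, is the phylogenetic likelihood function, classically evaluated with Felsenstein's pruning algorithm: conditional likelihood vectors are computed bottom-up from the leaves, one per inner node and alignment site. The Python sketch below illustrates the idea for a hypothetical fixed four-species tree ((s1,s2),(s3,s4)) under the simple Jukes-Cantor (JC69) substitution model. It is a teaching example, not ExaML's code, which uses richer substitution models and distributes the work across many compute nodes with MPI; all function names here are hypothetical.

```python
import math

STATES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """JC69 transition probability P(x -> y | t); `same` is True when x == y.
    t is the branch length in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def leaf_vector(base: str) -> list[float]:
    """Conditional likelihood vector at a leaf: 1 for the observed base, 0 otherwise."""
    return [1.0 if s == base else 0.0 for s in STATES]


def along_branch(child: list[float], t: float) -> list[float]:
    """For each parent state x, sum over child states y of P(x -> y | t) * child[y]."""
    return [sum(jc69_prob(x == y, t) * child[y] for y in range(4)) for x in range(4)]


def combine(left: list[float], t_left: float, right: list[float], t_right: float) -> list[float]:
    """Pruning step: parent vector from two child vectors and their branch lengths."""
    a = along_branch(left, t_left)
    b = along_branch(right, t_right)
    return [a[x] * b[x] for x in range(4)]


def tree_log_likelihood(alignment: list[str], t: tuple[float, float, float, float, float]) -> float:
    """Log-likelihood of the fixed topology ((s1,s2),(s3,s4)) for four aligned sequences.
    t = (t1, t2, t3, t4, t_mid): four terminal branches plus the internal branch."""
    t1, t2, t3, t4, t_mid = t
    total = 0.0
    for column in zip(*alignment):  # sites are independent, so their log-likelihoods add up
        left = combine(leaf_vector(column[0]), t1, leaf_vector(column[1]), t2)
        right = combine(leaf_vector(column[2]), t3, leaf_vector(column[3]), t4)
        # Root at the left inner node; the right inner node hangs off the internal branch.
        root = [left[x] * v for x, v in enumerate(along_branch(right, t_mid))]
        # Weight each root state by its JC69 equilibrium frequency of 1/4.
        total += math.log(sum(0.25 * v for v in root))
    return total


if __name__ == "__main__":
    alignment = ["ACGTA", "ACGTT", "ACGAA", "GCGAA"]  # toy data, one string per species
    print(tree_log_likelihood(alignment, (0.1, 0.1, 0.1, 0.1, 0.05)))
```

Because one such vector has to be kept for every inner node and every alignment site (and, with richer models, for every rate category), memory grows roughly with the number of species times the alignment length. This is, in essence, why genome-scale alignments can require terabytes of main memory.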

Web-Links:
Bird Paper in SCIENCE 
Insect Paper in SCIENCE 
Research group that developed ExaML and executed the data analyses
ExaML download 
H-ITS outreach video

Scientific Contact:
Prof. Dr. Alexandros Stamatakis
Scientific Computing Group (SCO), Heidelberg Institute for Theoretical Studies
HITS gGmbH, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg (Germany)
Email: alexandros.stamatakis@h-its.org

Tags: Life Science LRZ HITS