An efficient method for exploring the space of gene tree/species tree reconciliations in a probabilistic framework

Jean-Philippe Doyon, Sylvie Hamel, and Cedric Chauve
Accepted to IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011

Background. Inferring an evolutionary scenario for a gene family is a fundamental problem with applications in both functional and evolutionary genomics. The gene tree/species tree reconciliation approach has been widely used to address this problem, but mostly in a discrete parsimony framework that aims at minimizing the number of gene duplications and/or gene losses. Recently, a probabilistic approach has been developed, based on the classical birth-and-death process, including efficient algorithms for computing posterior probabilities of reconciliations and for orthology prediction.
Results. In previous work, we described an algorithm for exploring the whole space of gene tree/species tree reconciliations, which we adapt here to efficiently compute the posterior probabilities of such reconciliations. These posterior probabilities can be computed either exactly or approximately, depending on the size of the reconciliation space. We use this algorithm to analyze the probabilistic landscape of the space of reconciliations for a real dataset of fungal gene families and several datasets of synthetic gene trees.
Conclusion. The results of our simulations suggest that, with exact gene trees obtained by a simple birth-and-death process and realistic gene duplication/loss rates, only a very small subset of all reconciliations needs to be explored to closely approximate the posterior probability of the most likely reconciliations. For cases where the posterior probability mass is more evenly dispersed, our method allows efficient exploration of the required subspace of reconciliations.
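
To illustrate the approximation idea described in the Results and Conclusion, here is a minimal Python sketch. It normalizes a set of unnormalized log-probabilities of reconciliations into posterior probabilities, then counts how many of the top-ranked reconciliations are needed to cover a given fraction of the posterior mass. The function names, the log-score values, and the 0.95 threshold are hypothetical and chosen for illustration only; this is not the algorithm implemented in Korak.

```python
import math

def posteriors(log_scores):
    """Normalize unnormalized log-probabilities into posterior probabilities."""
    m = max(log_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

def smallest_covering_set(probs, mass=0.95):
    """Return how many top-ranked items are needed to reach `mass`."""
    covered = 0.0
    for k, p in enumerate(sorted(probs, reverse=True), start=1):
        covered += p
        if covered >= mass:
            return k
    return len(probs)

# Hypothetical log-scores for five reconciliations of one gene tree.
example_log_scores = [-10.0, -12.5, -15.0, -20.0, -30.0]
example_posteriors = posteriors(example_log_scores)
example_k = smallest_covering_set(example_posteriors, mass=0.95)
print(example_k, [round(p, 4) for p in example_posteriors])
```

When the posterior mass is concentrated, as in this made-up example, a small value of `example_k` means that exploring only a few reconciliations already accounts for most of the probability.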





Korak program

Given a gene tree G and a species tree S (both in Newick format), duplication/loss rates, and branch lengths for S, Korak computes the following:

A user manual of the program: manualExploration.pdf.

The archive Exploration.tgz contains the following:

Follow these steps to build the Korak program (the binary is named Exploration):
  1. Download the archive Exploration.tgz.
  2. Create a new directory and move the archive into it.
  3. Extract the archive: 'tar -zxvf Exploration.tgz'
  4. 'cd EXPLORATION/exploration/'
  5. Type 'cmake .' to generate the makefile corresponding to your system.
  6. Type 'make' to build the binary file called Exploration (several warnings are printed to the shell; this is expected and will be addressed later).
  7. Execute the binary: './Exploration D ../INPUT_FILES/DATA2 E L Q D'
The options of the program and the format of the output are described in manualExploration.pdf, which uses the same example as Step 7 above.

Output files of the Exploration program:

Probabilistic Analysis on Real and Simulated Gene Trees

This section contains the following:


Input gene trees

                      Increasing Factor (I.F.)  Gene Trees                             Branch Lengths (in time) and Rates  Species Tree (12 fungal genomes)
Real gene trees       Not applicable            1278 trees, realGeneTree.tgz           edgeValues-1
Simulated gene trees  1                         1051 trees, simulatedGeneTree_1.tgz
                      1.4                       1025 trees, simulatedGeneTree_1.4.tgz  edgeValues-1.4
                      1.8                       924 trees, simulatedGeneTree_1.8.tgz   edgeValues-1.8
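
The abstract states that the synthetic gene trees were obtained by a simple birth-and-death process. As a rough illustration only (this is not the simulator used to produce the archives listed above), the following Python sketch simulates the number of gene copies at the end of a single species-tree branch under a linear birth-and-death process, using the Gillespie algorithm. It tracks copy numbers only, not gene tree topology, and it does not model the Increasing Factor. The function name, rates, and branch length are hypothetical.

```python
import random

def simulate_copy_number(dup_rate, loss_rate, branch_length, rng, start=1):
    """Simulate the gene copy number at the end of one branch under a
    linear birth-and-death process (Gillespie algorithm)."""
    n, t = start, 0.0
    while n > 0:
        total_rate = n * (dup_rate + loss_rate)
        if total_rate == 0.0:
            break  # no duplication or loss can occur
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > branch_length:
            break  # the next event would fall beyond the end of the branch
        if rng.random() < dup_rate / (dup_rate + loss_rate):
            n += 1  # duplication
        else:
            n -= 1  # loss
    return n

# Hypothetical parameters: duplication rate 0.3, loss rate 0.2, branch length 1.0.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
counts = [simulate_copy_number(0.3, 0.2, 1.0, rng) for _ in range(1000)]
mean_count = sum(counts) / len(counts)
# Starting from one copy, the theoretical expectation is
# exp((dup_rate - loss_rate) * branch_length).
print(mean_count)
```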


Probabilistic analysis
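Reconciliation Tree Explored  Gene Trees            Archive
Whole tree                    Real                  realGeneTree_CompleteExploration.tgz
Whole tree                    Simulated, I.F. 1     simulatedGeneTree_CompleteExploration_1.tgz
Whole tree                    Simulated, I.F. 1.4   simulatedGeneTree_CompleteExploration_1.4.tgz
Whole tree                    Simulated, I.F. 1.8   simulatedGeneTree_CompleteExploration_1.8.tgz
Subtree with depth 0          Real                  realGeneTree_Depth_0.tgz
Subtree with depth 1          Real                  realGeneTree_Depth_1.tgz
Subtree with depth 2          Real                  realGeneTree_Depth_2.tgz
Subtree with depth 3          Real                  realGeneTree_Depth_3.tgz
Subtree with depth 4          Real                  realGeneTree_Depth_4.tgz
Subtree with depth 5          Real                  realGeneTree_Depth_5.tgz
Subtree with depth 6          Real                  realGeneTree_Depth_6.tgz
Subtree with depth 7          Real                  realGeneTree_Depth_7.tgz
Subtree with depth 8          Real                  realGeneTree_Depth_8.tgz
Subtree with depth 9          Real                  realGeneTree_Depth_9.tgz
Subtree with depth 10         Real                  realGeneTree_Depth_10.tgz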



