Three Genome-scale Approaches Support Lungfish, not Coelacanth, as the
Closest Living Relative of Land Vertebrates
Abstract
Whether the coelacanth (Latimeria chalumnae) or the lungfish
(Protopterus annectens) is the closest living relative of tetrapods has
been an intensely debated open question for decades. To resolve this
incongruence among phylogenies, we used a genome-wide data mining
approach to retrieve 43 shared genes of seven taxa from GenBank and a
further 1001 orthologous genes of ten taxa from Ensembl and NCBI. We
used the maximum gene-support tree
approach and the majority-rule branch approach to analyze the amino acid
sequences encoded by the 43 nuclear genes and compared the results with
those inferred with the concatenation approach. Our results
provide strong evidence for the lungfish-tetrapod hypothesis and reject
the coelacanth-tetrapod hypothesis: with the maximum gene-support tree
approach and the jackknife method for taxon subsampling, the
coelacanth-tetrapod clade received significantly fewer supporting genes
and lower taxon jackknife probabilities than the lungfish-tetrapod
clade. As more genomic data became available in recent years, we mined
sequence data for 1001 shared genes. We used the maximum
gene-support approach with this larger dataset and inferred that
lungfish is the closest relative of land vertebrates; with ML methods,
the gene support for the maximum gene-support tree differed
significantly from that for the tree with the second-highest gene
support (chi-square test, p < 0.01). The ratio of the second-highest
support to the
maximum (SM ratio), a relative value, is a better support index than a
single absolute support value for assessing phylogenetic support. Our
results also show that increasing the number of shared genes is much
more effective than increasing the number of taxa.
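
As an illustration of the two support measures named above, the
following minimal Python sketch computes an SM ratio and a chi-square
comparison of the two best-supported trees. It is not the authors'
implementation. It assumes that each gene contributes one vote to its
best ML topology and that the chi-square test is a 1-df goodness-of-fit
test of the top two counts against equal support. The function names
and the toy counts are hypothetical and are not data from this study.

```python
import math
from collections import Counter


def sm_ratio(support):
    """Second-highest gene support divided by the maximum gene support."""
    ranked = sorted(support.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate topologies")
    return ranked[1] / ranked[0]


def chi_square_top_two(support):
    """Chi-square goodness-of-fit test (1 df) on the two largest counts.

    Null hypothesis (an assumption of this sketch): the two
    best-supported topologies are supported by equally many genes.
    """
    ranked = sorted(support.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate topologies")
    first, second = ranked[0], ranked[1]
    expected = (first + second) / 2
    stat = ((first - expected) ** 2 + (second - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy counts: number of genes whose best tree is each topology.
toy_support = Counter({
    "lungfish+tetrapod": 60,
    "coelacanth+tetrapod": 30,
    "lungfish+coelacanth": 10,
})

ratio = sm_ratio(toy_support)
stat, p_value = chi_square_top_two(toy_support)
print(f"SM ratio = {ratio:.2f}, chi-square = {stat:.2f}, p = {p_value:.4f}")
```

A smaller SM ratio means the best tree is further ahead of its nearest
competitor, which is why a relative index can be more informative than
the absolute count for the best tree alone.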