Figure captions
Figure 1. Schematic representation of our workflow showing the main
steps of the analytical pipeline. Our four main phases are represented
in different colors, i.e. transcriptomic steps for the selection of open
reading frames (ORFs, blue), comparative genomics for the selection of
ultraconserved elements (UCEs, dark purple), probe design and the
sequencing of genomic libraries (light purple), and analyses after the
processing of sequencing data (red).
Figure 2. Percentage recovery of the total length of 1,114 ORFs
for 96 sequenced taxa and an outgroup transcriptome. The total size of
ORFs, on average ~1,500 nt, and the number of filtered reads per sample
(in millions) are also indicated. The red line for the number of reads
indicates a cut-off value of 500,000 reads. Samples with fewer reads
displayed a drastic decrease in ORF recovery. A drastic decrease in
recovery was also observed for the outgroup sample dna0240, which
belongs to the family Iridinidae, even though this sample yielded
3,900,000 clean reads.
Figure 3. UCE recovery depending on combinations of coverage
and identity as specified in the PHYLUCE pipeline. Boxplots indicate the
contig ratio, i.e. the number of unique contigs divided by the maximum
number of unique contigs per individual, for all 96 individuals. The
total number of UCEs recovered for the 96 individuals ranged from 1,905
(combination cov50 and id50-60) to 926 (combination cov80 and id80).
Several scenarios in which coverage and identity are between 50 and 65%
yielded similar results, whereas both the number of unique contigs and
UCE recovery decreased substantially for scenarios with identity >70%.
Figure 4. Gene map of the maternally inherited mitochondrial
genome for Coelaturini from the Malawi Basin. Genes positioned on the 5’
to 3’ (positive/heavy) strand are indicated on the inner circle, whereas
those on the 3’ to 5’ (negative/light) strand are indicated on the outer
circle.
Figure 5. Maximum likelihood phylogeny of Coelaturini based on
a concatenated dataset of 1,109 open reading frames (2,348,614 bp with
515,219 parsimony informative sites; left), including exons and their
intronic/intergenic flanking regions, and based on a concatenated
dataset of 276 ultraconserved elements (119,105 bp with 11,001 parsimony
informative sites; right). Red nodes are fully supported by a
Shimodaira-Hasegawa approximate likelihood ratio test and ultrafast
bootstrapping. Both datasets result in highly congruent topologies,
which are fully or mainly supported for ORFs and UCEs, respectively.
More details are provided in Figs. S3 and S4.
Figure 6. A) Square-root-transformed
nucleotide diversity (π) within six populations of Coelaturini from the
Malawi Basin as inferred from ORFs (without intronic/intergenic flanking
regions). The nucleotide diversity averaged over all six populations is
indicated with a dashed red line, whereas the mean population pairwise
sequence divergence for each of 15 population pairs, i.e. the mean
square-root-transformed DXY value, is indicated with a blue line (these
values are highly similar for all population pairs, so the lines merge
into a single bold blue line). B) The density distribution of population
pairwise FST values for 1,104 ORFs (one value per ORF) for two out of
the 15 population pairs (these distributions are representative of the
other population pairs too).
Figure 7. Structure of molecular diversity in Coelaturini from
the Malawi Basin. A, B) Principal component analysis on genome-wide SNP
data, with 95% convex hulls on sampling localities. A bathymetric map
of Lake Malawi, its outflow and the studied populations is provided in
the inset. C) Bayesian clustering with fastSTRUCTURE on the same SNP
dataset gave the strongest support to a four-cluster solution separating
the northern and southern regions of the Malawi Basin, and additionally the
populations of Likoma Island (MLW8_032) and the Shire River
(MLW8_010).