DNA extraction and metabarcoding
DNA was extracted using a CTAB-phenol-chloroform protocol as described
in Godhe et al. (2001), with an added RNA digestion step during cell
lysis (65ºC for 60 min using one mg RNaseA [mL CTAB
buffer]-1). Genomic DNA yield was quantified using
Qubit (Thermo Fisher Scientific). The evolution experiment used a single
barcoding locus to track strain selection. This 523 bp locus,Sm_C12W1 , is located on contig 12 inside a pentatricopeptide
(PPR) repeat region of gene Sm_t00009768-RA, encoding an RNA-binding
protein. The locus has 38 SNP positions amongst the 58 strains used in
this study, and 110 unique alleles with 100% heterozygosity, including
two triploid/aneuploid strains (Pinder et al., 2023). Sm_C12W1was amplified from 100 ng DNA from the evolution experiments using the
Phusion® High-Fidelity PCR Kit (Thermo Scientific). The primer
concentration used was 0.1 µM each of Sm_C12W1-F
(5’AGGYTTCGCCTCCTCAAAC3’) and Sm_C12W1-R (5’GGCACGATGCACACGCAAAG3’). A
One-step PCR reaction with Illumina-adapter extended primers was run for
30 cycles, with five s denaturation (98ºC), five s annealing (62.5ºC),
30 s extension (72ºC), and a final five min extension. The PCR products
were quantified using PicoGreen™ dsDNA kit (Invitrogen™) and normalized
to 100 ng per sample. Magnet bead-based size selection
(>300 bp), Nextera Dual-indexing, pooling, and further
library preparation, as well as amplicon sequencing (1/2 of an Illumina
MiSeq lane with 300 bp pair-end read with v3 chemistry), was carried out
by the National Genomics Infrastructure (Uppsala, Sweden).
Amplicon sequence data were quality controlled and processed as
described in (Pinder et al., 2023). Briefly, adapter sequences were
removed from reads using Cutadapt version 3.2 (Martin, 2011), trimmed
using a PHRED score >28 and a minimum read length of 180
bp, and merged into amplicons using BBmerge version 38.86 (Bushnell et
al., 2017), with default settings. Although DADA2’s denoising algorithms
(Callahan et al., 2016) could correct a substantial part of amplicons
with sequencing errors, it also removed the majority of low frequency
allelic observations, potentially generating false negatives (Pinder et
al., 2023). We therefore opted to not use DADA2. Instead, we chose to
rely on exact sequence matches to known alleles, and utilized strains
heterozygosity to quality control and filter data. First, alleles shared
between two to three strains were divided according to the abundance of
each strain’s second unique allele using a set of differential equations
(Table S1). In general, the proportion of shared allelei 1 belonging to a specific strain x, y, and z was
computed individually for each sample based on the strain’s second
unique allele (x2, y2, andz2 ):
Eq. 2:\(\text{Reads\ of\ }i1\ \text{belonging\ to\ strain\ }x\ =\ i1\ \times\ \frac{\text{strain\ }x\text{\ allele\ }x2\ \ }{(strain\ x\text{\ allele\ }x2\ +\ strain\ y\text{\ allele\ }y2\ +\ strain\ z\text{\ allele\ }z2)}\)
The two heterozygous allele counts were then summed up to represent the
abundance of each strain. During this step any strain without
observations of both alleles was disregarded as a false positive and
counted as absent (through the first part of Eq. 3 below). The third
allele of triploid strains was removed. Amplicon counts per strain were
normalized to relative abundance (RA) as:
Eq. 3:\(\text{RA\ strain\ }x=\ \frac{\text{\ counts\ allele\ }x1\ \ \times\ counts\ \text{allele}\text{\ x}2}{\text{counts\ allele\ }x1\ \ \times\ counts\ \text{allele}\text{\ x}2}\ \times\frac{\text{\ counts\ allele\ }x1\ \ +\ counts\ \text{allele}\text{\ x}2}{\sum\text{counts\ allels\ }_{a-z}}\)
The changes in the relative abundance of each strain’s alleles, combined
with the population’s overall growth rate over the first nine days
(µ9days) of the evolution experiment, were used to
compute each strain’s specific growth rate during co-cultivation in the
evolution experiment.
Eq. 4:\(u\text{\ strain\ }x=u_{9days}+\frac{\text{LN}\left(\text{RA\ strain\ }x\ day\ 9\right)-\ LN\left(\text{RA\ strain\ }x\ day\ 0\right)}{9}\)
Strains not seen in any of the five experimental replicates at day nine
were presumed to be extinct since dilution bottelnecks in population
size were generally smaller than the sequensing depth. The false
discovery rate (FDR) of this approch was quantified by taking advantage
of the fact that the two populations were processed independently. FDR
was computed by comparing the total number of strain observations (TO)
to the False Positive (FP) observations of mining strain seen in a
reference sample, or vice versa, accounting for the fact that only half
of all FP can be detected this way:
Eq. 5: \(\text{FDR}=\frac{(FP\ \times\ 2)\ }{\text{TO}}\)
All data, scripts and metadata will be made publicaly available opun
acceptance (Andersson, Berglund, et al., 2023), or can be accessed at:
https://github.com/Bearstar85/Cu_evolution.