Species identification and species-specific markers
This study focused on characterizing the variability and stability of
proteomic fingerprints using widely distributed, epipelagic copepod
species from various coastal zooplankton communities around the North
Atlantic as a model case.
In line with previous studies on marine copepods (Riccardi et al., 2012,
Laakmann et al., 2013, Bode et al., 2017, Kaiser et al., 2018, Rossel &
Martinez Arbizu, 2018a, Rossel et al., 2019, Renz et al., 2021, Yeom et
al., 2021), our results clearly support the high discriminatory power of
proteomic fingerprinting at the species level in this taxonomic group. All
specimens were correctly assigned to the 27 species by a random forest
(RF) classification model, with only low sensitivity to data processing,
indicating high robustness of the method. Many of the included species,
e.g., Acartia spp. and Calanus spp., can only be separated either by
time-consuming morphological analyses, such as preparation of the fifth
swimming leg, or by genetic analysis, as in the case of the cryptic
species of Pseudocalanus. Their reliable
identification by proteomic fingerprinting again highlights the high
value that MALDI-TOF MS could contribute to routine monitoring of marine
communities, especially since most of the species investigated here are
dominant at many North Atlantic monitoring sites.
In-depth analysis of proteomic spectra revealed that no discrete
species markers exist, and that identification is based on complex
pattern differences rather than on the presence or absence of single
compounds. Although several peaks show high specificity for a species,
their sensitivity remains too low to serve as a single marker. The
uniqueness of compounds in a species becomes increasingly blurred as
more species and more specimens from different regions and seasons are
included in the analysis. This has strong implications for the
applicability of the method in multi-species research. Since the mere
presence of individual species-specific peaks cannot serve as an
indicator for the presence of a species in a bulk sample, multiplexing
of proteomic fingerprints, similar to metabarcoding, appears
unlikely. Therefore, future application of this method in zooplankton
monitoring will likely focus more on quantitative approaches as part of
a bug-by-bug strategy and thereby improve species resolution and
identification of challenging taxa. Given that the time- and
cost-effectiveness of MALDI-TOF is much better than that of, for example,
DNA barcoding (Rossel et al., 2019, Renz et al., 2021), it is nevertheless a
powerful tool to accelerate biodiversity assessment and facilitate early
and timely detection of changes in communities and ecosystems.
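The distinction between specificity and sensitivity of a single peak can be made concrete with a minimal sketch. It assumes that each specimen has already been reduced to a set of binned peak identifiers; the toy data, the species labels "A" and "B", and the function name marker_stats are hypothetical and do not reproduce the data or code of this study.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (species label, set of binned peak ids).
Specimen = Tuple[str, Set[int]]


def marker_stats(specimens: List[Specimen], species: str, peak: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `peak` as a presence/absence marker for `species`."""
    target = [peaks for label, peaks in specimens if label == species]
    others = [peaks for label, peaks in specimens if label != species]
    if not target or not others:
        raise ValueError("need at least one target and one non-target specimen")
    # Sensitivity: share of target specimens that show the peak.
    sensitivity = sum(peak in p for p in target) / len(target)
    # Specificity: share of non-target specimens that lack the peak.
    specificity = sum(peak not in p for p in others) / len(others)
    return sensitivity, specificity


# Toy data (invented for illustration only).
specimens = [
    ("A", {1, 2, 5}),
    ("A", {1, 2}),
    ("A", {2, 5}),
    ("A", {2}),
    ("B", {2, 3}),
    ("B", {3, 4}),
]

# Peak 5 never occurs outside species A (high specificity) but is missing
# in half of the A specimens (low sensitivity).
print(marker_stats(specimens, "A", 5))
# Peak 2 occurs in every A specimen but also in one B specimen.
print(marker_stats(specimens, "A", 2))
```

In this toy case peak 5 is perfectly specific yet would miss every second specimen of species A, which is the situation described above for peaks that cannot serve as single markers.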
It is remarkable that apparently no conserved homologous compound was
detected in all specimens, and that only three peaks were expressed in at
least some specimens of every species. This variability in peak
abundance may reflect genotypic or natural physiology-related
variability or, probably at least in part, methodological or sample
quality-related variability. One essential step in data processing is a
slight shift of m/z values to align potentially homologous peaks during
binning and to account for the small mass deviations observed, caused,
e.g., by the relatively short flight path of the Biotyper. As peak
alignment is not independent of neighboring peaks, universal compounds
may be hidden in the fuzziness of the method. However, as we detected more than
300 peaks with 100% intra-specific frequency, and as more than 70% of
peaks were detected in more than 10 of the 27 species, we argue that
most homologous peaks were correctly identified in most cases and assume
that this methodological effect is unlikely to be the main reason for the
absence of universal peaks. In the species with the highest diversity of
included regions and environmental conditions, no peaks, or only one or
two, had 100% intra-specific frequency. This suggests that the observed
variance in patterns is likely driven more by different genotypes or by
phenotypic expression linked to physiological state than by
methodological effects.
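Why peak alignment can depend on neighboring peaks is illustrated by the following minimal sketch. It uses a simple chaining rule, in which a peak joins a bin if it lies within a tolerance of the previous peak. This rule, the function name bin_peaks, the m/z values, and the tolerance are assumptions chosen for illustration; it is not the binning procedure applied in this study.

```python
from typing import List


def bin_peaks(mz_values: List[float], tolerance: float) -> List[List[float]]:
    """Group m/z values into bins by chaining: after sorting, a peak joins
    the current bin if it lies within `tolerance` of the previous peak."""
    bins: List[List[float]] = []
    for mz in sorted(mz_values):
        if bins and mz - bins[-1][-1] <= tolerance:
            bins[-1].append(mz)
        else:
            bins.append([mz])
    return bins


# Two peaks 3 m/z units apart stay in separate bins ...
print(bin_peaks([5000.0, 5003.0], tolerance=2.0))
# ... but an intermediate neighbor chains them into one bin.
print(bin_peaks([5000.0, 5001.5, 5003.0], tolerance=2.0))
```

Under such a rule, whether two peaks are treated as homologous depends on which other peaks happen to be present in the pooled spectra, so the assignment of a truly universal compound to a single bin is not guaranteed.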
Despite the absence of discrete marker peaks, species were reliably
identified by RF based on several discriminant peaks. These were found
across all mass ranges and intensities. In general, peaks with higher m/z
values were of lower intensity. Consistent with the findings on
intra-specific peak
frequencies, the importance of species-specific peaks in the model
decreased with increasing sample size. The decoupling of peak
intensity and peak importance for species identification has previously
been observed in other taxa, e.g. insects (Dieme et al., 2014; Müller et
al., 2013) and crustaceans (Paulus et al., 2022).