Species identification and species-specific markers
This study focused on characterizing the variability and stability of proteomic fingerprints using widely distributed, epipelagic copepod species from various coastal zooplankton communities around the North Atlantic as a model case.
In line with previous studies on marine copepods (Riccardi et al., 2012, Laakmann et al., 2013, Bode et al., 2017, Kaiser et al., 2018, Rossel & Martinez Arbizu, 2018a, Rossel et al., 2019, Renz et al., 2021, Yeom et al., 2021), our results clearly support the high discriminatory power of proteomic fingerprinting at the species level in this taxonomic group. All specimens were correctly assigned to the 27 different species by an RF classification model, with only low sensitivity to data processing, indicating high robustness of the method. Many of the included species, e.g., Acartia spp. and Calanus spp., can only be separated either by time-consuming morphological analyses, such as preparation of the fifth swimming leg, or by genetic analysis, as is the case for the cryptic species of Pseudocalanus. Their reliable identification by proteomic fingerprinting again highlights the high value that MALDI-TOF MS could contribute to routine monitoring of marine communities, especially since most of the species investigated here are dominant at many North Atlantic monitoring sites.
In-depth analysis of proteomic spectra revealed that no discrete species markers exist and that identification is based on complex pattern differences rather than on the presence or absence of single compounds. Although several peaks show high specificity for a species, their sensitivity remains too low for any of them to serve as a single marker. The uniqueness of compounds in a species becomes increasingly blurred as more species and more specimens from different regions and seasons are included in the analysis. This has strong implications for the applicability of the method in multi-species research. Since the mere presence of individual species-specific peaks cannot serve as an indicator for the presence of a species in a bulk sample, multiplexing of proteomic fingerprints similar to metabarcoding appears unlikely. Therefore, future application of this method in zooplankton monitoring will likely focus more on quantitative approaches as part of a bug-by-bug strategy, thereby improving species resolution and the identification of challenging taxa. Given that the time- and cost-effectiveness of MALDI-TOF MS is much better than that of, for example, DNA barcoding (Rossel et al., 2019, Renz et al., 2021), it is nevertheless a powerful tool to accelerate biodiversity assessment and facilitate timely detection of changes in communities and ecosystems.
It is remarkable that apparently no conserved homologous compound was detected in all specimens, and that only three peaks were expressed in at least some specimens from all species. This variability in peak abundance may be due to genotypic or natural physiology-related variability, or probably at least partly to methodological or sample-quality-related variability. One essential step in data processing is a slight shift of m/z values to align potentially homologous peaks during binning and to account for observed small mass deviations caused, e.g., by the relatively short trajectory of the Biotyper. As peak alignment is not independent of neighboring peaks, universal compounds may therefore be hidden in the fuzziness of the method. However, as we detected more than 300 peaks with 100% intra-specific frequency, and as more than 70% of peaks were detected in more than 10 of the 27 species, we argue that most homologous peaks were correctly identified in most cases and assume that this methodological effect is unlikely to be the main reason for the absence of universal peaks. No peaks, or only one or two peaks, with 100% intra-specific frequency were found in those species with the highest diversity in terms of included regions and environmental surroundings. This suggests that the observed variance in patterns is more likely driven by different genotypes or by phenotypic expression related to physiological state.
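The quantities discussed here (intra-specific peak frequency, peaks shared across all species, and peaks present in every specimen) can be illustrated with a short sketch. It assumes, as a simplification, that peaks have already been aligned and binned so that each specimen is represented by a set of peak masses; the function names and toy values are hypothetical and do not reproduce the study's pipeline.

```python
from collections import defaultdict


def intraspecific_frequency(presence, labels):
    """Fraction of specimens of each species in which each peak was detected."""
    counts = defaultdict(lambda: defaultdict(int))
    n_specimens = defaultdict(int)
    for peaks, label in zip(presence, labels):
        n_specimens[label] += 1
        for peak in peaks:
            counts[label][peak] += 1
    return {
        sp: {peak: c / n_specimens[sp] for peak, c in counts[sp].items()}
        for sp in n_specimens
    }


def peaks_in_all_species(freq):
    """Peaks detected in at least one specimen of every species."""
    per_species = [set(f) for f in freq.values()]
    return set.intersection(*per_species) if per_species else set()


def peaks_in_all_specimens(presence):
    """Peaks detected in every single specimen (universal peaks)."""
    return set.intersection(*presence) if presence else set()


# Toy data: two species with two specimens each (hypothetical peak masses).
presence = [{1000, 2000}, {1000, 3000}, {2000, 3000}, {3000}]
labels = ["A", "A", "B", "B"]

freq = intraspecific_frequency(presence, labels)
# Peaks with 100% intra-specific frequency, per species.
fixed = {sp: {p for p, f in fr.items() if f == 1.0} for sp, fr in freq.items()}
print(fixed)                             # {'A': {1000}, 'B': {3000}}
print(peaks_in_all_species(freq))        # peaks 2000 and 3000
print(peaks_in_all_specimens(presence))  # set()
```

The toy data reproduce the qualitative pattern described above: some peaks occur in at least some specimens of every species, yet no peak is present in all specimens.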
Despite the absence of discrete marker peaks, species were reliably identified by RF based on several discriminant peaks. These were found in all mass ranges and at all intensities. In general, peaks with higher m/z values were of lower intensity. Consistent with the findings on intra-specific peak frequencies, the importance of species-specific peaks in the model decreased with increasing sample size. The disconnect between peak intensity and peak importance for species identification has previously been observed in other taxa, e.g. insects (Dieme et al., 2014; Müller et al., 2013) and crustaceans (Paulus et al., 2022).