Stability of proteomic signals and choice of reference library
To evaluate the stability of the proteomic signals of a species across regions and seasons, we compared mean intra- and inter-specific Euclidean distances (Fig. 6). Intra-specific variability was lowest within samples and increased when specimens from different regions or sampling seasons were included. When only specimens from single samples were included, there was a pronounced species gap between the lower 10% quantile of inter-specific distances and the 90% quantile of intra-specific distances, with the threshold at a Euclidean distance of around 0.8. This species gap narrowed strongly as intra-specific variance increased in specimens from multiple samples or seasons, and it nearly closed when specimens from all regions were included.
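The species gap can be computed directly from pairwise distances. The sketch below is a minimal illustration, assuming that each specimen is represented by an equal-length vector of preprocessed peak intensities and that quantiles use linear interpolation; the function names and the toy data are hypothetical and not part of the study's pipeline.

```python
import math
from itertools import combinations


def quantile(values, q):
    """Quantile with linear interpolation between order statistics
    (the same convention as NumPy's default 'linear' method)."""
    s = sorted(values)
    if not s:
        raise ValueError("quantile of an empty sequence")
    pos = (len(s) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def split_distances(spectra, labels):
    """Return (intra, inter): all pairwise Euclidean distances between
    specimens of the same species and between specimens of different species."""
    intra, inter = [], []
    for i, j in combinations(range(len(spectra)), 2):
        d = math.dist(spectra[i], spectra[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return intra, inter


def species_gap(spectra, labels, q_inter=0.10, q_intra=0.90):
    """Lower 10% quantile of inter-specific distances minus the 90% quantile
    of intra-specific distances; positive values mean the two are separated."""
    intra, inter = split_distances(spectra, labels)
    return quantile(inter, q_inter) - quantile(intra, q_intra)


# Toy usage: two hypothetical, well-separated species in a 2-D "spectrum" space.
toy_spectra = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
               (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
toy_labels = ["A", "A", "A", "B", "B", "B"]
print(round(species_gap(toy_spectra, toy_labels), 3))
```

A positive gap means that even the closest inter-specific pairs lie farther apart than most intra-specific pairs. Pooling specimens with higher intra-specific variance (e.g. from several regions) raises the 90% intra-specific quantile, so the gap shrinks toward zero or becomes negative.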
To test whether the wide plasticity of proteomic profiles is relevant for species identification and the choice of reference library, we determined the species classification success of RF models, each built with specimens from one specific region excluded (Tab. 1). No impact of the reference library was observed for C. finmarchicus, C. hyperboreus, M. longa and P. norvegica. Species identification was particularly affected for specimens from the Mediterranean (A. clausi) and the Baltic Sea (A. longiremis, C. hamatus, T. longicornis). Misidentification was most pronounced for C. typicus from the North Sea and A. danae from the Central East Atlantic, with error rates of 0.9 and 0.8, respectively. Their misidentification rates remained high even after applying the post-hoc test for false positive discovery, in contrast to all other species, for which the corrected error rates indicated that all potential misidentifications were flagged. While all C. typicus specimens from the North Sea were identified as C. chierchiae, specimens of A. danae from the Central East Atlantic were assigned to A. negligens. Applying the post-hoc test resulted in higher overall rejection rates of identifications, with up to 100% rejection in N. minor. Mean Euclidean distances within regions ranged from 0.4 to 0.7, and the maximum observed distance between specimens from different regions ranged from 0.7 to 0.9. The latter distances were in the same range as the observed inter-specific distances (Fig. 6).
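The region-exclusion design can be sketched as a leave-one-region-out loop that reports per-species error and rejection rates for each held-out region. To keep the example dependency-free, a nearest-centroid classifier stands in for the random forest used in the study, and the optional distance cut-off `max_dist` is a hypothetical rejection rule, not the post-hoc false-positive test applied here. All names and data are illustrative assumptions.

```python
import math
from collections import defaultdict


def centroids(spectra, labels):
    """Mean spectrum per species; nearest-centroid classification is used here
    only as a dependency-free stand-in for the random forest."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(spectra, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def leave_one_region_out(spectra, species, regions, max_dist=None):
    """Hold out each region in turn, build the reference from all other
    regions, and score the held-out specimens per species.

    Returns {(species, region): (error_rate, rejection_rate, corrected_error)}.
    If max_dist is set, a prediction whose nearest centroid lies farther away
    is rejected (hypothetical rule, not the study's post-hoc test).
    """
    results = {}
    for held_out in sorted(set(regions)):
        train = [i for i, r in enumerate(regions) if r != held_out]
        held = [i for i, r in enumerate(regions) if r == held_out]
        model = centroids([spectra[i] for i in train],
                          [species[i] for i in train])
        # per species: [n, misidentified, rejected, misidentified and accepted]
        tally = defaultdict(lambda: [0, 0, 0, 0])
        for i in held:
            pred, d = min(((y, math.dist(spectra[i], c)) for y, c in model.items()),
                          key=lambda p: p[1])
            wrong = pred != species[i]
            rejected = max_dist is not None and d > max_dist
            t = tally[species[i]]
            t[0] += 1
            t[1] += wrong
            t[2] += rejected
            t[3] += wrong and not rejected
        for sp, (n, n_wrong, n_rej, n_kept_wrong) in tally.items():
            results[(sp, held_out)] = (n_wrong / n, n_rej / n, n_kept_wrong / n)
    return results


# Toy usage: the single "X" specimen from region "R2" resembles species "Y".
loro_spectra = [(0.0, 0.0), (0.1, 0.0), (0.9, 0.9), (1.0, 1.0), (1.1, 1.0)]
loro_species = ["X", "X", "X", "Y", "Y"]
loro_regions = ["R1", "R1", "R2", "R1", "R2"]
print(leave_one_region_out(loro_spectra, loro_species, loro_regions))
```

In this sketch the error rate is the fraction of a species' specimens in the held-out region assigned to another species, while the corrected error counts only misidentifications that survive rejection. A rejection step can therefore lower corrected errors at the cost of higher rejection rates, including rejection of correct identifications.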
To evaluate variance between regions, we compared the distances between specimens of the congener pairs A. clausi and A. longiremis, T. stylifera and T. longicornis, C. hyperboreus and C. finmarchicus, as well as C. typicus and C. hamatus (Fig. 7). Homogeneity was strongest for the Calanus species; only some specimens showed distances at the inter-specific level, i.e. distances that are also observed between specimens of different species. However, Calanus also showed some substructure at the regional level, i.e. specimens from different regions were less similar to each other. This regional substructure was much more distinct for A. longiremis, C. hamatus and T. longicornis, whose specimens differed in their proteomic spectra between regions almost at the inter-specific level. For A. longiremis, four subclusters could be identified, comprising specimens from the Baltic Sea, from Canada and the White Sea, from Balsfjorden (Norway) and from the waters around Iceland, respectively. Similarly, specimens from the Baltic Sea formed strong subclusters in T. longicornis and C. hamatus. North Sea specimens of C. hamatus showed two subgroups, one forming a distinct cluster and one clustering together with animals from the White Sea.
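Regional subclusters of the kind described for A. longiremis can be explored by cutting a single-linkage clustering at a fixed distance and checking which regions each cluster contains. The text does not state how the subclusters in Fig. 7 were delineated, so this is a hypothetical illustration rather than the study's method; it assumes spectra are equal-length intensity vectors, and the `cutoff` value and toy data are arbitrary.

```python
import math
from itertools import combinations


def single_linkage_clusters(spectra, cutoff):
    """Group specimens connected by a chain of Euclidean distances <= cutoff,
    i.e. single-linkage clusters cut at a fixed height (union-find)."""
    parent = list(range(len(spectra)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(spectra)), 2):
        if math.dist(spectra[i], spectra[j]) <= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(spectra)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def region_composition(clusters, regions):
    """Sorted list of regions represented in each cluster."""
    return [sorted({regions[i] for i in c}) for c in clusters]


# Toy usage: hypothetical specimens of one species from three regions.
cluster_spectra = [(0.0, 0.0), (0.05, 0.0), (1.0, 0.0), (1.0, 0.05), (0.0, 1.0)]
cluster_regions = ["Baltic", "Baltic", "Iceland", "Iceland", "Canada"]
clusters = single_linkage_clusters(cluster_spectra, cutoff=0.2)
print(clusters, region_composition(clusters, cluster_regions))
```

If every cluster contains a single region, the subclusters follow geography; lowering the cutoff splits the data into more, tighter subclusters, while a large enough cutoff merges all specimens into one cluster.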