Stability of proteomic signals and choice of reference library
To evaluate the stability of the proteomic signals of a species between
regions and seasons, we compared mean intra- and inter-specific Euclidean
distances (Fig. 6). Intra-specific variability was lowest within samples
and increased when different regions or sampling seasons were included.
The species gap, i.e. the distance between the 90% quantile of
intra-specific distances and the lower 10% quantile of inter-specific
distances, was pronounced when only specimens from single samples were
included, with a threshold at a Euclidean distance of around 0.8. This
species gap narrowed strongly with increasing intra-specific variance in
multi-sample/season specimens and nearly closed when specimens from all
regions were included.
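
The species gap described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used for Fig. 6: the function name `species_gap`, the toy profiles and the species labels are hypothetical, and the quantile interpolation method is a free choice that the text does not specify.

```python
from itertools import combinations
from math import dist
from statistics import quantiles


def species_gap(profiles, labels, intra_pct=90, inter_pct=10):
    """Return (intra quantile, inter quantile, gap) of pairwise Euclidean distances."""
    intra, inter = [], []
    for i, j in combinations(range(len(profiles)), 2):
        d = dist(profiles[i], profiles[j])
        # Same label: intra-specific distance; different label: inter-specific.
        (intra if labels[i] == labels[j] else inter).append(d)
    # quantiles(..., n=100) returns the 99 percentile cut points (1% .. 99%).
    intra_q = quantiles(intra, n=100, method="inclusive")[intra_pct - 1]
    inter_q = quantiles(inter, n=100, method="inclusive")[inter_pct - 1]
    return intra_q, inter_q, inter_q - intra_q


# Hypothetical toy data: two species, three specimens each, two peak intensities.
profiles = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
labels = ["sp_a"] * 3 + ["sp_b"] * 3

intra_q90, inter_q10, gap = species_gap(profiles, labels)
print(f"intra 90%: {intra_q90:.2f}, inter 10%: {inter_q10:.2f}, gap: {gap:.2f}")
```

A positive gap means the two distance distributions are separated at the chosen quantiles; a gap near zero or below corresponds to the closing species gap reported when all regions are pooled.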
To test whether this wide plasticity of proteomic profiles is
relevant for species identification and the choice of reference library,
we determined the species classification success of RF models while
excluding specimens from specific regions in turn (Tab. 1). No impact of the
reference library was observed for C. finmarchicus, C.
hyperboreus, M. longa and P. norvegica. Species
identification was specifically affected for specimens from the
Mediterranean (A. clausi) and the Baltic Sea (A.
longiremis, C. hamatus, T. longicornis). The strongest misidentification
was observed for C. typicus from the North Sea and A.
danae from the Central East Atlantic, with error rates of 0.9 and 0.8
respectively. Their rates of misidentification remained high even after
applying the post-hoc test for false positive discovery, in contrast to
all other species, where corrected error rates revealed all potential
misidentifications. While all C. typicus specimens from the North
Sea were identified as C. chierchiae, specimens of A.
danae from the Central East Atlantic were assigned to A.
negligens. The application of the post-hoc test resulted in overall
higher rejection rates of identification, with up to 100% rejection in
N. minor. Mean Euclidean distances within regions varied from 0.4
to 0.7, and the maximum observed distance between specimens from
different regions ranged between 0.7 and 0.9. The latter distances were in the
same range as the observed inter-specific distances (Fig. 6).
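
The region-exclusion test can be illustrated with a small leave-one-region-out loop. This sketch rests on several assumptions: it reads the exclusion scheme as "remove one region from the reference library, then classify the specimens of that region", it swaps the RF model for a simple nearest-centroid classifier so that the example stays dependency-free, and all names and data (`centroids`, `leave_region_out_error`, the toy profiles) are hypothetical. The post-hoc test for false positives is not reproduced here.

```python
from collections import defaultdict
from math import dist


def centroids(profiles, species):
    """Mean profile per species (nearest-centroid stand-in for fitting an RF model)."""
    grouped = defaultdict(list)
    for p, s in zip(profiles, species):
        grouped[s].append(p)
    return {s: tuple(sum(col) / len(col) for col in zip(*ps))
            for s, ps in grouped.items()}


def leave_region_out_error(profiles, species, regions):
    """Error rate per (species, held-out region), using a library without that region."""
    errors = {}
    for held_out in sorted(set(regions)):
        rows = list(zip(profiles, species, regions))
        train = [(p, s) for p, s, r in rows if r != held_out]
        test = [(p, s) for p, s, r in rows if r == held_out]
        ref = centroids(*zip(*train))
        wrong, total = defaultdict(int), defaultdict(int)
        for p, s in test:
            # Assign the specimen to the species with the closest reference centroid.
            predicted = min(ref, key=lambda k: dist(ref[k], p))
            total[s] += 1
            wrong[s] += predicted != s
        for s in total:
            errors[(s, held_out)] = wrong[s] / total[s]
    return errors


# Hypothetical toy data: southern sp_a specimens are shifted towards sp_b.
profiles = [(0.0, 0.0), (0.1, 0.0), (0.9, 0.9), (1.0, 0.8),
            (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
species = ["sp_a"] * 4 + ["sp_b"] * 4
regions = ["north", "north", "south", "south"] * 2

errors = leave_region_out_error(profiles, species, regions)
for key in sorted(errors):
    print(key, errors[key])
```

In this toy setup only the regionally shifted specimens are misassigned once their own region is missing from the library, which mirrors the pattern of region-specific misidentification described above. A species that occurs in a single region has no reference left after exclusion and is always misassigned under this scheme.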
To evaluate variances between regions, we compared the distances between
specimens of the congener pairs A. clausi and A. longiremis, T. stylifera
and T. longicornis, C. hyperboreus and C. finmarchicus, as well as
C. typicus and C. hamatus (Fig. 7). The strongest homogeneity was observed
for the Calanus species; only some specimens showed distances on the
inter-specific level, i.e. distances which can also be observed
between specimens of different species. However, for Calanus too, some
substructures occurred on a regional level, i.e. specimens from
different regions were less similar to each other. These were much more
distinct for A. longiremis, C. hamatus and T.
longicornis. Here, the proteomic spectra of specimens differed
between regions nearly on the inter-specific level. For A. longiremis,
four subclusters could be identified, including specimens from the
Baltic Sea, from Canada and the White Sea, from the Balsfjorden (Norway)
and from the waters around Iceland, respectively. Similarly, specimens
from the Baltic Sea formed strong subclusters in T. longicornis and
C. hamatus. North Sea specimens of C. hamatus showed two
sub-groups, one forming a distinct cluster and one clustering together
with animals from the White Sea.
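
Regional substructure within one species can be summarised by comparing mean distances within regions with mean distances between pairs of regions. The following is a minimal sketch, assuming hypothetical names (`region_distance_summary`, `INTER_SPECIFIC_LEVEL`) and toy data; the cut-off of 0.8 is borrowed from the approximate species-gap threshold reported above and is used here only for illustration.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

# Assumed cut-off, taken from the approximate threshold given in the text.
INTER_SPECIFIC_LEVEL = 0.8


def region_distance_summary(profiles, regions):
    """Mean Euclidean distance within each region and between each pair of regions."""
    by_region = {}
    for p, r in zip(profiles, regions):
        by_region.setdefault(r, []).append(p)
    summary = {}
    for r, ps in by_region.items():
        if len(ps) > 1:  # a within-region distance needs at least two specimens
            summary[(r, r)] = mean(dist(a, b) for a, b in combinations(ps, 2))
    for r1, r2 in combinations(sorted(by_region), 2):
        summary[(r1, r2)] = mean(
            dist(a, b) for a, b in product(by_region[r1], by_region[r2])
        )
    return summary


# Hypothetical toy data for a single species sampled in two regions.
profiles = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0)]
regions = ["baltic", "baltic", "north_sea", "north_sea"]

summary = region_distance_summary(profiles, regions)
# Region pairs whose mean distance reaches the assumed inter-specific level.
flagged = [pair for pair, d in summary.items()
           if pair[0] != pair[1] and d >= INTER_SPECIFIC_LEVEL]
print(summary)
print(flagged)
```

Region pairs that end up in `flagged` correspond to the cases described above, in which specimens of one species differ between regions nearly on the inter-specific level.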