Figure Legends
Figure 1: Overview of included regions. NWA: North-West Atlantic, CAN:
Canada, ICE: Icelandic waters, CEA: Central-East Atlantic, MED:
Mediterranean, NOS: North Sea, CBS: Central Baltic Sea, NWS: Norwegian
Sea, WHS: White Sea, BAR: Barents Sea, OFJ: Oslofjord, GFJ:
Gullmarsfjord, BFJ: Balsfjord (see also Supplementary Table 1)
Figure 2: Left panel: impact of peak detection parameters (SNR = signal-to-noise
ratio threshold for peak picking, influencing resolution on the
intensity axis; HWS = half window size of the peak picking algorithm,
influencing resolution on the m/z axis) on species classification success of
the random forest model. Right panel: impact of SNR on number of peaks
Figure 3: A: number of peaks, grouped by number of species with
these peaks in common; B: number of peaks, grouped by
intra-specific frequency; C: peak intensity as boxplot (without
outliers) for all peaks and for peaks with 100% intra-specific
frequency; D: maximum and mean intra-specific peak frequency of
the 315 potential single-markers in other species (100% intra-specific
frequency in only one species)
Figure 4: Heatmap of the 170 most important peaks for the species
classification random forest model (peaks with a maximum class-specific
mean decrease in accuracy of >0.015 are presented).
Clustering of species is based on hierarchical clustering (average
linkage) of the species-mean Euclidean distance based on the whole peak
spectrum; the annotation gives the maximum peak intensity of the given m/z
peak over the whole dataset (heatmap scaling: 0-0.1 class-specific mean
decrease in accuracy; peak intensity scaling: 1-7 × 10^-3 arbitrary units);
species included in this analysis: Acartia bifilosa (Abif), A.
clausi (Acla), A. danae (Adan), A. negligens (Aneg), A. longiremis
(Alon), A. tonsa (Aton), Calanus finmarchicus (Cfin), C. helgolandicus
(Chel), C. glacialis (Cgla), C. hyperboreus (Chyp), Centropages bradyi
(Cbra), C. typicus (Ctyp), C. hamatus (Cham), C. chierchiae (Cchi),
Metridia longa (Mlon), M. lucens (Mluc), Pseudocalanus elongatus (Pelo),
P. moultoni (Pmou), Temora longicornis (Tlon), T. stylifera (Tsty),
Paraeuchaeta norvegica (Pnor), Microcalanus sp. (Mcal), Anomalocera
patersonii (Apat), Nannocalanus minor (Nmin), Eurytemora affinis (Eaff),
Limnocalanus macrurus (Lmac), Corycaeus anglicus (Cang)
Figure 5: Principal Coordinates Analysis (PCoA) on proteomic spectra of
congener species, species included: Acartia bifilosa (Abif), A.
clausi (Acla), A. danae (Adan), A. negligens (Aneg), A. longiremis
(Alon), A. tonsa (Aton), Calanus finmarchicus (Cfin), C. helgolandicus
(Chel), C. glacialis (Cgla), C. hyperboreus (Chyp), Centropages bradyi
(Cbra), C. kroyeri (Ckro), C. typicus (Ctyp), C. hamatus (Cham), C.
chierchiae (Cchi), Metridia longa (Mlon), M. lucens (Mluc),
Pseudocalanus elongatus (Pelo), P. moultoni (Pmou), Temora longicornis
(Tlon), T. stylifera (Tsty)
Figure 6: Boxplots based on species-specific means (upper panel) and 10%
or 90% quantiles (lower panel) of Euclidean distances, showing
inter-specific distances and intra-specific distances based on specimens
from different regions, from the same region only, and from the same sample,
respectively
Figure 7: Heatmaps of Euclidean distance based on the proteomic spectrum
of specimens from different regions (annotation; for abbreviations see Fig.
1), hierarchical clustering with average linkage; congener pairs
included: Acartia clausi and A. longiremis,
Centropages typicus and C. hamatus, Temora stylifera and T. longicornis,
Calanus hyperboreus and C. finmarchicus