Data assembly: Infection data, niche modelling, phylogenies
We assembled infection data through a survey of the peer-reviewed literature. This survey resulted in an updated version (Supporting Information) of the list published by Cruz-Laufer et al. (2021a). For abundance weighting, we also assembled infection parameters, including the numbers of examined hosts, infected hosts, and parasites. Where no infection parameters were reported (59% of reports, 61% of interactions), we treated the reports as singular observations so that they were taken into account while their impact on downstream analyses was minimised (they eventually constituted 9.6% of infections).
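The handling of reports without infection parameters can be illustrated with a minimal Python sketch. It assumes one plausible reading of "singular observation", namely that a report without counts contributes exactly one infection to the weight of its host-parasite interaction. The record layout, the names `reports` and `interaction_weights`, and all values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical reports: (host, parasite, infected hosts or None if unreported).
reports = [
    ("host_A", "parasite_1", 12),
    ("host_A", "parasite_1", None),
    ("host_B", "parasite_1", None),
    ("host_B", "parasite_2", 6),
]


def interaction_weights(reports):
    """Sum infections per host-parasite pair, counting unreported ones as 1."""
    weights = defaultdict(int)
    imputed = 0
    for host, parasite, infected in reports:
        if infected is None:
            # Assumption: a report without counts is a single observation.
            infected = 1
            imputed += 1
        weights[(host, parasite)] += infected
    return dict(weights), imputed


weights, imputed = interaction_weights(reports)
# Share of all counted infections that stems from reports without parameters.
share_imputed = imputed / sum(weights.values())
print(weights, share_imputed)
```

Under this reading, reports without parameters can make up a large share of all reports while contributing only a small share of the weighted infections.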
We built host niche dendrograms based on ecological, geographical, and morphological data (Table 1) accessed on FishBase (Froese & Pauly 2000) through the R package rfishbase (Boettiger et al. 2012). Missing trophic level and habitat data were added through a literature survey (see Supporting Information). Dendrograms were built through hierarchical clustering in R (Pavoine et al. 2009) based on a Gower’s distance matrix (Gower 1971). Gower’s distances were calculated using the function dist.ktab in the R package ade4 v1.7.16 (Pavoine et al. 2009). As in Clark & Clegg (2017), we accounted for uncertainty of the host niche by applying several clustering algorithms implemented in the function hclust in R (incl. ward.D2, single, complete, average, mcquitty, median, and centroid) (R Core Team 2022). We tested for topological congruence of the resulting dendrograms using the congruence among distance matrices (CADM) test (Legendre & Lapointe 2004; Campbell et al. 2011) in the package ape v5.4 (Paradis & Schliep 2019).
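The general idea of clustering hosts on a Gower's distance matrix under several linkage rules can be sketched in plain Python. This is an illustration only, not the ade4/hclust workflow described above. The trait table, its values, and all function names are hypothetical. Only single, complete, and average linkage are shown, and the CADM test is not implemented. Numeric traits are scaled by their observed range, categorical traits use simple mismatch, and traits missing in either host are skipped.

```python
from itertools import combinations

# Hypothetical traits: (trophic level, maximum length in cm, habitat).
# Values are invented for illustration; None marks a missing value.
hosts = {
    "host_A": (2.0, 10.0, "benthopelagic"),
    "host_B": (2.5, 15.0, "benthopelagic"),
    "host_C": (4.0, 30.0, "pelagic"),
    "host_D": (3.5, None, "pelagic"),
}


def trait_ranges(table):
    """Range of each numeric trait; None marks a categorical trait."""
    ranges = []
    for column in zip(*table.values()):
        observed = [v for v in column if v is not None]
        if all(isinstance(v, (int, float)) for v in observed):
            ranges.append(max(observed) - min(observed))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a, b, ranges):
    """Mean per-trait dissimilarity over traits observed in both records."""
    parts = []
    for x, y, rng in zip(a, b, ranges):
        if x is None or y is None:
            continue  # skip traits missing in either record
        if rng is None:  # categorical trait: simple mismatch
            parts.append(0.0 if x == y else 1.0)
        else:  # numeric trait: absolute difference scaled by range
            parts.append(abs(x - y) / rng if rng > 0 else 0.0)
    if not parts:
        raise ValueError("records share no observed traits")
    return sum(parts) / len(parts)


def distance_matrix(table):
    """Pairwise Gower distances keyed by the unordered pair of labels."""
    ranges = trait_ranges(table)
    return {frozenset((i, j)): gower_distance(table[i], table[j], ranges)
            for i, j in combinations(table, 2)}


# Each rule turns the between-cluster pairwise distances into one value.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda values: sum(values) / len(values),
}


def cluster(dist, labels, linkage):
    """Naive agglomerative clustering; returns (members, height) per merge."""
    combine = LINKAGES[linkage]
    clusters = [frozenset([label]) for label in labels]
    merges = []

    def between(pair):
        a, b = pair
        return combine([dist[frozenset((i, j))] for i in a for j in b])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=between)
        merges.append((sorted(a | b), between((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


dist = distance_matrix(hosts)
dendrograms = {name: cluster(dist, list(hosts), name) for name in LINKAGES}
for name, merges in dendrograms.items():
    print(name, merges)
```

In this toy table the three rules produce the same merge order and differ only in the height of the final merge. With real trait data, the linkage rules can yield different topologies, which is the situation a congruence test is meant to assess.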
As no previous phylogenetic study covers all the species known to host members of Cichlidogyrus, we conducted a new analysis (see Appendix S1.1) based on DNA sequence data accessed on GenBank (Appendix S2) to infer phylogenetic distances between hosts. For the parasites, we included morphometric and phylogenetic data from Cruz-Laufer et al. (2021b), i.e. morphological measurements and 100 randomly sampled Bayesian tree topologies from the post-burn-in fraction.
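Drawing a random subset of topologies from the post-burn-in fraction of a posterior sample can be sketched as follows. The burn-in fraction of 0.25, the sampling without replacement, the fixed seed, and the function name `sample_post_burn_in` are assumptions made for illustration; the text above does not specify them. Strings stand in for tree objects.

```python
import random


def sample_post_burn_in(trees, burn_in_fraction, n_samples, seed=None):
    """Drop the first burn_in_fraction of the chain, then sample without replacement."""
    if not 0 <= burn_in_fraction < 1:
        raise ValueError("burn_in_fraction must be in [0, 1)")
    start = int(len(trees) * burn_in_fraction)
    retained = trees[start:]
    if n_samples > len(retained):
        raise ValueError("not enough post-burn-in trees to sample from")
    # A dedicated generator keeps the draw reproducible for a given seed.
    return random.Random(seed).sample(retained, n_samples)


# Stand-ins for tree objects, in the order they were sampled by the chain.
posterior = [f"tree_{i}" for i in range(1000)]
subset = sample_post_burn_in(posterior, burn_in_fraction=0.25, n_samples=100, seed=1)
print(len(subset))
```

Running downstream analyses on each sampled topology, rather than on a single consensus tree, is one common way to carry phylogenetic uncertainty through to the results.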