Species distribution modelling
We modelled the breeding range of L. c. superciliosus and projected it into the LGM. Data on occurrence in Japan were obtained from a summary of breeding localities based on the National Surveys conducted by the Ministry of Environment of Japan and literature (M Kitazawa and his colleagues (in prep.) hereafter referred to as ‘review data’). Additionally, on 12 May 2019 we downloaded data from Global Biodiversity Information Facility (GBIF; https://www.gbif.org/, http://doi.org/10.15468/dl.igj5uw). A polygon bounding the presumed distribution of L. c. superciliosus according to Lefranc & Worfolk (1997) was used to clip the data. Records were limited only to the breeding season (May to August) to avoid integration of migrating birds. After a few quality-checking processes, final coordinates were thinned to include only one occurrence per 20 km square covering the Japanese archipelago (Figure S1.1).
We constructed a model with the final coordinate data. Our purpose is to infer the continental area that might have been suitable for L. c. superciliosus as its breeding habitat during the LGM. Therefore, any ecological differences between the current archipelago and the continental area need to be integrated in the model (Elith et al., 2011). We clipped the environmental data to include the present distribution of the Brown Shrike regardless of subspecies and this environmental data was defined as the modelling domain. Of the available WorldClim data (version 1.4; Hijmans et al. 2005), only one variable from each pair of correlated variables (|r| > 0.7) was used, resulting in seven variables: bio1, 2, 3, 5, 12, 14 and 15.
We used the R package ‘ENMeval’ v. 0.3.0 (Muscarella et al., 2014) to execute the maximum entropy (MaxEnt) algorithm (Phillips, Anderson, & Schapire, 2006) across a range of several settings. Combinations of three feature classes (linear, a combination of linear and quadratic, and hinge) and regularization multipliers ranging from 0.5 to 3 (increments of 0.5) were tested. The ‘maxent.jar’ option was used as the model algorithm. From the geographical modelling domain defined above, 10,000 background points were sampled randomly. The ‘block’ method was used to partition data for cross-validation since it assesses model transferability better than other methods (Muscarella et al., 2014). The hinge feature, with a regularization multiplier of 1.5, was selected by the lowest AICc. The present distribution was predicted, and then it was projected to the LGM environmental data using cloglog transformation. The LGM environmental data were drawn by Hijmans et al. (2005) from three different climate models: the community climate system model (CCSM, Gent et al., 2011), the model for interdisciplinary research on climate model (MIROC-ESM, Hasumi & Emori, 2004) and the Max-Planck-Inst. Für Meteorologie climate model (MPI-ESM-P, Liepert & Lo, 2013). Multivariate environmental similarity surface (MESS) was also calculated and extrapolation was restricted to regions indicated by negative values from this analysis (See Appendix S1 for the detailed procedures on the SDM analysis).
Migratory route analysis
Data retrieved from light-level geolocators were analysed in the R package ‘GeoLight’ version 2.0.0 (Lisovski & Hahn, 2012) and ‘SGAT’ version 0.1.3 (Wotherspoon, Sumner, & Lisovski, 2015). We prepared an R script by following the manual provided by Lisovski et al. (2019). We determined sunset and sunrise transitions for specific dates by light intensity thresholds of 0.5. We incorporated a statistical modelling approach, called the ‘group model’ in a Bayesian framework implemented in ‘SGAT’. This model refines the estimated locations by means of three components: the twilight model, a movement model, and spatial masks. The twilight model incorporates the deviation between determined and actual twilight times in calibration periods of each tag. We used twilight data of 26 days after geolocator deployment as the calibration period. We fitted gamma distribution as an error distribution of detection of twilight. To define a movement model and spatial masks separately for stationary and movement periods, the ChangeLight function in ‘GeoLight’ was applied to discriminate between these periods from the twilight data. A probability threshold for discrimination was set to 0.9. For the movement model, we set a flight speed prior with a gamma distribution for periods of movement (shape = 13, rate = 0.3), determined allowing for the average air speed of 12.9 m/s for Lanius collurio(Bruderer & Boldt, 2001). We conducted a sensitivity analysis for the gamma distribution, and we found that priors did not greatly influence our interpretation of the results (Table S1.2). In the group model, twilight times grouped for a stationary period are treated as records at a single location, thus no movement model was defined. For a probability mask, birds must cross the sea between the Japanese archipelago and the continent and must stopover on land. Therefore, a spatial mask for stationary periods was designed only to include terrestrial land while both sea and land areas were allowed for movement periods. We slightly modified our analysis for bird #V5604-025 because its geolocator stopped recording during the autumn equinox period (see Appendix S1 for details).
We interpolated latitudinal estimations around the equinox periods by setting a ‘tol’ value between 0.08 and 0.1. As burn-in and fine tuning, we ran 120,000 chains with three iterations, and the final paths were determined by the last 15,000 samples with three iterations. Regarding large uncertainties involved in migration estimations by geolocators (Lisovski et al., 2019), we standardized the posterior probabilities of location estimates on each raster cell, calculated their means among the three individuals, and we presented them as a probability distribution in a region of interest where we expect birds to retrace the route of past colonisation. We also inferred median paths of the three migratory tracks. Because birds spend more time at stopover sites than at an instantaneous point along a migratory path, the former will be indicated with relatively higher probabilities than the latter.
Results: