Sequencing and assembling Macquarie perch and golden perch genomes
For Macquarie perch, we used fin, tail bone and muscle tissues of a 2-month-old hatchery-produced juvenile of unknown sex of Yarra River origin, born in November 2012 (sample ID MP_SCH12). For the golden perch genome, we used fin tissue collected non-lethally from an adult of unknown sex (aged as 3+ years based on size), captured in the Broken River, Victoria, in May 2017 (sample ID GOP001). For both species, DNA samples were preserved in ethanol and kept at -20°C.
DNA was extracted using Qiagen DNeasy Blood & Tissue kits. For short-read sequencing, 100 ng of gDNA was fragmented to 350 bp using QSonica and processed with a NEB Ultra Illumina Library Preparation Kit. The libraries were pooled with libraries for other projects and sequenced on all four lanes of an S4 flow cell of a NovaSeq 6000 at the Deakin Genomics Centre using a 2 × 151 bp run configuration, with the aim of obtaining 100 Gb of data per sample (Appendix A). To obtain long-read data, 1 µg of gDNA was fragmented to 8 kb using a Covaris G-Tube and processed with an LSK108 library preparation kit according to the manufacturer’s instructions (Oxford Nanopore, UK). The library was subsequently sequenced on a Nanopore R9.4 flow cell. Base-calling of the Nanopore signal used Albacore v2.0.1 (Oxford Nanopore, UK).
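As a rough guide to what the 100 Gb target implies, the number of read pairs required can be estimated from the run configuration. The sketch below is illustrative only: the function name `read_pairs_needed` is hypothetical, and it assumes that every base of every read counts towards the target (no loss to trimming or filtering).

```python
def read_pairs_needed(target_bases: int, read_length: int) -> int:
    """Return the minimum number of read pairs yielding `target_bases`.

    Assumes paired-end reads of equal length and that all bases are retained.
    """
    bases_per_pair = 2 * read_length
    # Integer ceiling division: round up to a whole read pair.
    return -(-target_bases // bases_per_pair)


# 100 Gb target with a 2 x 151 bp run configuration
pairs = read_pairs_needed(100_000_000_000, 151)
print(f"{pairs:,} read pairs")  # 331,125,828 read pairs
```

Under these assumptions, the target corresponds to roughly 331 million read pairs per sample.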
Illumina reads were adapter-trimmed using fastp v0.19.5 (Chen, Zhou, Chen, & Gu, 2018) and then hybrid-assembled de novo together with the Nanopore long reads using MaSuRCA v3.2.4 (Zimin et al., 2017). Within the MaSuRCA pipeline, the short Illumina reads were first error-corrected with QuORUM and subsequently used to construct contigs by the de Bruijn graph approach. These contigs were used to error-correct the Nanopore long reads, generating “mega-read” contigs for Overlap-Layout-Consensus assembly. Genome completeness was assessed using BUSCO v4 (Seppey, Manni, & Zdobnov, 2019) with default settings, based on the actinopterygii_odb10 database.
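To illustrate the de Bruijn graph approach in general terms, the toy sketch below builds a graph from the k-mers of a few short reads and extends a contig along an unambiguous path. It is not MaSuRCA's implementation: the function names `build_de_bruijn` and `extend_contig` and the example reads are hypothetical, and the sketch ignores sequencing errors, reverse complements, k-mer coverage and node in-degree, all of which a real assembler must handle.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk from `start` while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ACGTTG", "GTTGCA", "TGCATC"]  # toy overlapping reads
graph = build_de_bruijn(reads, k=4)
print(extend_contig(graph, "ACG"))  # ACGTTGCATC
```

The walk stops wherever the graph branches or ends, which is why repeats and sequencing errors fragment short-read contigs and why long reads help to bridge them.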