2.3 Data processing methods
After basecalling with Albacore (version 2.3.4), the MinION reads were
demultiplexed with Epi2me. The total yield for LBA9402 was 298,712
reads totaling 1,027,720,149 bp, with a mean read length of 3441 bp.
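As a quick consistency check, the mean read length follows directly from the two yield figures above (total bases divided by number of reads). A minimal Python sketch:

```python
def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Mean read length = total sequenced bases / number of reads."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return total_bases / n_reads


# Yield figures reported above for LBA9402
mean_len = mean_read_length(1_027_720_149, 298_712)
print(f"{mean_len:.1f} bp")  # → 3440.5 bp, i.e. ~3441 bp after rounding
```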
Nanopore reads were end-trimmed and filtered on average quality (>Q10)
and length (>5000 bp) with NanoFilt, giving 64-fold coverage after
filtering. A total of 4,518,191 paired-end Illumina reads of 99 nucleotides
were quality and adapter trimmed using Cutadapt (70-fold coverage).
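The quality and length thresholds can be illustrated with a minimal Python sketch. It assumes that a read's average quality is computed NanoFilt-style in probability space (per-base Phred scores converted to error probabilities, averaged, and converted back) rather than as a plain mean of Phred scores; end-trimming is not shown, and the function names, toy reads, and toy genome size are hypothetical. Fold coverage is then the retained bases divided by the genome size.

```python
import math
from typing import Iterable, List, Tuple

Read = Tuple[str, List[int]]  # (sequence, per-base Phred scores)


def mean_phred(quals: List[int]) -> float:
    """Average quality in probability space (assumed NanoFilt-like):
    Phred -> error probability, average, then back to Phred."""
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)


def filter_reads(reads: Iterable[Read], min_qual: float = 10.0,
                 min_len: int = 5000) -> List[Read]:
    """Keep reads with average quality > min_qual and length > min_len."""
    return [(seq, q) for seq, q in reads
            if len(seq) > min_len and mean_phred(q) > min_qual]


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Fold coverage = retained bases / genome size."""
    return total_bases / genome_size


# Hypothetical toy reads
reads = [
    ("A" * 6000, [12] * 6000),  # long, Q12 -> kept
    ("C" * 6000, [8] * 6000),   # long but Q8 -> dropped
    ("G" * 4000, [20] * 4000),  # high quality but too short -> dropped
]
kept = filter_reads(reads)
bases = sum(len(seq) for seq, _ in kept)
print(len(kept), bases, fold_coverage(bases, 3000))  # hypothetical 3 kb genome

# Probability-space averaging penalizes low-quality stretches:
# [10, 30] averages to ~Q13, not Q20.
print(round(mean_phred([10, 30]), 2))
```

Averaging error probabilities rather than Phred values means a few very poor bases pull the read's quality down more strongly, which is the behavior usually wanted for a read-level quality filter.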
Hybrid assembly was performed
using Unicycler version 0.4.7. Besides three contigs representing the
two chromosomes and the Ri plasmid, a fourth contig of 5386 bp was
identified. This contig represented the bacteriophage ΦX174 genome sequence,
which is spiked in at low concentration as a control during Illumina library
preparation. This contig was therefore removed from the assembly. The
assembly was annotated using the NCBI Prokaryotic Genome Annotation Pipeline
(PGAP). In addition, PHASTER was used to annotate prophage sequences
(Arndt et al. 2016). For functional characterization of the encoded
proteins, eggNOG-Mapper was used (Huerta-Cepas et al. 2017).
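Removing the ΦX174 contig from the Unicycler assembly, as described above, amounts to dropping one record from the assembly FASTA before annotation. A minimal Python sketch, assuming the contig has already been identified (for example by sequence similarity to ΦX174) and that contigs are addressed by the first word of their FASTA header; the contig IDs and sequences below are hypothetical toy values.

```python
from typing import Dict, Iterable, Optional, Set


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence} (ID = first header word)."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def drop_contigs(records: Dict[str, str], exclude: Set[str]) -> Dict[str, str]:
    """Return a copy of the assembly without the excluded contigs."""
    return {rid: seq for rid, seq in records.items() if rid not in exclude}


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Write records back as FASTA, wrapping sequences at `width`."""
    out = []
    for rid, seq in records.items():
        out.append(f">{rid}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Hypothetical toy assembly: contig "4" stands in for the phiX174 contig
fasta = """>1 length=12
ACGTACGTACGT
>4 length=8
GAGTTTTA
"""
cleaned = drop_contigs(parse_fasta(fasta.splitlines()), {"4"})
print(to_fasta(cleaned), end="")
```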
Insertion sequence (IS) elements were identified using ISEScan (Xie and
Tang 2017). In Figure 2 and Supplementary Figure S5, only complete
insertion sequences, i.e. those including their inverted repeats, are shown.
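The restriction to complete insertion sequences can be expressed as a simple filter on the predicted elements. A Python sketch; the record fields (`has_left_ir`, `has_right_ir`) are hypothetical and do not reproduce ISEScan's actual output format, and the element names and coordinates are invented toy data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ISHit:
    # Hypothetical record layout; not ISEScan's output schema.
    name: str
    start: int
    end: int
    has_left_ir: bool   # left terminal inverted repeat detected
    has_right_ir: bool  # right terminal inverted repeat detected


def complete_only(hits: List[ISHit]) -> List[ISHit]:
    """Keep IS elements for which both inverted repeats were detected."""
    return [h for h in hits if h.has_left_ir and h.has_right_ir]


# Invented toy predictions
hits = [
    ISHit("IS_a", 1_000, 2_300, True, True),
    ISHit("IS_b", 5_000, 5_900, True, False),    # partial: right IR missing
    ISHit("IS_c", 9_000, 10_400, False, False),  # partial: no IRs
]
print([h.name for h in complete_only(hits)])  # → ['IS_a']
```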
IslandViewer was used to predict genomic islands (Dhillon et al. 2015)
and CGView was used to generate a circular map of pRi1855 (Stothard and
Wishart 2005). Mauve (progressiveMauve) was used to align the LBA9402
and K84 genomes (Darling et al. 2010). BRIG was used to compare pRi1855
with other Ri and Ti plasmids (BLASTN, E-value cut-off 1e-10) and
to visualize the hits as concentric rings (Alikhan et al. 2011). For the
comparisons between erythritol catabolism regions and between pRi1855
and Rhizobium lusitanum strain 629, BLASTN was run locally with
BLAST version 2.9.0+. Protein alignments were performed with MAFFT
version 7.471 using the L-INS-i method (Katoh 2013) and visualized with
Jalview version 2.11.1.2 (Waterhouse et al. 2009) and Adobe Illustrator.
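The E-value cut-off of 1e-10 used for the BLASTN comparisons corresponds to a simple filter on BLAST tabular output. A Python sketch, assuming BLAST+ was run with the default tabular format (`-outfmt 6`), whose eleventh column is the E-value, and assuming hits at exactly the cut-off are kept; BRIG applies its cut-off internally, so this only illustrates the criterion. The hit lines are invented toy data.

```python
from typing import Iterable, List

EVALUE_COL = 10  # 0-based index of 'evalue' in the default -outfmt 6 columns


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-10) -> List[List[str]]:
    """Keep tabular BLAST hits with E-value <= max_evalue."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COL]) <= max_evalue:
            kept.append(fields)
    return kept


# Invented toy hits (qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore)
toy = [
    "q1\ts1\t98.5\t1200\t18\t0\t1\t1200\t501\t1700\t0.0\t2100",
    "q1\ts2\t81.0\t300\t57\t0\t10\t309\t40\t339\t3e-25\t250",
    "q1\ts3\t70.2\t60\t18\t0\t5\t64\t10\t69\t2e-05\t45.1",
]
print([h[1] for h in filter_hits(toy)])  # → ['s1', 's2']
```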
The percentage identities shown in Table 1 were calculated with the R
package seqinr.
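Percentage identity between two aligned sequences can be defined in several ways, differing mainly in the denominator. A Python sketch of one common definition (identical positions divided by alignment columns in which neither sequence has a gap); whether this matches the seqinr-based calculation behind Table 1 is an assumption, and the sequences are toy examples.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Denominator: alignment columns where neither sequence has a gap ('-').
    Other definitions divide by the full alignment length or by the
    length of the shorter ungapped sequence.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("MK-LV", "MRALV"))  # → 75.0
```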
2.4 Data availability
The complete genome sequence of R. rhizogenes LBA9402 has been
deposited in GenBank under accession numbers CP044122, CP044123 and
CP044124. The raw reads have been deposited in the Sequence Read Archive
under accession numbers SRR10177303 and SRR10177304.
Results