Processing of genomic data and SNP calling for P. oryzae isolates
As with the rice genomic data, we used TOGGLe to implement a pipeline for raw read processing, mapping and SNP calling. Raw reads were trimmed to remove barcodes, adapters and ambiguous base calls. Trimmed reads were mapped against the reference genome 70-15 version 8 (R. A. Dean et al., 2005) using BWA, with option -n 5 for the sub-command aln and option -a 500 for the paired-end sub-command sampe. The alignments were sorted with Picard Tools SortSam and SAMtools view (http://broadinstitute.github.io/picard/, Li 2011). Intervals to target for local realignment were defined using RealignerTargetCreator, and local realignment of reads around indels was performed with IndelRealigner. Duplicates were removed with MarkDuplicates. SNPs were then called using the UnifiedGenotyper tool in GATK, keeping all sites of the reference genome with the option EMIT_ALL_SITES. High-confidence SNPs were identified using GATK's VariantFiltration tool with the following parameters: MQ0 < 3.0 (total number of reads with mapping quality zero), depth ≥ 15.0 (number of reference alleles + number of alternative alleles, computed as the sum of the allelic depths for the reference and alternative alleles in the order listed), and RA ≤ 0.1 (number of reference alleles / number of alternative alleles).
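The three filtering criteria can be restated as a single predicate, which makes the thresholds easier to check. The following Python sketch is an illustration only, not the GATK VariantFiltration command used in the pipeline. The names `Site` and `is_high_confidence` are hypothetical, and treating sites with no alternative-allele reads as failing the filter (because RA is then undefined) is an assumption.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """Per-site counts (hypothetical container, for illustration only)."""
    mq0: int     # total number of reads with mapping quality zero
    ad_ref: int  # allelic depth of the reference allele
    ad_alt: int  # allelic depth of the alternative allele


def is_high_confidence(site: Site,
                       max_mq0: float = 3.0,
                       min_depth: float = 15.0,
                       max_ra: float = 0.1) -> bool:
    """Return True if the site meets all three thresholds given in the text."""
    # Depth is the sum of reference and alternative allelic depths.
    depth = site.ad_ref + site.ad_alt
    # Assumption: with no alternative reads, RA is undefined, so the site fails.
    if site.ad_alt == 0:
        return False
    # RA is the ratio of reference to alternative allele counts.
    ra = site.ad_ref / site.ad_alt
    return site.mq0 < max_mq0 and depth >= min_depth and ra <= max_ra
```

For example, a site with 1 reference read, 20 alternative reads and no MQ0 reads passes (depth 21, RA 0.05), whereas a site with 5 reference and 20 alternative reads fails on RA (0.25).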