Non-coding sequence variants found in patients with NAGSD
We have identified three subjects with NAGSD and pathogenic sequence variants in the non-coding regions of the NAGS gene. Subject 1 is a compound heterozygote for two non-coding sequence variants in the NAGS gene: NAGS:c.426+326G>A, located in the first intron of the NAGS gene, and NAGS:c.-3065A>T, located in the -3kb NAGS enhancer and adjacent to the previously identified NAGS:c.-3064C>A pathogenic sequence variant (Heibel et al., 2012). Subject 2 is a compound heterozygote for the sequence variants NAGS:c.427-218A>C, located in NAGS intron 1, and NAGS:c.1494G>A (p.Trp498Ter) in exon 7, which leads to premature termination of translation at codon 498 (Figure S1). Subject 3 is homozygous for the sequence variant NAGS:c.-3098C>T (Figure S2), located in the -3kb NAGS enhancer (Heibel et al., 2012).
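The variant names above follow HGVS coding-DNA notation, in which a leading minus sign marks a position upstream of the translation start codon and a +/- offset after a position marks an intronic base. The following is a minimal Python sketch of how such names could be sorted by region. The function name `classify_variant`, the region labels, and the regular expression are illustrative assumptions rather than part of the analysis reported here, and the pattern handles only simple substitutions (no 3' UTR `c.*` positions, deletions or insertions).

```python
import re

# Simplified pattern for single-base substitutions in HGVS coding-DNA notation.
# Assumption: only the forms used in this section are supported, e.g.
#   c.426+326G>A  (intronic, offset from the last base of the upstream exon)
#   c.427-218A>C  (intronic, offset from the first base of the downstream exon)
#   c.-3065A>T    (upstream of the translation start codon)
#   c.1494G>A     (within the coding sequence)
HGVS_SUBSTITUTION = re.compile(
    r"^c\.(?P<upstream>-)?(?P<pos>\d+)(?P<offset>[+-]\d+)?"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def classify_variant(hgvs):
    """Return the position, offset, alleles and a coarse region label."""
    match = HGVS_SUBSTITUTION.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS string: {hgvs!r}")

    if match["offset"]:
        region = "intronic"
    elif match["upstream"]:
        region = "upstream of start codon"
    else:
        region = "coding"

    return {
        "position": int(match["pos"]) * (-1 if match["upstream"] else 1),
        "offset": int(match["offset"]) if match["offset"] else 0,
        "ref": match["ref"],
        "alt": match["alt"],
        "region": region,
    }


if __name__ == "__main__":
    for name in ["c.426+326G>A", "c.-3065A>T", "c.427-218A>C",
                 "c.1494G>A", "c.-3098C>T"]:
        print(name, classify_variant(name)["region"])
```

The label "upstream of start codon" does not distinguish the 5' untranslated region from more distal regulatory sequence such as the -3kb enhancer; that distinction requires the gene annotation and cannot be read from the variant name alone.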
Queries of the gnomAD (Karczewski et al., 2020), dbSNP153 (Sherry et al., 2001) and 1000 Genomes Project (1000 Genomes Project Consortium et al., 2015) databases indicated that none of the four non-coding sequence variants has been previously reported. The GERP (Cooper et al., 2010; Goode et al., 2010) and phyloP (Siepel et al., 2005) scores of the c.-3065A>T, c.427-218A>C and c.426+326G>A sequence variants indicated that they affect highly conserved base pairs, whereas the base pair affected by c.-3098C>T is less conserved in mammalian genomes (Table 2). However, the phastCons (Siepel et al., 2005) scores of all four sequence variants indicated that they reside within conserved elements of the human genome (Table 2). The functional effect prediction programs Combined Annotation Dependent Depletion (CADD) (Kircher et al., 2014; Rentzsch, Witten, Cooper, Shendure, & Kircher, 2019) and MutationTaster2 (Schwarz, Rodelsperger, Schuelke, & Seelow, 2010) predicted that all four sequence variants are disease causing.
Because the phastCons scores of the two intronic sequence variants indicated that they reside in a region of high sequence conservation, we used data mining approaches to further characterize this region (Caldovic, 2018). A query of the phastCons conservation track of the UCSC Genome Browser revealed a 200 bp conserved element with genomic coordinates chr17:42,082,680-42,082,799 (GRCh37/hg19 human genome assembly) in the first intron of the NAGS gene (Figure 1A). Moreover, based on its biochemical signatures, this region was identified as a candidate cis-regulatory element by the ENCODE Data Analysis Center (Figures 1A-B; ENCODE Project Consortium, 2012). The ENCODE Project database was also queried for transcription factors that bind to this region in the human liver. This query revealed that the NAGS intronic element binds the RXRα, HNF4α and Sp1 transcription factors (Figure 1B). The transcription factor ChIP-Seq track of the UCSC Genome Browser was then used to locate the HNF4α binding site more precisely, while comparisons with the Sp1 and RXRα position similarity matrices from the JASPAR database of transcription factor binding sites (Fornes et al., 2020) were used to locate the Sp1 and RXRα binding sites (Figure 1C). The c.427-218A>C and c.426+326G>A sequence variants are located within the RXRα binding site and affect base pairs that are highly conserved in mammalian NAGS genes and almost invariant in the canonical RXRα recognition sequence (Yang, Subauste, & Koenig, 1995).
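Locating a binding site by comparison with a position matrix, as described above, amounts to sliding the matrix along the sequence and scoring each window. The following is a minimal Python sketch of that idea under stated assumptions: the count matrix `TOY_COUNTS` is a made-up six-position example built around the AGGTCA nuclear receptor half-site and does not reproduce any JASPAR profile, the sequence in the usage example is invented rather than taken from NAGS intron 1, and only the forward strand is scanned. It is not the procedure used to produce Figure 1C.

```python
import math

# Hypothetical count matrix, one dict per motif position.
# Assumption: illustrative values only, not taken from JASPAR.
TOY_COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 9, "T": 1},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert base counts to log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm):
    """Score every forward-strand window; return (start, window, score)."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits


def best_hit(sequence, pwm):
    """Return the highest-scoring window in the sequence."""
    return max(scan(sequence, pwm), key=lambda hit: hit[2])


if __name__ == "__main__":
    pwm = log_odds_matrix(TOY_COUNTS)
    toy_sequence = "TTGCAGGTCATTGC"  # invented sequence, 0-based coordinates
    print(best_hit(toy_sequence, pwm))
    # Effect of a substitution at a near-invariant motif position:
    reference = scan("AGGTCA", pwm)[0][2]
    variant = scan("AGCTCA", pwm)[0][2]
    print(reference - variant)
```

In this sketch, a substitution at a position where one base dominates the counts lowers the window score sharply, which is the same reasoning that underlies the observation that both intronic variants affect nearly invariant positions of the recognition sequence.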