Non-coding sequence variants found in patients with
NAGSD
We have identified three subjects with NAGSD and pathogenic sequence
variants in the non-coding regions of the NAGS gene. Subject 1 is a
compound heterozygote for two non-coding sequence variants in the NAGS gene: NAGS: c.426+326G>A, located in
the first intron of the NAGS gene, and NAGS: c.-3065A>T, located in the -3kb NAGS enhancer and adjacent to the previously identified NAGS: c.-3064C>A pathogenic sequence variant (Heibel
et al., 2012). Subject 2 is a compound heterozygote for sequence
variants NAGS: c.427-218A>C, located in NAGS intron 1, and NAGS: c.1494G>A
(p.Trp498Ter) in exon 7, which leads to premature termination of translation
at codon 498 (Figure S1). Subject 3 is a homozygote for the sequence
variant NAGS: c.-3098C>T (Figure S2), located in the
-3kb NAGS enhancer (Heibel et al., 2012).
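The variant descriptions above follow the HGVS coding-DNA convention, in which a signed offset after the anchor position marks an intronic base and a negative anchor marks a base upstream of the translation start codon. As a minimal sketch of how these descriptions can be read programmatically, the Python snippet below parses simple substitutions only. The regular expression and the names `parse_substitution` and `classify` are illustrative assumptions, not part of any tool used in this study, and the pattern does not cover the full HGVS grammar.

```python
import re
from typing import NamedTuple

# Simplified pattern for coding-DNA substitutions such as c.426+326G>A,
# c.427-218A>C, c.-3065A>T and c.1494G>A (assumption: substitutions only).
HGVS_SUBSTITUTION = re.compile(
    r"^c\.(?P<anchor>-?\d+)(?P<offset>[+-]\d+)?(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


class Substitution(NamedTuple):
    anchor: int  # coding position; negative values lie 5' of the start codon
    offset: int  # signed distance into an intron; 0 when absent
    ref: str
    alt: str


def parse_substitution(description: str) -> Substitution:
    """Split a description such as 'c.426+326G>A' into its components."""
    match = HGVS_SUBSTITUTION.match(description.strip())
    if match is None:
        raise ValueError(f"unsupported variant description: {description!r}")
    return Substitution(
        anchor=int(match["anchor"]),
        offset=int(match["offset"] or 0),
        ref=match["ref"],
        alt=match["alt"],
    )


def classify(variant: Substitution) -> str:
    """Coarse location label derived only from the position syntax."""
    if variant.offset != 0:
        return "intronic"
    if variant.anchor < 0:
        return "upstream of start codon"
    return "exonic"


if __name__ == "__main__":
    for text in ("c.426+326G>A", "c.427-218A>C", "c.-3065A>T", "c.-3098C>T", "c.1494G>A"):
        print(text, "->", classify(parse_substitution(text)))
```

The classification relies on the position syntax alone, so it cannot distinguish an enhancer variant from any other base upstream of the start codon; that assignment still requires the genomic annotation described in the text.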
Query of the gnomAD (Karczewski et al., 2020), dbSNP153 (Sherry et al.,
2001) and 1000 Genomes Project (Genomes Project et al., 2015) databases
indicated that none of the four non-coding sequence variants have been
previously reported. The GERP (Cooper et al., 2010; Goode et al., 2010)
and phyloP (Siepel et al., 2005) scores of the c.-3065A>T,
c.427-218A>C and c.426+326G>A sequence
variants indicated that they affect highly conserved base pairs, whereas
the base pair affected by the c.-3098C>T variant is not as
conserved in mammalian genomes (Table 2). However, the phastCons (Siepel
et al., 2005) scores of all four sequence variants indicated that they
reside within conserved elements of the human genome (Table 2).
The functional effect prediction programs Combined Annotation Dependent
Depletion (CADD) (Kircher et al., 2014; Rentzsch, Witten, Cooper,
Shendure, & Kircher, 2019) and MutationTaster2 (Schwarz, Rodelsperger,
Schuelke, & Seelow, 2010) predicted that all four sequence variants
are disease causing.
Because phastCons scores of the two intronic sequence variants indicated
that they reside in a region of high sequence conservation, we used data
mining approaches to further characterize this region (Caldovic, 2018).
Query of the phastCons conservation track of the UCSC Genome Browser
revealed a 200 bp conserved element with genomic coordinates
chr17:42,082,680-42,082,799 (GRCh37/hg19 human genome assembly) in the
first intron of the NAGS gene (Figure 1A). Moreover, based on its
biochemical signatures, this region was identified as a candidate cis-regulatory element by the ENCODE data analysis center (Figures
1A-B; Consortium, 2012). The ENCODE Project database was also
queried for transcription factors that bind to this region in the human
liver. This revealed that the NAGS intronic element binds RXRα,
HNF4α and Sp1 transcription factors (Figure 1B). The transcription factor
ChIP-seq track of the UCSC Genome Browser was then used to locate more
precisely the HNF4α binding site, while comparisons with the Sp1 and
RXRα position similarity matrices from the JASPAR database of
transcription factor binding sites (Fornes et al., 2020) were used to
locate Sp1 and RXRα transcription factor binding sites (Figure 1C). The
c.427-218A>C and c.426+326G>A sequence
variants are located within the RXRα binding site and affect base pairs
that are highly conserved in mammalian NAGS genes and almost
invariant in the canonical RXRα recognition sequence (Yang, Subauste, &
Koenig, 1995).
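Locating a binding site by comparison with a matrix profile, as described above, can be illustrated with a short Python sketch that scores every window of a sequence on both strands against a log-odds matrix. Everything specific in it is an assumption made for illustration: the count matrix is a toy profile for an AGGTCA-like half-site and is not a JASPAR entry, the background is uniform, and the function names are hypothetical. It is not the procedure used to generate Figure 1C.

```python
import math

BASES = "ACGT"

# Toy count matrix for a 6 bp AGGTCA-like half-site (one list per base,
# one entry per position). Illustrative numbers only, NOT a JASPAR profile.
TOY_COUNTS = {
    "A": [16, 1, 1, 1, 1, 16],
    "C": [1, 1, 1, 1, 16, 1],
    "G": [2, 17, 17, 1, 1, 2],
    "T": [1, 1, 1, 17, 2, 1],
}


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert counts to per-position log2 odds against a uniform background."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        matrix.append(
            {b: math.log2((counts[b][i] + pseudocount) / total / background) for b in BASES}
        )
    return matrix


def reverse_complement(sequence):
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def score_window(window, matrix):
    """Sum the log-odds of each base in a window of the matrix width."""
    return sum(column[base] for column, base in zip(matrix, window))


def best_hit(sequence, matrix):
    """Return (start, strand, score) of the highest-scoring window."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((start, "+", score_window(window, matrix)))
        hits.append((start, "-", score_window(reverse_complement(window), matrix)))
    return max(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    pwm = log_odds_matrix(TOY_COUNTS)
    print(best_hit("TTTTAGGTCATTTT", pwm))
    # A substitution at a near-invariant position lowers the window score.
    print(score_window("AGGTCA", pwm), score_window("AGGTCC", pwm))
```

Comparing the score of the reference window with that of the window carrying a substitution gives a rough indication of how strongly a variant at a near-invariant position is expected to weaken the match, under the assumptions of the toy matrix.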