Figure Legends
FIGURE 1 Small Open Reading Frames (sORFs) and RNA. Box: Within mRNA
that encodes canonical protein-coding sequences (CDS), sORFs can appear
in the 5′ UTR (upstream ORF, uORF); can initiate in the 5′ UTR and
extend into the CDS in an alternative reading frame (upstream
overlapping ORF, uoORF); can appear in the 3′ UTR (downstream ORF,
dORF); or can be nested within the CDS in an alternative reading frame.
sORFs can also be
found in long noncoding RNA (lncRNA, bottom) and circular RNA (circRNA,
right), as well as additional classes of RNA not pictured.
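The positional classes named in this legend can be expressed as a simple coordinate rule. The following is a minimal sketch, not a published classifier: the function name `classify_sorf`, the 0-based half-open transcript coordinates, and the catch-all label "other" are assumptions introduced for illustration.

```python
def classify_sorf(sorf_start, sorf_end, cds_start, cds_end):
    """Classify an sORF by its position relative to the canonical CDS.

    All coordinates are 0-based, half-open positions on the same mRNA
    (an assumed convention, not one stated in the legend).
    """
    # An ORF shares the CDS reading frame when its start is offset from
    # the CDS start by a multiple of three nucleotides.
    in_frame = (sorf_start - cds_start) % 3 == 0

    if sorf_end <= cds_start:
        return "uORF"    # entirely within the 5' UTR
    if sorf_start >= cds_end:
        return "dORF"    # entirely within the 3' UTR
    if in_frame:
        return "other"   # overlaps the CDS in the canonical frame
    if sorf_start < cds_start:
        return "uoORF"   # starts in the 5' UTR, extends into the CDS
    if sorf_end <= cds_end:
        return "nested"  # fully inside the CDS, alternative frame
    return "other"       # e.g. starts in the CDS and runs past its end


# Hypothetical transcript with a CDS spanning positions 100-400.
print(classify_sorf(10, 40, 100, 400))    # uORF
print(classify_sorf(150, 210, 100, 400))  # nested
```

Cases the legend does not name, such as an ORF that overlaps the CDS in the canonical frame, are deliberately left as "other" rather than assigned a class.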
FIGURE 2 Alternative Reading Frames for Same-Strand Overlapping (Nested)
sORFs. The +1 reading frame corresponds to the canonical coding sequence
and is always the frame of reference. Frameshifted translation in the +2
or +3 reading frames generates protein products with completely
different amino acid sequences because the codon identities are changed
in alternative reading frames.
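To illustrate why a frameshift changes every codon, the sketch below translates one short sequence in the +1, +2, and +3 frames using the standard genetic code. The example sequence and the function name `translate` are hypothetical, and the sketch assumes that frame +n starts reading n − 1 nucleotides into the sequence.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq, frame):
    """Translate a DNA sequence in frame 1, 2, or 3 ('*' marks a stop codon)."""
    start = frame - 1  # assumed convention: frame +1 starts at the first base
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


seq = "ATGGCCAAGTGA"  # hypothetical sequence, for illustration only
for frame in (1, 2, 3):
    print(f"+{frame}: {translate(seq, frame)}")
# +1: MAK*
# +2: WPS
# +3: GQV
```

The three outputs share no residues even though they are read from the same nucleotides, which is the point the figure makes.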
FIGURE 3 Mass Spectrometry Workflow for Detection of Unannotated
Microproteins. To search for novel microproteins in a sample of
interest, low molecular weight proteins are isolated from total protein
after cell lysis. Size-exclusion techniques include, but are not limited
to, solid-phase extraction and polyacrylamide gel electrophoresis.
The low molecular weight protein fraction is digested with a protease,
producing a sample of uniform peptide length appropriate for mass
spectrometric (MS) analysis. Experimental spectra are generated and
matched to theoretical spectra from a custom database using proteomics
software. Detection of annotated microproteins known to be expressed in
the system of interest can serve as a positive control for successful
small-protein enrichment and for coverage of the known small proteome,
but these spectra are otherwise computationally excluded. Peptides
deriving from proteolysis of canonical proteins before size exclusion
are computationally identified and excluded from consideration. High-scoring
experimental spectra without any matches to known microproteins can be
subjected to further molecular validation, leading to annotation of
novel microproteins.
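The computational exclusion steps in this workflow can be sketched as a set-difference filter over peptide-spectrum matches. This is an illustrative sketch only, not the interface of any proteomics package: the names `PeptideMatch`, `tryptic_peptides`, and `candidate_novel_peptides` are hypothetical, trypsin is assumed as the protease (the legend does not specify one), and the score threshold is arbitrary.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideMatch:
    """One experimental spectrum matched to a database peptide (hypothetical type)."""
    peptide: str
    score: float


def tryptic_peptides(protein):
    """In-silico digest assuming trypsin: cleave after K or R, except before P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def candidate_novel_peptides(matches, annotated_proteins, min_score):
    """Keep high-scoring matches whose peptide is absent from every annotated protein."""
    known = set()
    for protein in annotated_proteins:
        known.update(tryptic_peptides(protein))
    return [m for m in matches if m.score >= min_score and m.peptide not in known]


# Hypothetical data: one annotated protein and three peptide-spectrum matches.
annotated = ["MKRPAKG"]
matches = [
    PeptideMatch("RPAK", 0.99),    # derives from an annotated protein: excluded
    PeptideMatch("LSEQVK", 0.95),  # high scoring, unannotated: kept
    PeptideMatch("TTGHR", 0.20),   # below the score threshold: excluded
]
print(candidate_novel_peptides(matches, annotated, min_score=0.9))
```

Peptides that survive this filter are only candidates; as the legend notes, they still require molecular validation before a microprotein is annotated.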
FIGURE 4 Experimentally Determined Microprotein Structures. (A) Crystal
structure of AcrB (grayscale) of the AcrAB-TolC efflux pump in complex
with microprotein AcrZ (cyan). PDB: 5NC5. (B) Cryo-EM structure of
bacterial microprotein CydX (cyan) in complex with transmembrane
cytochrome bd-I oxidase (grayscale). PDB: 6RKO. (C) Crystal structure of
SERCA1a calcium pump (grayscale) with bound single-pass transmembrane
microprotein phospholamban (cyan), which downregulates SERCA activity.
PDB: 4Y3U. Solid-state NMR structure of helix-loop-helix microprotein
DWORF (cyan) modeled into SERCA1a calcium pump (grayscale) based on
Venkateswara et al. 2022. PDB: 4Y3U, 7MPA. (D) NMR structure of
wild-type humanin in 30% 2,2,2-trifluoroethanol (organic) solution.
PDB: 1Y32. (E) Crystal structure of ubiquitin monomer. PDB: 1AAR. (F)
Crystal structure of ubiquitin-like TINCR microprotein with additional
N-terminal alpha helix. PDB: 7MRJ. (G) Predicted structure of bacterial
microprotein YmcF generated with AlphaFold, obtained from
UniProt[166] (green). Five cysteines (orange) in the YmcF sequence
are predicted to form a zinc-finger domain common to RNA-binding
proteins. (H) Predicted structure of PAQosome-binding microprotein
ASDURF generated with AlphaFold, obtained from UniProt[166].