Exact mass GC-MS database for routine analysis
The present curated database contains 336 compounds, 234 of which were identified and quantified in Arabidopsis leaves. This may appear small compared with the estimated total number of small metabolites in plants (several thousand). However, it compares well with most targeted routine GC-MS analyses for metabolic profiling, which in the vast majority of cases yield about 80-100 metabolites, with some studies reporting more (for example, 162 metabolites in (Cui, Davanture, et al., 2019) and 178 in (Cui et al., 2021), found in leaves). Of course, the ability of instruments and software to extract a proper dataset from raw data using the database depends on the quality of the analyses. Indeed, despite the considerable dynamic range of modern instruments (here, 6 orders of magnitude in peak height), precise quantification can only be carried out when analytes are not too concentrated, since the inadequate peak shapes of overly concentrated analytes prevent peak extraction by software such as Tracefinder® (Kaufmann & Walker, 2017). This can be challenging when some metabolites are present in high amounts (e.g. sucrose or proline) while others are present in trace amounts or generate a weak signal (e.g. salicylamide) (Fig. 6).
Data extraction from raw data can also be performed via untargeted peak searching, which provides a much more powerful way to appreciate the diversity of molecules present in extracts (Perez de Souza et al., 2019). However, this approach has two drawbacks: (i) processing time is very long (at least 20 times slower with Tracefinder®), and (ii) many peaks appear as unidentified, with only an m/z value and a retention time, so that post-hoc identification is required using exact mass and, potentially, co-occurring fragments. Therefore, for routine analyses, it is probably more convenient to rely on targeted analyses with the database proposed here.
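To make the distinction between targeted and untargeted extraction concrete, the Python sketch below matches detected peaks against a small exact-mass database using a mass tolerance in ppm and a retention-time window, and flags overly intense peaks as unsuitable for precise quantification. The record layout (`DbEntry`, `Peak`), the function `match_peaks`, and the default tolerances (5 ppm, 0.1 min) and saturation threshold are hypothetical choices made for illustration only; they do not reproduce the format of the database described here or the algorithm used by Tracefinder®.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DbEntry:
    """Hypothetical database record: compound name, exact m/z, expected RT."""
    name: str
    mz: float      # exact m/z of the target ion
    rt_min: float  # expected retention time, in minutes


@dataclass(frozen=True)
class Peak:
    """A peak detected in the raw data (hypothetical structure)."""
    mz: float
    rt_min: float
    height: float  # peak height, arbitrary intensity units


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(
    peaks: List[Peak],
    database: List[DbEntry],
    ppm_tol: float = 5.0,          # hypothetical mass tolerance
    rt_tol: float = 0.1,           # hypothetical RT window, in minutes
    saturation_height: float = float("inf"),  # hypothetical saturation limit
) -> Tuple[List[Tuple[str, Peak, bool]], List[Peak]]:
    """Targeted lookup: assign peaks to database entries.

    Returns (identified, unidentified). Each identified item is
    (compound name, peak, quantifiable), where quantifiable is False when
    the peak height reaches saturation_height.
    """
    identified: List[Tuple[str, Peak, bool]] = []
    unidentified: List[Peak] = []
    for peak in peaks:
        # Keep entries that fall inside both the mass and the RT windows.
        candidates = [
            e for e in database
            if abs(ppm_error(peak.mz, e.mz)) <= ppm_tol
            and abs(peak.rt_min - e.rt_min) <= rt_tol
        ]
        if not candidates:
            # An untargeted search would report these with only m/z and RT.
            unidentified.append(peak)
            continue
        # When several entries qualify, the closest mass match wins.
        best = min(candidates, key=lambda e: abs(ppm_error(peak.mz, e.mz)))
        identified.append((best.name, peak, peak.height < saturation_height))
    return identified, unidentified
```

Peaks returned in `unidentified` correspond to what an untargeted search would list with only an m/z value and a retention time, and would still require post-hoc identification.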