Connection Between Accuracy Metrics: MARE, R2, Spearman
In principle, the mean absolute relative error in energies (MARE) reflects both random and systematic errors of a method, while the R2 and Spearman correlation metrics remove systematic errors through linear correlation (R2) or ranking (Spearman ρ). For the comparisons here, however, the three metrics are strongly connected, as illustrated in Figure \ref{964988}. Across methods, a smaller MARE corresponds almost linearly to a larger median R2. There are exceptions: the three classical force field methods have essentially the same median R2 despite differences in MARE, likely due to systematic errors in the methods. Similarly, while increasing the data in the bag-of-features descriptors from BOB to BAT decreases the median MARE from 1.92 kcal/mol to 1.18 kcal/mol, the accuracy as judged by the median R2 remains essentially constant (0.31 and 0.32, respectively).
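The distinction between the metrics can be illustrated with a small sketch. It is not the analysis code used for this work: the energies are made-up toy values, the function names are hypothetical, and MARE is assumed here to be the mean absolute difference between predicted and reference relative energies, which may differ in detail from the definition used for the reported results. A purely systematic error (uniform scaling) leaves R2 and Spearman ρ at 1 while inflating the MARE, whereas a prediction with small random errors can have a lower MARE but imperfect correlation.

```python
from math import sqrt

def mean_abs_error(pred, ref):
    # Assumed stand-in for MARE: mean absolute difference of relative energies.
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def pearson_r(x, y):
    # Pearson correlation coefficient; R2 is taken as its square.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    # Rank positions (0 = lowest); assumes no tied values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order):
        out[i] = float(rank)
    return out

def spearman_rho(x, y):
    # Spearman rho is the Pearson correlation of the ranks.
    return pearson_r(ranks(x), ranks(y))

# Toy relative conformer energies in kcal/mol (illustrative values only).
reference = [0.0, 0.5, 1.2, 2.0, 3.1]
scaled = [1.5 * e for e in reference]   # purely systematic error
noisy = [0.0, 0.9, 0.8, 2.3, 2.9]       # small random errors, one rank swap

mare_scaled = mean_abs_error(scaled, reference)
mare_noisy = mean_abs_error(noisy, reference)
r2_scaled = pearson_r(scaled, reference) ** 2
r2_noisy = pearson_r(noisy, reference) ** 2
rho_scaled = spearman_rho(scaled, reference)
rho_noisy = spearman_rho(noisy, reference)

print(f"scaled: MARE={mare_scaled:.2f} R2={r2_scaled:.3f} rho={rho_scaled:.2f}")
print(f"noisy:  MARE={mare_noisy:.2f} R2={r2_noisy:.3f} rho={rho_noisy:.2f}")
```

In this toy case the scaled prediction has the larger MARE yet perfect R2 and ρ, which is the kind of behavior that would be expected if systematic errors dominate the MARE differences among otherwise similarly ranked methods.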