Referee Report
As mentioned in the abstract of this reviewer form, this manuscript is suitable for IJQC. However, I must raise the following major and minor points for a revision:
Major points:
1) Multiple times (particularly in the abstract and introduction), the author alludes to an ongoing discussion in the DFT community, which he claims is biased towards viewing DFT methods that rely on a large number of fitted parameters as less transferable to different systems than methods that rely on fewer parameters. At times, these statements seem rather anecdotal, as they lack any scientific citation. Some examples are:
“Counting parameters has become customary in the density functional theory community as a way to infer the transferability of popular approximations to the exchange–correlation functionals.”
“Among this latter school of thoughts, fitting xc functionals has achieved a somehow bad reputation, because parameters have been associated to overfitting and poor transferability”
“A corresponding frequently asked question in the DFT community is: “how many parameters does this functional have?”, implying that functionals with more than four or five fitted parameters are barely useful elephants in the DFT zoo. “
“Instead of playing the game of counting fitted parameters in “parametrized functionals” and compare them to hidden parameters in “zero-parameter” functionals, …”
“The fight between counting parameters and analytical fits, …”
The author needs to provide references to support these statements. In addition, the article uses very strong wording that, in my opinion, is not appropriate for a scientific publication and seems somewhat exaggerated, such as “treacherous”, “war”, and “fight”. I advise adjusting this type of language.
2) A large portion of the article, namely Section 2, feels very disconnected from the main core of this work in Section 3, which answers the main question of the article and provides a statistical analysis based on calculated data.
Section 2 instead fits a function with a single parameter to reproduce the enhancement factors of popular DFT methods. It turns out that this parameter requires an immensely large number of decimal digits to reproduce the training function adequately. Perhaps I misunderstood this section, but while it looks like an interesting exercise, I do not see its relevance or scientific insight. Expressing the enhancement factor of a complicated functional, which may itself depend on a large number of parameters, through a fit function with one parameter does not really mean that the functional now depends on just one parameter. Ultimately, the enhancement factor it was fitted against was developed with multiple parameters. Regardless of whether the number of parameters in that original enhancement factor causes problems in applications, the new single-parameter fit still inherits the virtues and problems of the original method. In addition, the large number of decimals makes me question how easily that parameter can be transferred to a different machine to reproduce exactly the same function. Is it not possible that some of those decimals already contain numerical noise? Finally, I called this section “disconnected” because the final, important part of the manuscript never refers back to Section 2.
My recommendation is to shorten Section 2, present it as an interesting “fun fact”, and move most of its information to the Supporting Information.
Minor points:
3) Table 1 presents a ranking of functionals and supports the author’s main point, namely that the number of degrees of freedom does not tell us anything about a method’s applicability. The ranked functionals closely resemble previous recommendations based on traditional benchmarking, which complements those previous works nicely. The database used by the author relies heavily on existing databases and benchmark sets, most of which also address noncovalent interactions. Indeed, one can see an improvement when some form of London dispersion correction is used. Therefore, I feel that some functionals are misrepresented, in particular methods such as revTPSS, M11, N12, and M06. Dispersion corrections for those methods have been put forward and have been shown to work. This is also the case for Truhlar’s methods, as shown by different groups, e.g. Goerigk in J. Phys. Chem. Lett. in 2015\cite{Goerigk_2015} or Head-Gordon et al. in JCTC in 2016. I recommend also showing results for those dispersion-corrected versions. In fact, including those numbers will do the author a favour: the number of parameters will increase, but the results improve, which is further evidence supporting this paper’s main message.
4) Given that this is a full paper without page limitations, I feel it is not appropriate to refer to the supporting information of a previous paper for the references of functionals or benchmark sets. That would be a disservice to all the authors who developed those methods and wrote those articles. They should be cited in this paper as well.
5) I also find it inappropriate to refer to Morgante 2019\cite{Morgante_2019} for technical details. Readers should not have to look up a different paper for important information.
6) Where the author cites previous benchmark studies by Goerigk et al. and Head-Gordon et al., he may also want to include Jan Martin’s latest contributions, published in the recent Journal of Physical Chemistry Festschrift in honour of Leo Radom\cite{Santra_2019} and in the Israel Journal of Chemistry\cite{Martin_2019}.
In summary, while this list may look long, I enjoyed reading this work and am certain that it will be well received. I hope my comments assist the author in improving it further.