3. Molecular Simulation Design Framework (MoSDeF)
As shown in Figure 1, performing a molecular simulation, whether MD or
MC, requires multiple steps: building an initial configuration of the
system, selecting and applying a force field, generating a syntactically
correct input file (or files) for a target simulation engine,
equilibration (to relax the system from its initial configuration –
e.g., a crystal – to a configuration characteristic of equilibrium –
e.g., a liquid), a production run to generate a trajectory, and analysis of
the trajectory (e.g., averaging over the trajectory to compute
thermodynamic and/or structural properties, performing visualization,
etc.). Reliability and statistics are often improved by running multiple
independent trajectories using the same workflow. Accomplishing these
steps in a way that is both accurate and reproducible can be a
significant challenge. For example, the application of a force field is
a frequent source of error in simulations; for a system composed of
moderately complex molecules (such as an ionic liquid), the force field
can have a hundred or more parameters that must be provided, offering
multiple opportunities for errors (e.g., use of incompatible units, use
of parameter values from a publication containing a typographical error,
incorrect application of parameters due to logic errors or because of
ambiguous definition of parameter usage, etc.). While the use of a
community-developed, open-source simulation engine may help to reduce
the likelihood of fundamental errors in the algorithms underlying the
simulations, such codes generally cannot prevent users from providing
parameters that are inconsistent with their intended usage.
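As an illustration of the unit-mismatch error mentioned above, the following minimal Python sketch shows one way to keep every parameter paired with its unit, so that conversions are explicit and unknown units are rejected rather than silently accepted. It is a hypothetical example, not MoSDeF's implementation: the `Parameter` class, the `convert` function, the conversion table, and the parameter names and values are all invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical conversion table: factors that take each supported energy
# unit to kJ/mol (1 thermochemical kcal = 4.184 kJ exactly).
TO_KJ_PER_MOL = {"kJ/mol": 1.0, "kcal/mol": 4.184}


@dataclass(frozen=True)
class Parameter:
    """A force-field parameter that always carries its unit (hypothetical)."""
    name: str
    value: float
    unit: str


def convert(param: Parameter, target_unit: str) -> Parameter:
    """Convert an energy parameter to target_unit, refusing unknown units."""
    if param.unit not in TO_KJ_PER_MOL:
        raise ValueError(f"{param.name}: unknown unit {param.unit!r}")
    if target_unit not in TO_KJ_PER_MOL:
        raise ValueError(f"unknown target unit {target_unit!r}")
    in_kj = param.value * TO_KJ_PER_MOL[param.unit]
    return Parameter(param.name, in_kj / TO_KJ_PER_MOL[target_unit], target_unit)


# Two parameters taken from sources that report different units.
# Illustrative values only, not from any published force field.
eps_a = Parameter("epsilon_CH3", 0.195, "kcal/mol")
eps_b = Parameter("epsilon_CH2", 0.382, "kJ/mol")

# Bring everything into one unit system before writing an input file.
params_kj = [convert(p, "kJ/mol") for p in (eps_a, eps_b)]
for p in params_kj:
    print(f"{p.name}: {p.value:.4f} {p.unit}")
```

Keeping the unit attached to the value means a kcal/mol number cannot reach an engine that expects kJ/mol without passing through `convert`. A real framework would need to cover many more dimensions (length, angle, charge) and would likely rely on a dedicated units library.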
Typically, many of these steps are performed within a given research
group by a single graduate student, often using ad hoc, in-house software, even when open-source simulation engines are used. This
approach has several shortcomings that can make simulations more prone
to error, limit extensibility, and hamper reproducibility. For
example, the various tools used to accomplish these steps may only be
loosely coupled and require manipulation, editing, and/or modification
of the tools and/or data by the user. This manipulation may introduce
errors and make it difficult to reproducibly capture the exact
procedures employed. The need for human manipulation may also limit the
ability to use such workflows in applications that require automation,
such as parameter screening studies or within the context of larger
workflows (e.g., to predict phase equilibrium within a process
simulator). The use of in-house software itself, which is typically not
open-source or freely available, creates numerous roadblocks as well.
Someone wishing to reproduce a simulation would be required to write
their own software to accomplish the same tasks. Developing such
software can be time-consuming, particularly because publications often do not provide
sufficient detail regarding the procedures used to initialize and
parameterize simulations. Furthermore, without access to the original
source code, it is not possible to ascertain the quality of the
software; that is, to know whether it has undergone sufficient
validation or if there are errors and bugs that ultimately impact the
accuracy of the reported results.
The Molecular Simulation Design Framework
(MoSDeF)32 is designed to address these issues of
automation/efficiency, accuracy, and reproducibility in molecular
simulation. MoSDeF is an open-source Python library built upon the
scientific Python software stack, with three major components: mBuild
(for constructing initial configurations of systems), foyer (for
applying force fields), and GMSO (General Molecular Simulation Object).
GMSO, which is currently under development, is designed to provide a
general, flexible way of encapsulating the information required to
define a simulation topology in a simulation-engine-agnostic manner.
All of the capabilities of MoSDeF are scriptable, thus making
the tools inherently reproducible, as well as suitable for automated
calculations (e.g., screening). MoSDeF is implemented as a set of
composable/modular tools, where each “subpackage” (i.e., module) is
designed such that it can be used within MoSDeF, or as a standalone
package, allowing MoSDeF to more easily integrate with other community
efforts. This modularity also makes the framework easier to modify,
test, and extend, and less prone to bugs, than a monolithic approach.
Performing a simulation using MoSDeF, combined with dissemination of
simulation scripts on a service such as GitHub, enables a molecular
simulation to be published as a TRUE (transparent, reproducible, usable by others, and extensible)
simulation33.
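Because every step is scriptable and each tool can also be used on its own, a build-then-parameterize sequence can be composed in a short script, looped over for screening, and rerun exactly. The sketch below assumes nothing about MoSDeF's actual API: `build_configuration`, `assign_parameters`, and `run_pipeline` are hypothetical placeholders standing in for tools such as mBuild and foyer, and the "molecules" are just random points. What it illustrates is the structure: seeded, standalone steps composed by a function that records the full input specification alongside the result.

```python
import hashlib
import json
import random


def build_configuration(n_molecules: int, box_length: float, seed: int) -> list:
    """Hypothetical builder: place n point 'molecules' at random positions.

    Stands in for a configuration-building tool and is usable on its own.
    """
    rng = random.Random(seed)  # seeded so the configuration is reproducible
    return [
        tuple(rng.uniform(0.0, box_length) for _ in range(3))
        for _ in range(n_molecules)
    ]


def assign_parameters(positions: list, forcefield: dict) -> list:
    """Hypothetical parameterizer: attach the same atom type to every site."""
    return [{"xyz": xyz, "type": forcefield["default_type"]} for xyz in positions]


def run_pipeline(spec: dict) -> dict:
    """Compose the standalone steps and record every input that was used."""
    positions = build_configuration(spec["n_molecules"], spec["box_length"], spec["seed"])
    sites = assign_parameters(positions, spec["forcefield"])
    # Hash of the canonicalized spec: identical specs give identical hashes.
    spec_hash = hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()
    return {"spec": spec, "spec_hash": spec_hash, "sites": sites}


# A small screening loop over system size, driven entirely by a script.
# "CT" is a hypothetical atom-type label.
base = {"box_length": 3.0, "seed": 42, "forcefield": {"default_type": "CT"}}
results = [run_pipeline({**base, "n_molecules": n}) for n in (10, 20, 40)]
for r in results:
    print(r["spec"]["n_molecules"], len(r["sites"]), r["spec_hash"][:12])
```

Storing the specification and its hash with each result is one simple way to make a screening campaign auditable; publishing such a script is what allows someone else to regenerate the same inputs.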
MoSDeF has its origins in a decade of National Science Foundation
(NSF)-supported collaborative research at Vanderbilt University
involving researchers from chemical engineering and computer
science34–36, the latter affiliated with the
Institute for Software Integrated Systems (ISIS)37.
ISIS is a leading academic software engineering research center, and is
the originator of the concept of model-integrated computing
(MIC)38. MIC is a systems engineering approach that
focuses on the creation of domain-specific modeling languages to capture
the essential features of the individual components of a given process,
at the level of abstraction that is appropriate for the end users.
Because of this abstraction, processes are described at a meta level that allows
tasks to be coupled together to execute scientific or engineering
workflows. MIC has been deployed in applications as diverse as managing
automobile assembly lines and processing health records. MIC design
principles, domain-specific modeling languages, and the general
philosophy of abstraction have shaped the development of MoSDeF. In
particular, MoSDeF attempts to be simulation-engine-agnostic, treating
the concept of a molecular simulation at a meta level, above the
specifics of the simulation engines. The tools within MoSDeF are
designed to fully describe a system independently of any engine;
engine-specific writers then use this description to generate
syntactically correct input files for each target engine. MoSDeF was initially developed to support several
commonly used open-source MD codes (LAMMPS39,
GROMACS40 and HOOMD-blue41) and has
since grown to support open-source MC simulation engines, namely
Cassandra16 and GOMC18. In the
Supplementary Information, we provide details on how to install MoSDeF
through various channels (Anaconda, Docker, from source via GitHub,
etc.) on macOS, Linux, and Windows platforms. Below we
describe each of the three key components. Source code, tutorials,
documentation, and related publications can be accessed from mosdef.org
and/or github.com/mosdef-hub/.
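The division of labor described above, between a complete engine-independent system description and engine-specific writers, can be sketched in a few lines of Python. This is a simplified, hypothetical illustration rather than MoSDeF's code: `Site`, `System`, `write`, and the `WRITERS` registry are invented names, and coordinates are assumed to be stored in nm. The XYZ writer follows the common plain-text XYZ layout (atom count, comment line, one element and position per line); the second "types" format is made up to stand in for a second engine.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    element: str
    xyz: tuple  # Cartesian coordinates, assumed to be in nm
    atom_type: str


@dataclass
class System:
    """Hypothetical engine-agnostic description: no engine syntax lives here."""
    name: str
    sites: list = field(default_factory=list)


def write_xyz(system: System) -> str:
    """Writer for the plain XYZ coordinate format (coordinates in angstrom)."""
    lines = [str(len(system.sites)), system.name]
    for s in system.sites:
        x, y, z = (10.0 * c for c in s.xyz)  # nm -> angstrom
        lines.append(f"{s.element} {x:.3f} {y:.3f} {z:.3f}")
    return "\n".join(lines) + "\n"


def write_type_listing(system: System) -> str:
    """Hypothetical second 'engine' format: index and atom type per line."""
    return "".join(f"{i} {s.atom_type}\n" for i, s in enumerate(system.sites, start=1))


# Registry mapping a target format to its writer.
WRITERS = {"xyz": write_xyz, "types": write_type_listing}


def write(system: System, fmt: str) -> str:
    """Dispatch to the registered writer, failing loudly for unknown formats."""
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"no writer registered for {fmt!r}") from None
    return writer(system)


# Approximate water geometry, for illustration only.
water = System("water", [
    Site("O", (0.0, 0.0, 0.0), "OW"),
    Site("H", (0.09572, 0.0, 0.0), "HW"),
    Site("H", (-0.02400, 0.09266, 0.0), "HW"),
])
print(write(water, "xyz"))
print(write(water, "types"))
```

Supporting a new engine in this design means registering one more writer; the `System` object and the code that builds it stay unchanged, which is the property that lets a framework remain engine-agnostic.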
3.1. mBuild