Table 1. Examples of open-source molecular simulation codes and related
supporting utilities developed within the chemical engineering molecular
modeling community. Website links are to home pages of the codes or to
code repositories. In addition, several other open-source codes emerging
from the chemical engineering community are highlighted.

In this regard, the molecular simulation community in chemical
engineering is particularly noted for sharing methods and capabilities
by making software developed within the community freely available under
open-source licenses, as described in a recent review
article [30]. Table 1 provides examples of open-source
molecular simulation tools developed within the ChEC, divided into
simulation codes and other utilities. As with GEMC, many of these
algorithms are implemented primarily within MC; hence, it is not
surprising that the bulk of the open-source simulation engines developed
within the ChEC (see Table 1) are for performing MC simulations. The
need for community-developed simulation engines, whether they are MD or
MC, stems from the fact that such codes have become increasingly
difficult to develop, extend, and maintain for a single individual or
single research group. This is due not only to an ever-growing set of
features and algorithms, but also to changes in the computing hardware
used in research environments. We have been through the eras of
vector architectures (e.g., Cray, Hitachi); parallel vector computers (a
small number of coupled vector processors, such as the Cray Y-MP);
massively parallel distributed-memory computers (MPPs, such as the Intel
Paragon, in which a large number of the same commodity central
processing units, or CPUs, used in deskside computers are linked together
and communicate over a communication network); multicore processors
(such as the Intel Xeon, which has gone from 6 cores to more than 50),
both standalone and as part of an MPP; and, more recently, massively
multicore graphics processing units (GPUs, which have migrated from the
gaming industry into scientific computing and data manipulation). A modern
supercomputer typically consists of nodes connected via an interconnect
(from vendors such as Mellanox and Intel), where each node houses
multiple commodity multicore CPUs and GPUs. [Footnote: These
interconnects can vary from standardized Ethernet connections to more
specialized, proprietary high-performance interconnects from various
vendors. At the time of writing, the current top 500 list includes
numerous systems with proprietary interconnects such as Mellanox
InfiniBand (now owned by Nvidia), Intel Omni-Path, Cray Aries, and
Fujitsu Tofu, along with standard Ethernet connections ranging from 10G
to 100G.] This is the dominant architecture of the supercomputers on the
top 500 list of the fastest computers in the world [31], with the top
5 supercomputers having between 1.5 million and more than 10.5 million
total computing cores at the time of writing. Designing and maintaining
simulation codes that perform efficiently on these rapidly evolving
computer architectures is a significant challenge. Beyond
community-developed simulation engines, we have also seen the rise of
other community-developed utilities to support simulation, e.g., general
analysis packages as well as software that makes it easier to
accurately and reproducibly initialize configurations, apply force
fields to molecules, and create input files for a variety of simulation
engines.
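
As a rough illustration of what "reproducibly initialize configurations" can mean in practice, the following minimal sketch places sites on a jittered cubic lattice using a seeded random number generator, writes them in the simple XYZ text format, and hashes the result so that two runs can be compared. It uses only the Python standard library; the function names (`init_lattice`, `to_xyz`, `fingerprint`), the jitter amplitude, and the choice of element are assumptions made for this example and are not taken from any of the codes listed in Table 1.

```python
import hashlib
import random


def init_lattice(n_molecules, box_length, seed):
    """Place n_molecules sites on a simple cubic lattice with seeded jitter.

    Hypothetical helper for illustration; units are arbitrary.
    """
    rng = random.Random(seed)  # local generator: global random state is untouched
    n_side = 1
    while n_side ** 3 < n_molecules:  # smallest cubic grid that holds all sites
        n_side += 1
    spacing = box_length / n_side
    coords = []
    for index in range(n_molecules):
        ix, rem = divmod(index, n_side * n_side)
        iy, iz = divmod(rem, n_side)
        # Jitter of at most 5% of the lattice spacing keeps every site in the box.
        jitter = [rng.uniform(-0.05, 0.05) * spacing for _ in range(3)]
        coords.append(
            tuple((i + 0.5) * spacing + j for i, j in zip((ix, iy, iz), jitter))
        )
    return coords


def to_xyz(coords, element="Ar", comment=""):
    """Serialize coordinates as XYZ text: atom count, comment line, one atom per line."""
    lines = [str(len(coords)), comment]
    lines += [f"{element} {x:.6f} {y:.6f} {z:.6f}" for x, y, z in coords]
    return "\n".join(lines) + "\n"


def fingerprint(text):
    """SHA-256 digest of the serialized configuration, for comparing runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


xyz_a = to_xyz(init_lattice(100, 3.0, seed=42), comment="seed=42")
xyz_b = to_xyz(init_lattice(100, 3.0, seed=42), comment="seed=42")
print(fingerprint(xyz_a) == fingerprint(xyz_b))  # prints True
```

Recording the seed and the digest alongside the input files is one simple way to let others confirm that they have regenerated the same starting configuration; community tools address the same goal with far more complete handling of molecular topology and force-field assignment.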
In the remainder of this Perspective, as an example of ChEC open-source
software, we focus our discussion on the Molecular Simulation Design
Framework (MoSDeF), to which all the authors are contributors. MoSDeF is
a set of Python tools to facilitate the initialization and
parameterization of systems, with the goal of enabling transparent and
reproducible molecular simulation workflows that, at the same time, are
user-friendly and extensible.