Software containers have transformed how services are deployed on the internet, and science is beginning to reap some of the same rewards. At their core, container technologies such as Docker, Singularity, and Shifter provide self-contained binary environments that bundle the full user-space software stack, from system libraries through to application libraries and platforms, and run on top of the host's Linux kernel. This makes it possible to describe a complete build environment, starting from a known base image and continuing through the packages to install and the software repositories to clone, build, and install.
The resulting images have an entry point that can be called from the host operating system, and a flexible set of capabilities for mounting file systems, moving data in and out, etc. A Dockerfile provides a full description of the base image, build steps, and configuration. The binary images can be uploaded to Docker Hub and similar services...
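As an illustration of calling an image's entry point from the host with a mounted directory, the following minimal sketch assembles a `docker run` command line. The image name, the container path `/data`, and the `--input`/`--output` arguments are hypothetical placeholders, not the platform's actual interface; only `docker run`, `--rm`, and `-v` are standard Docker CLI features.

```python
import shlex
from pathlib import Path


def build_run_command(image, host_dir, args, container_dir="/data"):
    """Assemble a `docker run` invocation that bind-mounts one host directory."""
    host_dir = Path(host_dir).resolve()
    return [
        "docker", "run",
        "--rm",                               # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # bind mount used to get data in and out
        image,
        *args,                                # forwarded to the image's entry point
    ]


cmd = build_run_command(
    "example/qc-code:latest",                 # hypothetical image name
    "./scratch",
    # Hypothetical entry-point arguments, for illustration only.
    ["--input", "/data/input.json", "--output", "/data/output.json"],
)
print(shlex.join(cmd))
```

The command is only printed here; passing `cmd` to `subprocess.run` would execute it on a host where Docker is installed.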
Quantum Chemistry Codes (Alessandro, Muammar and Marcus)
Thanks to the container infrastructure described above, extending the platform to run calculations with a new simulation package is straightforward. All that is needed is a Docker image that conforms to the prescribed interface; the platform can then use it without any further changes.
We have developed images for two popular quantum chemistry codes: NWChem and Psi4. These images also serve as examples for developers interested in running other codes through the platform. The NWChem and Psi4 images support a similar subset of features:
- Single point / geometry optimization / frequency calculations.
- Basis set selection.
- Theory level (Hartree-Fock or DFT) and exchange-correlation functional selection.
- Charge and multiplicity.
While this is a small subset of what these quantum chemistry codes can do, we plan to extend the features exposed by the images, prioritizing those requested by users of the platform.
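The options listed above could be gathered into a small parameter record that is validated before a calculation is submitted. This is a sketch only: the field names, the accepted task and theory strings, and the default values are assumptions made for illustration and are not the platform's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

TASKS = {"energy", "optimize", "frequencies"}  # assumed task names
THEORIES = {"hf", "dft"}                       # assumed theory labels


@dataclass
class CalculationParameters:
    task: str = "energy"
    basis: str = "6-31g"
    theory: str = "hf"
    functional: Optional[str] = None  # only meaningful when theory == "dft"
    charge: int = 0
    multiplicity: int = 1

    def validate(self) -> None:
        """Raise ValueError if the combination of options is inconsistent."""
        if self.task not in TASKS:
            raise ValueError(f"unknown task: {self.task!r}")
        if self.theory not in THEORIES:
            raise ValueError(f"unknown theory: {self.theory!r}")
        if self.theory == "dft" and not self.functional:
            raise ValueError("DFT requires an exchange-correlation functional")
        if self.theory == "hf" and self.functional:
            raise ValueError("a functional is only used with DFT")
        if self.multiplicity < 1:
            raise ValueError("multiplicity must be a positive integer")


params = CalculationParameters(
    task="optimize", theory="dft", functional="b3lyp", basis="cc-pvdz"
)
params.validate()
print(json.dumps(asdict(params), indent=2))
```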
Machine Learning (Johannes and Alessandro)
From the point of view of the platform, there is no difference between running a quantum chemistry code and a machine learning code. Both take a molecule and some parameters as input and return a conforming output. The only difference is that a machine learning code may not require a 3D structure, and can still operate when only string-based formats such as InChI or SMILES are available.
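The distinction drawn above can be expressed as a simple check on the molecule record. The `MoleculeInput` class and the `needs_3d` flag are hypothetical names introduced for this sketch; they are not part of the platform's interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class MoleculeInput:
    smiles: Optional[str] = None
    inchi: Optional[str] = None
    coords_3d: Optional[Sequence[Tuple[float, float, float]]] = None


def can_run(molecule: MoleculeInput, needs_3d: bool) -> bool:
    """Return True if the molecule carries enough information for the code."""
    has_3d = bool(molecule.coords_3d)
    has_string = bool(molecule.smiles or molecule.inchi)
    if needs_3d:
        return has_3d                # e.g. a quantum chemistry code
    return has_3d or has_string      # e.g. an ML model that accepts SMILES or InChI


ethanol = MoleculeInput(smiles="CCO")
print(can_run(ethanol, needs_3d=True))   # no 3D structure available
print(can_run(ethanol, needs_3d=False))  # the string representation is sufficient
```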
We have developed two images for codes that use machine learning to predict the result of a calculation: ANI and ChemML. For the ANI image we use TorchANI, the PyTorch implementation of the ANI potentials, as an ASE calculator, and rely on the algorithms in ASE to drive tasks such as geometry optimizations and normal mode calculations. The TorchANI image provides the optimized ANI-1x and ANI-1ccx potentials, which can be used to generate single point energies, perform geometry optimizations, and compute Hessians for frequency calculations.
ChemML is a machine learning Python package for the analysis and modeling of chemical and materials data. We have taken initial steps to integrate the ChemML package into the OpenChemistry platform. In addition to adding ML functionality to OpenChemistry, we aim to make these techniques more accessible and to advance their dissemination in the chemistry community. The current work has focused on compiling a collection of trained ML prediction models for certain materials properties that can be used as alternatives to the corresponding physics-based modeling or simulation approaches. As a proof of principle, we designed and implemented a deep learning model for predicting the refractive index of organic compounds.
The model takes the SMILES string of an organic molecule as input and returns a few quantities of interest that it has been trained on. Predictions are available almost instantaneously, in contrast to a physics-based model that uses the Lorentz-Lorenz equation parametrized by inputs from quantum chemistry and molecular dynamics calculations. The results of this ML model are comparable with those of other data-derived prediction models in terms of the diversity of molecular candidates and the accuracy of predictions. The current implementation also enables the user to retrain each model for better or more generalizable predictive power. Moreover, a trained model can leverage other relevant ML models through transfer learning.
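For reference, the Lorentz-Lorenz equation mentioned above relates the refractive index n to the number density N and the mean polarizability α of the molecules. The form below is the standard textbook statement in Gaussian units. How the physics-based workflow obtains N and α (presumably from molecular dynamics and quantum chemistry, respectively) is an assumption here rather than something stated in detail above.

```latex
\[
\frac{n^{2}-1}{n^{2}+2} = \frac{4\pi}{3}\,N\alpha
\quad\Longrightarrow\quad
n = \sqrt{\frac{1+2\phi}{1-\phi}},
\qquad \phi = \frac{4\pi}{3}\,N\alpha .
\]
```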
Our near-term plans for future development include the addition of models trained on other data sets and for other material properties. Subsequently, we will tackle the full integration of ChemML beyond the use of trained models.
Generating and accessing data through the web
Open Chemistry Web Widgets (Alessandro)
We have created a set of reusable widgets that can be embedded in any web environment, from a React/Vue/Angular single page app, to a JupyterLab extension, to a static HTML page. These widgets are written in TypeScript using Stencil, and upon compilation, they become standard web components (Custom Elements V1) that can be used just like any other HTML tag such as a <div/> or an <img/>.
The core widgets are the <oc-molecule-moljs/> and <oc-molecule-vtkjs/>. These two widgets have a common interface and can be used interchangeably. They take as input a cjson object and a set of parameters to tweak the visualization (such as the isosurface value, the active normal mode, or the ball/stick sizes) and draw on the screen a three-dimensional representation of the molecule that the user can interact with. The difference between them is that <oc-molecule-moljs/> uses 3Dmol.js to render the 3D scene, while <oc-molecule-vtkjs/> uses vtk.js.
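To make the cjson input more concrete, the sketch below builds a minimal Chemical JSON record for a water molecule and checks that its flat arrays are consistent before serializing it. The key layout shown follows the Chemical JSON format as commonly written by Avogadro and is an assumption here; the exact fields the widgets read (for example for cube data or normal modes) are not specified above, and `check_cjson` is a hypothetical helper.

```python
import json

# Water molecule in an assumed Chemical JSON layout.
water = {
    "chemicalJson": 1,                               # format version marker (assumed)
    "atoms": {
        "elements": {"number": [8, 1, 1]},           # atomic numbers: O, H, H
        "coords": {"3d": [                           # flat x, y, z list, in angstrom
            0.0, 0.0, 0.1173,
            0.0, 0.7572, -0.4692,
            0.0, -0.7572, -0.4692,
        ]},
    },
    "bonds": {
        "connections": {"index": [0, 1, 0, 2]},      # flat pairs of atom indices
        "order": [1, 1],
    },
}


def check_cjson(cjson):
    """Check array lengths and bond indices; return the number of atoms."""
    n_atoms = len(cjson["atoms"]["elements"]["number"])
    if len(cjson["atoms"]["coords"]["3d"]) != 3 * n_atoms:
        raise ValueError("expected three coordinates per atom")
    bonds = cjson.get("bonds")
    if bonds:
        index = bonds["connections"]["index"]
        if len(index) != 2 * len(bonds["order"]):
            raise ValueError("expected two atom indices per bond order")
        if any(i < 0 or i >= n_atoms for i in index):
            raise ValueError("bond refers to a missing atom")
    return n_atoms


print(check_cjson(water))
payload = json.dumps(water)  # string form that could be handed to a web page
```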
Features:
- Display the molecular structure and connectivity
- Display the crystal unit cell (vtkjs only)
- Play normal modes animation
- Display isosurfaces of cube data
- Volume rendering of cube data (vtkjs only)
Another important widget we developed is the high-level <oc-molecule/>. This widget wraps the core <oc-molecule-moljs/> and <oc-molecule-vtkjs/> widgets and adds a menu so that the user can easily change the visualization parameters interactively:
- Change structure representation (Ball and stick / vdW / custom)
- Change the render backend (moljs / vtkjs)
- Change the currently active normal mode
- Change the isosurface value
- Switch the color map used by the volume renderer
- Tweak the color transfer function used by the volume renderer