The System Architecture
IBM Visual Insights consists of five layers: infrastructure, resource management, deep learning computation, service management, and application service. The infrastructure layer comprises the hardware needed to run the tools: CPUs, GPUs, storage, and network. The resource management layer coordinates and schedules these resources to carry out a given sequence of operations. The deep learning computation layer implements the DL algorithms themselves, along with the data processing, model, and prediction modules. The DL models implemented in this layer include GoogLeNet for image classification; Faster R-CNN, tiny YOLO V2, Detectron, and Single Shot Detector (SSD) for object detection; and Structured Segment Network (SSN) for action detection. Custom models can also be imported. The service management layer enables user project management through a graphical interface, and the application service layer manages application-related services built on top of the other layers.
IBM Visual Insights runs as a collection of pods in a Kubernetes environment (a pod is a group of containers with shared storage and network resources that are created and managed together). The IBM Visual Insights stand-alone deployment version 1.2.0 used here consists of 20 Docker images. These images are used by two groups of pods: those that provide the Kubernetes infrastructure needed to run IBM Visual Insights, and those that run the IBM Visual Insights applications themselves.
Users do not need to be aware of these details. Their entry point is simply a link to the web-based GUI, through which a model can be selected and trained. Once logged into the interface, the user can upload data (images and videos, including annotated Common Objects in Context, or COCO, datasets), label it, and train a model (classification, object detection, and action recognition models are currently supported). The example application described below walks step by step through the process of training a classification model. Once the model is trained, it can be deployed for production use through a variety of tools, including REST APIs and a mobile application.
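To make the mention of annotated COCO datasets more concrete, the following is a minimal sketch of a COCO-style object detection annotation built in Python. The three top-level keys (`images`, `categories`, `annotations`) and the `[x, y, width, height]` bounding-box convention follow the public COCO format. The file name, category name, and all numeric values are invented for illustration, and the helper `annotations_per_category` is hypothetical. Which optional COCO fields IBM Visual Insights reads on import is not covered here.

```python
from collections import Counter

# Minimal COCO-style annotation structure. All values are made up for illustration.
coco = {
    "images": [
        {"id": 1, "file_name": "example_0001.jpg", "width": 400, "height": 300},
    ],
    "categories": [
        {"id": 1, "name": "example_object"},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,        # refers to images[].id
            "category_id": 1,     # refers to categories[].id
            # bbox is [x, y, width, height] in pixels, from the top-left corner
            "bbox": [50, 40, 120, 90],
            "area": 120 * 90,
            "iscrowd": 0,
        },
    ],
}


def annotations_per_category(dataset):
    """Count annotations per category name (hypothetical helper)."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    return Counter(names[a["category_id"]] for a in dataset["annotations"])


per_category = annotations_per_category(coco)
print(dict(per_category))
```

A quick count like this is a cheap sanity check on a dataset before uploading it, since a category with no annotations usually points to an export problem.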
For this article, our instance of IBM Visual Insights runs on an IBM 8335-GTH AC922 server \cite{introduction}. This is essentially the same architecture as that used in the Summit and Sierra supercomputers. The server contains two 20-core 2.4 GHz IBM POWER9 CPUs, 256 GB of DDR4 RAM, and four NVIDIA V100 GPUs with 16 GB of HBM2 memory each. As of version 1.2.0, IBM Visual Insights supports the x86 platform as well.
Example Application
We use classification of COVID-19 chest X-ray images as an example application to demonstrate the streamlined IBM Visual Insights processes for image labeling, model training, and model deployment. With the recent availability of annotated X-ray image datasets, good progress has been made in using convolutional neural networks (CNNs) for medical diagnosis \cite{Abbas_2020}, \cite{Hassanien_2020}. These models can detect the prominent pneumonia pattern in chest scans as a key COVID-19 indicator. However, the models applied in these previous studies rely on advanced techniques, such as transfer learning from generic object recognition tasks, which makes them less intuitive to deploy for subject matter experts with limited coding and DL skills. In the following, we show how IBM Visual Insights lets domain experts manage data and train an advanced model with relatively little effort through a streamlined interface.
Importing the Dataset
The dataset used in this example is from a GitHub repository publicly released by Skytells \cite{cohen2020covid}. Figure \ref{248975} shows example scans from the four categories of X-ray images. This dataset contains 860 normal, 60 COVID-19, 650 bacterial pneumonia, and 412 viral pneumonia images. All images are the same size (400x300 pixels) and are stored in JPEG format. The dataset is imbalanced in the sense that the number of COVID-19 images is far smaller than the number of images in each of the other three categories. A useful supplement is another dataset \cite{cohen2020covidProspective} that contains 660 COVID-19 and other viral and bacterial pneumonia cases scraped from the web.
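The degree of imbalance can be quantified directly from the class counts given above. The short Python sketch below computes each class's share of the dataset, the ratio between the largest and smallest class, and inverse-frequency weights of the form total / (number of classes × class count), a common heuristic for compensating for imbalance. The class label strings are our own naming, and the weighting scheme is shown only for illustration; we do not claim that IBM Visual Insights applies it internally.

```python
# Image counts per class as reported for the dataset above.
# The label strings are illustrative names, not identifiers from the dataset.
counts = {
    "normal": 860,
    "covid19": 60,
    "bacterial_pneumonia": 650,
    "viral_pneumonia": 412,
}

total = sum(counts.values())
n_classes = len(counts)

# Fraction of the dataset that belongs to each class.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights: rarer classes receive larger weights.
weights = {label: total / (n_classes * n) for label, n in counts.items()}

# Ratio between the most and least populated classes.
imbalance_ratio = max(counts.values()) / min(counts.values())

for label, n in counts.items():
    print(f"{label:20s} n={n:4d} share={shares[label]:.3f} weight={weights[label]:.2f}")
print(f"total images: {total}, imbalance ratio: {imbalance_ratio:.1f}")
```

With these counts, COVID-19 images make up roughly 3% of the 1,982 images, and the normal class is about 14 times larger than the COVID-19 class.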