To accomplish prostate lesion detection, pixel-level labels would normally be annotated manually prior to model training. In practice, however, prostate lesions are commonly marked with a point label only, as finely delineating the lesion region (i.e., producing a pixel-level label) is tedious. The point label, typically regarded as a weak label, is insufficient to represent the lesion area for segmentation model training: any lesion area not covered by the point label is likely categorized as negative (healthy tissue) pixel samples. Thus, we strengthen the existing “weak” point labels by aggregating their neighboring pixels into a region, providing promising cues for lesion detection. Kiraly, Abi Nader, Tuysuzoglu, Grimm, Kiefer, El-Zehiry and Kamen [27] expanded the single marked pixel to a small-diameter circle using Gaussian kernels. However, such processing targets lesion localization rather than contour approximation. Therefore, we apply a more sophisticated weak-label processing method, i.e., distance regularized level set evolution [42], to automatically generate the coarse mask label (Figure 4). This level set method is an edge-based active contour approach. The label is produced in three steps: 1) initialize a level set function to represent the lesion contour originating from a manually marked point; 2) expand the lesion contour outward and update the level set function; 3) terminate the expansion and finalize the function once the pre-defined number of iterations is exceeded. As a result, the coarse mask label can be generated without labor-intensive annotation of the lesion region.
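For illustration, the sketch below outlines these three steps. It is not the DRLSE implementation of [42]; as an assumption, scikit-image's morphological geodesic active contour, a related edge-based level-set method, stands in for it, and the function and parameter names are illustrative.

```python
# A minimal sketch of the three-step coarse-mask generation. NOTE: this
# substitutes scikit-image's morphological geodesic active contour for the
# DRLSE method [42]; names and parameters are illustrative assumptions.
import numpy as np
from skimage.segmentation import (disk_level_set, inverse_gaussian_gradient,
                                  morphological_geodesic_active_contour)

def coarse_mask_from_point(image, point, n_iter=50, seed_radius=3):
    """Grow a coarse lesion mask outward from a single annotated point."""
    # Edge indicator: small values near strong gradients (candidate lesion border).
    gimage = inverse_gaussian_gradient(image.astype(float))
    # Step 1: initialize the contour as a small disk around the point label.
    init = disk_level_set(image.shape, center=point, radius=seed_radius)
    # Steps 2-3: expand the contour outward (balloon > 0), updating the level
    # set each iteration, and stop after the pre-defined number of iterations.
    return morphological_geodesic_active_contour(
        gimage, n_iter, init_level_set=init, smoothing=2, balloon=1)
```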
Figure 4 illustrates the network architecture of the proposed CMD²A-Net. The coarse segmentation module outputs the coarse lesion contour and also enables local feature extraction on lesion regions. Provided with more lesion features, the domain transfer module is introduced to facilitate feature alignment, and a classifier module is incorporated for malignancy prediction. CMD²A-Net is trained on the three sequences (i.e., T2, ADC, and hDWI) individually. Based on the model outputs (i.e., lesion malignancy probabilities) of the three sequences, we obtain the final malignancy predictions using ensemble learning. CMD²A-Net has two parallel branches with respect to (w.r.t.) the source and target domains, where two encoders extract features of prostate MR images separately in the two domains. The segmentors of the two domains share the same weights. The source segmentor is optimized by a supervised loss (i.e., the coarse lesion segmentation loss), for which samples and coarse mask labels from the source domain are required during training. The segmentation loss is defined as
$$\mathcal{L}_{seg} = 1 - \frac{2\sum_{i=1}^{w}\sum_{j=1}^{h} g_{ij}\, p_{ij} + \epsilon}{\sum_{i=1}^{w}\sum_{j=1}^{h} g_{ij} + \sum_{i=1}^{w}\sum_{j=1}^{h} p_{ij} + \epsilon}, \tag{1}$$

where $g_{ij}$ and $p_{ij}$ indicate the pixel element values of the mask label $G$ and the predicted lesion map $P$, respectively. Indices $i$ and $j$ denote the column and row of the image matrix with dimensions $w \times h$. The constant $\epsilon$ (set to $10^{-5}$) is applied to avoid the zero-denominator case, as well as to guarantee numerical stability.
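For concreteness, a minimal tf.keras sketch of this Dice-style loss follows; tensor shapes and names are assumptions, not the released implementation.

```python
# A minimal tf.keras sketch of the segmentation loss in Eq. (1); tensors are
# assumed to have shape (batch, h, w), which is an illustrative assumption.
import tensorflow as tf

def dice_loss(y_true, y_pred, eps=1e-5):
    """Soft Dice loss; eps avoids a zero denominator (numerical stability)."""
    inter = tf.reduce_sum(y_true * y_pred, axis=[1, 2])
    denom = tf.reduce_sum(y_true, axis=[1, 2]) + tf.reduce_sum(y_pred, axis=[1, 2])
    return 1.0 - tf.reduce_mean((2.0 * inter + eps) / (denom + eps))
```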
4.2. Attention-based Malignancy Estimation
In recent studies of prostate lesion classification (e.g., Guan, Liu,
Yang, Yap, Shen and Liu [28]), lesion
identification was suggested to be highly associated with
disease-related regions in MR images. Instead of treating all pixels in
the entire MR slice equally, an attention mechanism can be introduced to
specifically extract lesion features. With these insights, we
hypothesize that incorporating the prior knowledge of lesion regions
into the DA process could enhance the model’s classification
performance. As illustrated in Figure 4, the two branches follow the same pipeline to generate attention feature maps. In each branch, the attention map is produced using the prostate region and the coarse lesion mask, enabling our model to focus on the lesion region and extract more lesion representations. The prostate region and the coarse lesion mask are denoted as $R$ and $M$, respectively. Note that the subscripts “$s$” and “$t$” of the variables (e.g., $R_s$ and $R_t$) in Figure 4 represent the source and target domains, respectively. The attention maps of the source and target domains, $A_s$ and $A_t$, respectively, can be calculated by:

$$A_s = R_s \odot \sigma(M_s), \qquad A_t = R_t \odot \sigma(M_t), \tag{2}$$

where $\odot$ denotes the element-wise product, and $\sigma$ denotes the sigmoid function, which is adopted as the nonlinear activation to generate the attention maps. Such a simple but effective function constrains each element of the feature maps to $[0,1]$, thereby weighting the importance of regions. As a result, guided by the coarse mask labels, the lesion areas are assigned higher weights than the non-informative background (i.e., healthy tissue) in the feature maps.
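A minimal sketch of Eq. (2) follows; the same operation is applied identically in both branches, and the variable names are assumptions.

```python
# A one-line sketch of Eq. (2): weight the prostate-region features by the
# sigmoid-activated coarse lesion mask. Names are illustrative assumptions.
import tensorflow as tf

def attention_map(prostate_region, coarse_mask):
    """A = R (element-wise product) sigmoid(M); weights lie in [0, 1]."""
    return prostate_region * tf.sigmoid(coarse_mask)
```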
To achieve accurate lesion classification, features from the lesion attention maps are extracted by an encoder, such that high-level lesion features can be captured for the classifier module. Thus, in each branch, an encoder is incorporated after the segmentor to extract each domain’s specific features. In addition, we propose to fuse the lesion features and the prostate features to boost classification accuracy: skip connections and concatenation operations are introduced to reuse prostate features from the segmentors, as in the sketch below.
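The following Keras-functional sketch illustrates this fusion; the layer widths and exact skip topology are assumptions rather than the authors' architecture.

```python
# A rough sketch of the fusion described above: an encoder on the attention
# map, plus a skip connection that concatenates prostate features reused from
# the segmentor. Layer widths and names are illustrative assumptions.
from tensorflow.keras import layers

def fuse_features(attention_map, segmentor_features):
    # Encode lesion-focused features from the attention map.
    lesion = layers.Conv2D(64, 3, padding="same", activation="relu")(attention_map)
    lesion = layers.MaxPooling2D(2)(lesion)
    # Reuse prostate features from the segmentor via a skip connection.
    prostate = layers.Conv2D(64, 1, activation="relu")(segmentor_features)
    prostate = layers.MaxPooling2D(2)(prostate)
    # Concatenate lesion and prostate features along the channel axis.
    return layers.Concatenate(axis=-1)([lesion, prostate])
```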
We design a domain transfer module (Figure 4) that requires no target labels during training. The semantic features from both the prostate region and the attention map are fused, such that deep CORAL features from the fully connected (FC) layers can be captured for feature affinity. The Deep CORAL loss [25] is employed to minimize the cross-domain feature distribution discrepancy, owing to its generality, transferability, and ease of implementation. It is defined as the difference between the second-order feature covariances of the two domains. Our domain transfer loss is defined as:
$$\mathcal{L}_{DT} = \sum_{k=1}^{K} \frac{\alpha_k}{4 d_k^{2}} \left\| C_{S}^{k} - C_{T}^{k} \right\|_{F}^{2}, \tag{3}$$

where $K$ indicates the number of FC layers. The constants $\alpha_k$ $(k = 1, \ldots, K)$ are the weights that balance the contributions of the FC layers, all set to 1 here. The squared matrix Frobenius norm is denoted as $\|\cdot\|_F^2$. The dimension of the $k$-th FC layer is indicated by $d_k$. The feature covariance matrices of the source and target domains, $C_S^{k}$ and $C_T^{k}$, respectively, can be calculated by:

$$C_{S} = \frac{1}{n_S - 1}\left( D_S^{\top} D_S - \frac{1}{n_S}\left(\mathbf{1}^{\top} D_S\right)^{\top}\left(\mathbf{1}^{\top} D_S\right) \right), \qquad C_{T} = \frac{1}{n_T - 1}\left( D_T^{\top} D_T - \frac{1}{n_T}\left(\mathbf{1}^{\top} D_T\right)^{\top}\left(\mathbf{1}^{\top} D_T\right) \right), \tag{4}$$

where $n_S$ and $n_T$ denote the number of images in the corresponding domain, $D_S$ and $D_T$ indicate the feature matrices of the corresponding FC layer, and $\mathbf{1}$ is a column vector with all elements equal to 1.
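The sketch below expresses Eqs. (3)-(4) in TensorFlow for a single FC layer with weight $\alpha_k = 1$; the function names are assumptions.

```python
# A minimal sketch of the Deep CORAL loss, Eqs. (3)-(4), following Sun &
# Saenko [25]. D_s and D_t are (n, d) feature matrices from one FC layer.
import tensorflow as tf

def covariance(D):
    """Eq. (4): C = (D^T D - (1^T D)^T (1^T D) / n) / (n - 1)."""
    n = tf.cast(tf.shape(D)[0], D.dtype)
    col_sum = tf.reduce_sum(D, axis=0, keepdims=True)  # 1^T D, shape (1, d)
    return (tf.matmul(D, D, transpose_a=True)
            - tf.matmul(col_sum, col_sum, transpose_a=True) / n) / (n - 1.0)

def coral_loss(D_s, D_t):
    """Eq. (3) for a single FC layer with weight alpha = 1."""
    d = tf.cast(tf.shape(D_s)[1], D_s.dtype)
    diff = covariance(D_s) - covariance(D_t)
    return tf.reduce_sum(tf.square(diff)) / (4.0 * d * d)  # squared Frobenius norm
```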
To accomplish malignancy prediction using mpMRI, an ensemble learning approach is employed to fuse the predictions of the three separate models (w.r.t. T2, ADC, and hDWI). We train the classifier module, as in Figure 4, using labeled source data. The FC layers in the source domain are employed not only for cross-domain feature affinity, but also for malignancy classification. The cross-entropy loss is utilized to optimize the classifier module. Our classification loss can be defined as:
$$\mathcal{L}_{cls} = -\frac{1}{n_S}\sum_{i=1}^{n_S}\left[\, y_i \log \hat{y}_i + \left(1 - y_i\right)\log\left(1 - \hat{y}_i\right)\right], \tag{5}$$

where $y_i$ and $\hat{y}_i$ denote the ground truth and the malignancy prediction w.r.t. each source sample, respectively.
The ultimate purpose of CMD²A-Net is to accomplish accurate PLDC. To this end, we simultaneously train the coarse segmentation module, the domain transfer module, and the classifier module. Note that minimizing the segmentation loss alone would cause overfitting to the source domain, while optimizing only the domain transfer loss would lead to degraded generalization in the target domain. Therefore, jointly optimizing the total loss allows the training process to reach an equilibrium in which domain-invariant features can be extracted to achieve accurate classification. The total loss is defined as:

$$\mathcal{L}_{total} = \mathcal{L}_{cls} + \lambda \mathcal{L}_{seg} + \mu \mathcal{L}_{DT}, \tag{6}$$

where $\lambda$ and $\mu$ are the weighting hyperparameters of the total loss. Both were set to 0.5 in our experiments.
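The sketch below illustrates one joint optimization step for Eq. (6), reusing the dice_loss and coral_loss sketches above; the model's four-output signature and the loop structure are assumptions, not the authors' exact implementation.

```python
# A rough sketch of co-training all three modules under Eq. (6), with both
# weights fixed to 0.5 as in the experiments. The assumed model takes source
# and target images and returns (seg_pred, cls_pred, fc_s, fc_t).
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)

@tf.function
def train_step(model, src_img, src_mask, src_label, tgt_img, lam=0.5, mu=0.5):
    with tf.GradientTape() as tape:
        seg_pred, cls_pred, fc_s, fc_t = model([src_img, tgt_img], training=True)
        l_cls = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(src_label, cls_pred))  # Eq. (5)
        total = (l_cls + lam * dice_loss(src_mask, seg_pred)
                       + mu * coral_loss(fc_s, fc_t))                  # Eq. (6)
    grads = tape.gradient(total, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return total
```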
To leverage the benefits of multiple sequences, we utilize a weighted-average ensemble learning method. The outputs of the three separate models are combined to produce the final ensemble prediction as follows:

$$P = \frac{P_{T2} + \alpha P_{ADC} + \beta P_{hDWI}}{1 + \alpha + \beta}, \tag{7}$$

where $P_{T2}$, $P_{ADC}$, and $P_{hDWI}$ are the malignancy probability predictions of T2, ADC, and hDWI, for which the weights are 1, $\alpha$, and $\beta$, respectively. The binary variables $\alpha, \beta \in \{0, 1\}$ are assigned according to the availability of ADC and hDWI. For example, if the samples include ADC but not hDWI, then $\alpha = 1$ and $\beta = 0$.
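Eq. (7) reduces to a few lines of code; the helper below is an illustrative sketch in which sequence availability is handled through optional arguments.

```python
# A minimal sketch of the weighted-average ensemble of Eq. (7); alpha and
# beta are set per-sample from the availability of ADC and hDWI.
def ensemble_prediction(p_t2, p_adc=None, p_hdwi=None):
    """Fuse per-sequence malignancy probabilities; absent sequences get weight 0."""
    alpha = 1.0 if p_adc is not None else 0.0   # ADC availability
    beta = 1.0 if p_hdwi is not None else 0.0   # hDWI availability
    total = (p_t2 + alpha * (p_adc if p_adc is not None else 0.0)
                  + beta * (p_hdwi if p_hdwi is not None else 0.0))
    return total / (1.0 + alpha + beta)
```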
4.3. Implementation Details
Our models (i.e., the Mask R-CNN model, CM-Net, and CMD²A-Net) were trained on a GeForce GTX 1080 Ti GPU (Nvidia, California, USA) using the Keras API [43]. For Mask R-CNN training, data augmentation with random rotation was applied to the 646 T2 image slices in I2CVB, and all slices were split into training, validation, and testing sets in a ratio of 7:2:1. The input shape of Mask R-CNN was set to 512 × 512 pixels. The Adam optimizer was applied with a learning rate of $10^{-3}$, the batch size was set to 4, and the total number of epochs was 200. During training, the model with the highest Dice coefficient on the validation set was retained. For CM-Net and CMD²A-Net training, the prostate regions from P-x, LC-A, and LC-B were scaled to 224 × 224 pixels, and random rotation of {±3°, ±6°, ±9°, ±12°, ±15°} was applied for data augmentation. The Adam optimizer was chosen with a learning rate of $10^{-5}$, and the batch size was set to 2. In the training process of CM-Net, due to the limited sample size, all slices were split into training and testing sets in a ratio of 4:1 using the hold-out method. The segmentation loss was optimized first to accelerate model convergence, and CM-Net with the pre-trained coarse segmentation module was then trained further. For CMD²A-Net, we first initialized both of its branches with the weights of the pre-trained CM-Net to facilitate convergence. Specifically, we first trained both the coarse segmentation module and the classifier of CM-Net with the combined samples from both domains; we then optimized the total loss of CMD²A-Net with labeled source samples and unlabeled target samples. By co-training all the modules, the model with the highest accuracy was saved for malignancy evaluation in the target domain.
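For reference, the stated CM-Net/CMD²A-Net configuration can be sketched as below; the augmentation helper is an illustrative assumption, not the released code.

```python
# A brief sketch of the stated training configuration for CM-Net/CMD2A-Net.
import random
from scipy.ndimage import rotate
import tensorflow as tf

ANGLES = [s * a for a in (3, 6, 9, 12, 15) for s in (1, -1)]  # ±3° .. ±15°

def augment(image):
    """Random rotation drawn from {±3°, ±6°, ±9°, ±12°, ±15°}."""
    return rotate(image, angle=random.choice(ANGLES), reshape=False, mode="nearest")

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)  # lr = 10^-5
BATCH_SIZE = 2            # as stated above
INPUT_SIZE = (224, 224)   # prostate regions scaled to 224 x 224 pixels
```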
We also make our executable code and files available online via GitHub, to allow extension or application by others. This open-source deep-learning model acts as an end-to-end system: it takes prostate mpMRI sequences (i.e., T2, ADC, and hDWI) as input and produces prediction results (i.e., prostate segmentation, coarse lesion detection, and malignancy estimation) as output. The system supports multiple input formats, including DICOM, JPEG (.jpeg/.jpg), and PNG files. We emphasize that no manual prostate segmentation or annotation is required.
[Supplementary Figure 1]