CDSS as clinical reasoning support systems
Above, we have argued that coming up with a diagnosis and treatment plan
involves a search process (exploration and investigation) that is
directed by clinical experts. The reasoning of clinical experts in this search process involves, for example, asking relevant and sensible questions about the case, deciding which parameters (clinical data and otherwise) about a patient are relevant to include and which are not, formulating possible explanations for the symptoms, and recognizing similarities with other cases. In this epistemological context, a CDSS must support this process by answering questions asked by the clinician. For example:
1. What are likely diagnoses for a patient
with symptoms x,y,z?
2. What treatments have been found effective for patients with diagnosis
A, from age group B, with comorbidities C,D and E?
3. What are the chances that a patient with symptoms x,y,z has disease
A? Or disease B?
4. How likely is it that treatment T will be effective for a patient
with symptoms x,y,z?
5. If the patient with symptoms x,y,z has disease D, what other signs or symptoms would they have?
6. What if, instead of symptom x, the patient had symptom w?
In addition, CDSS could also be helpful in effectively searching the
patient’s medical records, for example to answer questions such as:
7. How often has the patient suffered from similar attacks?
8. What other drugs does the patient take, and might they interact?
9. What other examinations have been performed on this patient, and what
was the outcome?
In short, the CDSS can provide information from the patient’s records and statistical (numerical) information about illnesses and treatments in similar cases, and thereby support all types of reasoning (deductive, inferential, hypothetical, counterfactual, analogical, etc.) employed by clinicians about their patients. Moreover, based on the patient data fed into it, the CDSS could itself come up with suggestions (hypotheses). Still, it remains the clinical expert’s epistemic task to 1) come up with relevant questions and 2) judge the answers. Concerning the latter, the criteria employed by a CDSS to evaluate the answers are different from those employed by the clinician. Whereas the CDSS uses a very limited set of epistemic criteria (such as technical and statistical accuracy, cf. Kelly et al. 2019), a clinician’s judgement must meet a more extensive set of both epistemic criteria (such as adequacy, plausibility, coherence, intelligibility) and pragmatic criteria to assess the relevance and usefulness of the knowledge for the specific situation.
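To make the idea of such numerical support concrete, the sketch below shows how a question such as (3) above could in principle be answered with Bayes’ rule. It is only a minimal illustration: the prevalence and likelihood figures are invented rather than taken from any real dataset, and a real system would estimate such quantities from large amounts of reliable data.

```python
# Hypothetical illustration of Bayes' rule for a question such as (3).
# All numbers are invented for the sake of the example.

prior_disease_a = 0.02          # assumed prevalence of disease A in the relevant population
p_symptoms_given_a = 0.85       # assumed probability of symptoms x, y, z given disease A
p_symptoms_given_not_a = 0.10   # assumed probability of the same symptoms without disease A

# Total probability of observing symptoms x, y, z.
p_symptoms = (p_symptoms_given_a * prior_disease_a
              + p_symptoms_given_not_a * (1 - prior_disease_a))

# Posterior probability of disease A given the symptoms.
posterior_a = p_symptoms_given_a * prior_disease_a / p_symptoms
print(f"P(disease A | symptoms x, y, z) = {posterior_a:.2f}")
```

The resulting figure is only one piece of input to the clinician’s reasoning; judging whether it applies to the patient at hand remains the clinician’s task.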
In sum, we have argued that clinical decision-making is a complex and sophisticated reasoning process, and that a clinician is epistemologically responsible for this process. Instead of thinking of a CDSS as a system that answers the question “what is the diagnosis for patient A with symptoms x,y,z” and, subsequently, “what is the best treatment for this patient”, it is better to think of the system as answering the numerous intermediate questions raised by a clinician in the clinical reasoning process. When these questions are answered with the help of statistical information based on large amounts of reliable data, the clinician’s reasoning process can be supported, substantiated and refined. Therefore, we propose that it is more suitable to think of CDSS as clinical reasoning support systems (CRSS). In the following paragraphs, we will further elaborate on what is needed for good use of a CRSS in clinical practice. We will argue that the designers of the system and the clinicians who will use it need to collaborate from early on in the development of the CRSS.
The epistemological role of experts in developing CRSS
Above, we explained that the epistemological role of clinicians in the
diagnosis and treatment of individual patients is crucial, even though
CRSS can provide important support. Here we will explain that the
epistemological role of clinical and AI experts is also crucial in the
development of a CRSS, and that these experts need to collaborate.
In a very simple schema, the development of a CRSS consists of three phases: input, throughput and output. Human intelligence plays a crucial role in each phase.
The input in the development of a CRSS is existing medical knowledge (for knowledge-based AI systems) and available data (for data-driven systems). In the development of knowledge-based CRSS, all clinical, epidemiological and theoretical knowledge in the medical literature can be used. However, medical experts must indicate which knowledge is relevant for which purpose, which knowledge belongs together, and how reliable that knowledge is. In the development of data-driven CRSS, reliably labelled data are needed to train the system, while relevant, reliable unlabelled data are needed for the system to find patterns and correlations. Knowledge from clinical experts is needed to generate the training set (such as labelled images), and to select sets of relevant and reliable unlabelled data. In all these cases, knowledge of clinical experts plays a role in choosing appropriate categorizations and adequate labelling, and in the organization of data storage, in order to make the system searchable and expandable for clinical practice.
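To make the role of expert labelling and categorization concrete, the sketch below shows what a single clinician-labelled training record might look like. The field names and categories are hypothetical; in practice they would be chosen by clinical experts for the purpose at hand.

```python
from dataclasses import dataclass

@dataclass
class LabelledCase:
    """One hypothetical training record, labelled by a clinical expert."""
    image_path: str          # e.g. a dermatoscopic image in the data archive
    patient_age: int
    lesion_site: str         # categorization chosen by clinical experts
    label: str               # expert-assigned label, e.g. "benign" or "malignant"
    label_confidence: float  # how certain the expert is about the label

# An entirely fictional example record:
case = LabelledCase(
    image_path="images/case_0001.png",
    patient_age=54,
    lesion_site="forearm",
    label="benign",
    label_confidence=0.9,
)
```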
The throughput in the development of a CRSS is the machine-learning process, in which the machine-learning algorithm searches for a ‘model’ (i.e., another algorithm) that connects the data in the training set to their labels in a statistically correct way (i.e., supervised learning), or detects statistically relevant correlations in unlabelled data (i.e., unsupervised learning). The design, development
and implementation of this machine-learning process requires AI experts
rather than clinical experts. However, there will be overlap between the
development of the input (the labelled or unlabelled data fed into the
process) and the machine learning process, which implies that some
collaboration is necessary in this phase.
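As a minimal sketch of the two learning modes just mentioned, the fragment below fits a supervised model on labelled data and a clustering algorithm on unlabelled data. The data are synthetic, and the model choices (logistic regression and k-means via scikit-learn) are assumptions made purely for illustration.

```python
# Minimal sketch of supervised vs. unsupervised learning (synthetic data, illustration only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Supervised learning: features plus expert-assigned labels.
X_labelled = rng.normal(size=(200, 5))           # 200 synthetic cases, 5 clinical parameters
y_labels = (X_labelled[:, 0] > 0).astype(int)    # stand-in for expert labels
model = LogisticRegression().fit(X_labelled, y_labels)

# Unsupervised learning: the algorithm searches for structure in unlabelled data.
X_unlabelled = rng.normal(size=(200, 5))
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X_unlabelled)

print("learned weights:", model.coef_.round(2))
print("cluster sizes:", np.bincount(clusters))
```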
The output (or result) of the mentioned steps in the development of a CRSS is a ‘model’ (an algorithm). This model is implemented in the CRSS to be used in clinical practice. But before implementation, the model must be checked by human experts for relevance and correctness, since its statistical correctness does not automatically mean that it is adequate and relevant for the CRSS. For example, Kelly et al. (2019) describe a study in which “an algorithm was more likely to classify a skin lesion as malignant if an image had a ruler in it because the presence of a ruler correlated with an increased likelihood of a cancerous lesions” (ibid, 4). This is because the data are under-determined, which means that in principle many statistically correct models (algorithms) can be found (cf. McAllister 2011) to (i) connect labelled data to their labels (in the case of supervised learning), or (ii) find statistically relevant correlations in unlabelled data (in the case of unsupervised learning). To be able to check this, clinical experts must, for example, know which parameters play a role in the model and then assess on the basis of their medical expertise whether these are medically/biologically/physically plausible. In short, here as well the contribution of human intelligence is crucial, since medical experts, in collaboration with AI experts, must determine whether the resulting model is reliable and relevant.
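As a hedged sketch of how such a check could be supported (continuing the synthetic example above, and not a prescribed method), one can ask a trained model which input parameters drive its predictions, for instance with permutation importance, and let clinical experts judge whether those parameters are medically plausible. The feature names below are hypothetical.

```python
# Sketch: surfacing which parameters a trained model relies on (illustration only).
from sklearn.inspection import permutation_importance

# 'model', 'X_labelled' and 'y_labels' are the synthetic objects from the earlier sketch.
result = permutation_importance(model, X_labelled, y_labels, n_repeats=20, random_state=0)

feature_names = ["age", "blood_pressure", "crp", "ruler_in_image", "lesion_size"]  # hypothetical
for name, importance in sorted(zip(feature_names, result.importances_mean),
                               key=lambda pair: -pair[1]):
    # Clinical experts review this ranking: is each influential parameter
    # medically/biologically plausible, or an artefact (such as the ruler)?
    print(f"{name:>15}: {importance:.3f}")
```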
Explainable and accountable CRSS to facilitate interaction with the clinician
To use a CRSS as a clinical reasoning support system in the manner we suggest above, the system itself must facilitate this: it should enable a clinician to evaluate its answers and to judge their accuracy and relevance for the specific patient. (Another requirement is that a CRSS is equipped with a suitable interface that allows clinicians to enter their questions, possibly even by speaking, and that the algorithm is designed such that it can deal with the various questions posed by clinicians. This kind of flexibility might be challenging to implement; it goes beyond the scope of this paper to address these challenges.) A well-known objection to AI for clinical practice is the opacity of the algorithm: how it establishes an outcome based on the input is ‘black-boxed’. This, of course, undermines the users’ ability to judge the accuracy and relevance of the outcome. Chin-Yee and Upshur (2019), for example, argue that because of the black-box nature of CRSS, using these systems conflicts with clinicians’ ethical and epistemic obligation to the patient. According to them, this is one of the central philosophical challenges confronting big data and machine learning in medicine.
Similarly, in their ‘Barcelona declaration for the proper development and usage of artificial intelligence in Europe’, Sloane and Silva (2020) argue that decisions made by machine learning AI are often opaque due to the black-box nature of the patterns derived by these techniques. This can lead to unacceptable bias. Therefore, they state that “When an AI system makes a decision, humans affected by these decisions should be able to get an explanation why the decision is made in terms of language they can understand and they should be able to challenge the decision with reasoned arguments” (ibid, 489).
These requirements for the use of AI systems are captured by the developers of machine learning in the concept of explainable AI. The idea of explainable AI is that humans can understand how a CRSS has produced an outcome, for example by developing algorithms that are understandable by the users. This, however, might limit the level of complexity of the algorithm, and with that negate the possible benefits of using AI. In the case of clinical use it might not be necessary to understand the exact intricacies of the algorithm, but rather to have some insight into the factors that are important or decisive in arriving at a specific prediction or advice. What machine learning algorithms do is learn to assign weights to features in the data, in order to make optimal predictions based on those data. For clinicians, it is important to know which features are considered relevant by the algorithm and how much weight is assigned to each feature. Having that information, a clinician can judge whether the features that a CRSS picks out are indeed relevant or not (for example, an artefact in an image or an unreliable measurement is not). In the optimal configuration, a clinician can also enter feedback into the system, allowing the algorithm to come up with an alternative prediction, and to learn for future cases.
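Continuing the same hypothetical logistic-regression example from above, the sketch below shows what such feature-level insight could look like for a single case. Real explainable-AI tooling is more sophisticated, but the principle of reporting per-feature weights or contributions is the same; the feature names remain hypothetical.

```python
# Sketch: per-case feature contributions for a linear model (illustration only).
feature_names = ["age", "blood_pressure", "crp", "ruler_in_image", "lesion_size"]  # hypothetical
case_features = X_labelled[0]            # one synthetic case from the earlier sketch

# For a logistic-regression model, coefficient times feature value indicates how much
# each feature pushes the prediction towards or away from the positive class.
contributions = model.coef_[0] * case_features

for name, value, contribution in zip(feature_names, case_features, contributions):
    print(f"{name:>15}: value={value:+.2f}, contribution={contribution:+.2f}")

# A clinician reviewing this output can flag features that should not matter
# (e.g. an image artefact) and feed that back to the developers.
```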
An advantage of using an explainable AI algorithm, assuming that the CRSS is considered a clinical reasoning support system rather than a decision system, is that it helps clinicians to explicate their reasoning process. Important in this context is that medical expertise involves a lot of tacit knowledge that can easily remain hidden in the clinical reasoning of these experts. We have argued that epistemological responsibility entails elucidating knowledge and reasoning that otherwise remain implicit. However,
for clinicians this can be quite challenging. Using a system that
formalises aspects of the reasoning process and explicates the factors
that are combined, and with what weight, will support clinicians in
developing their ability to articulate and justify their own reasoning
process. This explicit understanding, in turn, can contribute to the
communication between the clinician and the patient. The explanation
enables patients to understand their clinician’s reasoning process and
add to it, thus empowering them to take part in the decision-making
process concerning their own medical care.
Establishing a link between the CRSS and the individual patient
Sullivan (2020) argues that it is not necessarily the complexity or black-box nature that limits how much understanding a machine learning algorithm can provide. If an algorithm is to aid understanding of the target phenomenon by its user (such as a scientist or a clinician), it is more important to establish how key features of the algorithm map onto features of the real-world phenomenon. This is called empirical justification. Sullivan calls a lack of this type of justification link uncertainty. Link uncertainty can be reduced by collecting evidence that supports the connection of “the causes or dependencies that the model uncover to those causes or dependencies operating in the target phenomenon” (ibid, 6).
Consider, for example, a model used to classify cases of skin melanoma (Esteva et al. 2017, as referred to by Sullivan), which was developed by a machine learning algorithm using large numbers of images of healthy moles and melanomas. Because there
is extensive background knowledge linking the appearance of moles to
instances of melanoma, for example explaining why possible interventions
are effective for lesions that look a certain way, “the model can help
physicians gain understanding about why certain medical interventions
are relevant, and using the model can help explain medical interventions
to patients” (ibid, 23). This background knowledge links the mechanisms
that are uncovered by the AI algorithm (i.e. predicting which treatments
will be effective for which cases) to relevant mechanisms in the target
phenomenon (i.e. a skin lesion that does or does not require treatment).
Because of this link, empirical justification is established, and
clinicians can use the algorithm to answer why-questions about skin
lesions.
Concerning the transparency of algorithms, Sullivan contends that our
understanding is quite limited if we know nothing whatsoever about the
algorithms. She argues that having some insight in the weighing used by
the algorithm is needed. Therefore, as long as the model is not opaque
at the highest level, that is to say that there is some understanding of
how the system is able to identify patterns within the data, it is
possible to use a complex algorithm for understanding. What is needed is
“some indication that the model is picking out the real difference makers (i.e., factors that matter) for identifying a
given disease and not proxies, general rules of thumb, or artefacts
within a particular dataset” (ibid, 21).
In our view, Sullivan identifies an important condition for the use of
CRSS in clinical practice. Based on her analysis, we infer that it is
important to ensure that the algorithm used by a CRSS (which was
developed by data-driven AI) is linked to the target phenomenon, by
empirical (preferably scientifically supported) evidence. Sullivan has
more general links in mind: that the algorithm can generally be used to
understand the mechanisms of a target phenomenon. For clinical practice
we would add another important link: a link between the algorithm and
the individual patient that the clinician intends to diagnose and treat.
To establish this link and use a CRSS to better understand the individual patient, clinicians need to verify that 1) the type of outcome (i.e. the disease category) produced by the CRSS is consistent with the ‘picture’ of the patient that the clinician has constructed so far; 2) the data used to train the CRSS are relevant to the patient; and 3) the input required by the CRSS is available for the patient in question and of good quality.
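Purely as a schematic illustration (not a substitute for clinical judgement), these three checks could be recorded as an explicit checklist before a CRSS output is used for an individual patient; all names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LinkChecklist:
    """Hypothetical record of the three checks before using a CRSS output for one patient."""
    outcome_consistent_with_clinical_picture: bool   # check 1
    training_data_relevant_to_patient: bool          # check 2
    required_input_available_and_good_quality: bool  # check 3

    def link_established(self) -> bool:
        return (self.outcome_consistent_with_clinical_picture
                and self.training_data_relevant_to_patient
                and self.required_input_available_and_good_quality)

# Example: the clinician judges that the training data are not relevant to this patient.
checks = LinkChecklist(True, False, True)
print("use CRSS output for this patient:", checks.link_established())
```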