The Backdoor Criterion: Covariate Selection for Causal Inference
In contrast to model selection approaches, a causal inference
methodology that has recently emerged in ecology is Judea Pearl’s
structural causal model (SCM; Pearl 2009). This framework uses DAGs to
visualize researchers’ assumptions about the causal structure of a
system or process under study. Once a DAG has been created, a graphical
rule known as the backdoor criterion can be applied to determine the
covariates required to answer a causal question from observational data.
Conceptually, the backdoor criterion instructs us to block all
non-causal paths between a predictor and response variable of interest,
while leaving all causal pathways open. Graphically, this translates to
blocking all backdoor paths between a predictor and response variable.
Backdoor paths are paths connecting the predictor and the response that
begin with an arrow pointing into the predictor; if left open, they can
produce non-causal associations between the variables of interest.
To block a backdoor path, we can either (1) adjust for an intermediate
arrow-emitting variable or (2) not adjust for a variable with two
incoming arrows (i.e., a collider variable: → X ←).
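The two blocking rules can be made concrete with a short Python sketch. It assumes a DAG stored as a set of directed (parent, child) edges, and the function names and toy node names (X, Y, Z, C) are hypothetical. Following the standard d-separation rule, the code also treats a collider as reopened when one of its descendants is adjusted for.

```python
# Minimal sketch of the two path-blocking rules, assuming a DAG stored as a set
# of directed (parent, child) edges. Node names below are hypothetical.


def descendants(edges, node):
    """Return every node reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def path_is_blocked(edges, path, adjusted):
    """Return True if `path` (a list of adjacent nodes) is blocked by `adjusted`.

    Rule 1: an intermediate arrow-emitting node (chain or fork) blocks the path
            when it is adjusted for.
    Rule 2: a collider (prev -> node <- next) blocks the path unless it, or one
            of its descendants, is adjusted for.
    """
    for prev_node, node, next_node in zip(path, path[1:], path[2:]):
        is_collider = (prev_node, node) in edges and (next_node, node) in edges
        if is_collider:
            if node not in adjusted and not (descendants(edges, node) & adjusted):
                return True
        elif node in adjusted:
            return True
    return False


# Backdoor path through a fork: X <- Z -> Y
fork = {("Z", "X"), ("Z", "Y")}
# Path through a collider: X -> C <- Y
collider = {("X", "C"), ("Y", "C")}

print(path_is_blocked(fork, ["X", "Z", "Y"], set()))      # False: path is open
print(path_is_blocked(fork, ["X", "Z", "Y"], {"Z"}))      # True: adjusting for Z blocks it
print(path_is_blocked(collider, ["X", "C", "Y"], set()))  # True: unadjusted collider blocks it
print(path_is_blocked(collider, ["X", "C", "Y"], {"C"}))  # False: adjusting for C opens it
```

The fork and collider cases behave in opposite ways: adjustment closes a path through a fork but opens a path through a collider, which is why the two rules are stated separately.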
For example, given our DAG in Fig 1, to determine the total effect of
forestry on species Y, there are four backdoor paths that must be
blocked:
- Species Y – Climate – Forestry
- Species Y – Climate – Fire – Species A – Species Y
- Species Y – Species A – Fire – Climate – Forestry
- Species Y – Human Gravity – Forestry
The first three backdoor paths can each be blocked by adjusting for the
intermediate arrow-emitting variable climate. The fourth backdoor path
can be blocked by adjusting for the intermediate arrow-emitting variable
human gravity. Therefore, to determine the total effect of forestry on
species Y, we must adjust for climate and human gravity. Following covariate
selection, researchers can determine the appropriate statistical
analysis, given their data. It is important to note that DAGs and the
backdoor criterion are compatible with both linear and non-parametric
approaches (Pearl 2009; Elwert 2013). Because our simulated data were
generated using linear relationships, we chose a linear regression model
with species Y as the response, forestry as the predictor, and climate
and human gravity as controls. This model returned an accurate total
causal estimate of -0.75 [-0.77, -0.73] (Appendix S1).
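To illustrate why the adjustment set matters at the regression step, the sketch below simulates data from hypothetical linear structural equations over the variables of Fig 1 and compares an unadjusted regression with one that controls for climate and human gravity, the variables identified above as blocking the backdoor paths. All coefficients, including the true forestry effect of -0.4, are invented for illustration and do not reproduce Appendix S1. It assumes NumPy is available and fits ordinary least squares with `numpy.linalg.lstsq`; the helper `ols_slope` is our own.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 10_000

# Hypothetical linear structural equations; coefficients are illustrative only.
climate = rng.normal(size=n)
human_gravity = rng.normal(size=n)
fire = 0.6 * climate + rng.normal(size=n)
species_a = 0.5 * fire + rng.normal(size=n)
forestry = 0.7 * climate + 0.5 * human_gravity + rng.normal(size=n)
TRUE_EFFECT = -0.4  # hypothetical total effect of forestry on species Y
species_y = (TRUE_EFFECT * forestry + 0.8 * climate + 0.6 * human_gravity
             + 0.5 * species_a + rng.normal(size=n))


def ols_slope(y, *columns):
    """Return the coefficient on the first column from an OLS fit with intercept."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Backdoor paths through climate and human gravity left open.
naive = ols_slope(species_y, forestry)
# Backdoor paths blocked by adjusting for climate and human gravity.
adjusted = ols_slope(species_y, forestry, climate, human_gravity)
print(f"unadjusted: {naive:.3f}, adjusted: {adjusted:.3f}, true: {TRUE_EFFECT}")
```

With these coefficients the unadjusted slope is pulled far from -0.4 (it even comes out positive), while the adjusted slope lands close to the true value; species A does not need to be included because every backdoor path through it also passes through climate.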
The application of the backdoor criterion can become increasingly
complex with larger DAGs; tools such as ‘dagitty’ (www.dagitty.net;
instructions are available on the site) can help researchers compose DAGs
and specify causal questions, and will then identify the required
backdoor adjustment sets.
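The kind of search such tools automate can be illustrated with a brute-force Python sketch; it is not meant to reflect dagitty's own algorithm. Because Fig 1 is not reproduced here, the edge list `EDGES` is a hypothetical stand-in chosen only so that backdoor paths 1, 3 and 4 from the list above exist, so the code finds three backdoor paths rather than four. The helper names (`backdoor_paths`, `satisfies_backdoor`, `minimal_adjustment_sets`) are our own. Exhaustive search over subsets grows exponentially with the number of nodes and is only practical for small DAGs.

```python
from itertools import combinations

# Hypothetical stand-in for Fig 1 (NOT the paper's DAG): edges chosen only so
# that backdoor paths 1, 3 and 4 listed above exist. Edges are (parent, child).
EDGES = {
    ("Climate", "Forestry"),
    ("Climate", "Species Y"),
    ("Climate", "Fire"),
    ("Fire", "Species A"),
    ("Species A", "Species Y"),
    ("Human Gravity", "Forestry"),
    ("Human Gravity", "Species Y"),
    ("Forestry", "Species Y"),
}


def descendants(edges, node):
    """Every node reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def backdoor_paths(edges, x, y):
    """All simple paths between x and y whose first edge points into x."""
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)
    paths = []

    def extend(path):
        if path[-1] == y:
            paths.append(path)
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for parent, child in edges:
        if child == x:
            extend([x, parent])
    return paths


def is_blocked(edges, path, adjusted):
    """Apply the chain/fork and collider blocking rules to one path."""
    for prev_node, node, next_node in zip(path, path[1:], path[2:]):
        if (prev_node, node) in edges and (next_node, node) in edges:  # collider
            if node not in adjusted and not (descendants(edges, node) & adjusted):
                return True
        elif node in adjusted:  # adjusted chain or fork
            return True
    return False


def satisfies_backdoor(edges, x, y, adjusted):
    """No descendant of x is adjusted for, and every backdoor path is blocked."""
    if descendants(edges, x) & adjusted:
        return False
    return all(is_blocked(edges, p, adjusted) for p in backdoor_paths(edges, x, y))


def minimal_adjustment_sets(edges, x, y):
    """Brute-force search for subset-minimal valid adjustment sets."""
    candidates = sorted({n for e in edges for n in e} - {x, y} - descendants(edges, x))
    found = []
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            s = set(subset)
            if satisfies_backdoor(edges, x, y, s) and not any(f <= s for f in found):
                found.append(s)
    return found


for path in backdoor_paths(EDGES, "Forestry", "Species Y"):
    print(" - ".join(path))
print(satisfies_backdoor(EDGES, "Forestry", "Species Y", {"Climate"}))  # False
print(minimal_adjustment_sets(EDGES, "Forestry", "Species Y"))
```

For this stand-in DAG, the only minimal set returned contains climate and human gravity, matching the adjustments reasoned out above for the climate and human gravity paths; adjusting for climate alone leaves the path through human gravity open.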