Automated selection of scenes of interest
Each video segment in which both hornet(s) and honey bee(s) were
detected using the process just described was extracted automatically
with purpose-built software. This software implemented a
step-by-step procedure composed of the following processes: (i)
stereovision acquisition, (ii) target detection on RGB-D data,
performed independently in each image, (iii) temporal aggregation for
multi-target tracking in 3D (Chiron et al., 2013), (iv) signature
extraction from the individual trajectories, (v) hierarchical
segmentation of the trajectory data into temporal entities, and
(vi) behavioural modelling by multi-level clustering (Chiron et
al., 2014). The video segments were then visually reviewed by an
observer to detect potential successful predation of a honey
bee by a hornet (Supporting Information, Figure S1). We
considered a predation to be successful when a hornet caught a honey bee
and flew out of the field of view with its prey. This criterion takes
into account the limited field of view (about 1.5 m² around the beehive
entrance). Each video was reviewed twice by the observer to confirm the
successful predation events. A predation attempt was considered a failure
when both hornet(s) and honey bee(s) were observed in the same scene but
no successful predation occurred (e.g. no catch).
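The selection and labelling rules described above can be summarised in a minimal Python sketch. It is not the authors' software: the detection, tracking and clustering pipeline of Chiron et al. is not reproduced, and the `Segment` and `Review` records, their fields, the label strings "hornet" and "honeybee", and the `UNRESOLVED` outcome for review passes that disagree are hypothetical assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Sequence


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    UNRESOLVED = "unresolved"  # hypothetical: the two review passes disagree


@dataclass(frozen=True)
class Segment:
    """A video segment with its detection labels (hypothetical format)."""
    segment_id: str
    labels: Sequence[str]  # e.g. "hornet", "honeybee"


@dataclass(frozen=True)
class Review:
    """One observer pass over a selected segment (hypothetical fields)."""
    catch_observed: bool       # the hornet caught a honey bee
    left_view_with_prey: bool  # the hornet flew out of the field of view with it


def select_segments(segments: Iterable[Segment]) -> List[Segment]:
    """Keep only segments in which both a hornet and a honey bee were detected."""
    return [s for s in segments if {"hornet", "honeybee"} <= set(s.labels)]


def is_success(review: Review) -> bool:
    """Success criterion: catch AND exit from the field of view with the prey."""
    return review.catch_observed and review.left_view_with_prey


def classify(reviews: Sequence[Review]) -> Outcome:
    """Label a selected segment from its two review passes."""
    if len(reviews) != 2:
        raise ValueError("each segment is expected to be reviewed twice")
    successes = [is_success(r) for r in reviews]
    if all(successes):
        return Outcome.SUCCESS   # success confirmed in both passes
    if not any(successes):
        return Outcome.FAILURE   # both taxa present, no successful predation
    return Outcome.UNRESOLVED    # assumption: flag disagreements for re-review


if __name__ == "__main__":
    segments = [Segment("a", ["hornet", "honeybee"]), Segment("b", ["honeybee"])]
    kept = select_segments(segments)
    print([s.segment_id for s in kept])
    print(classify([Review(True, True), Review(True, True)]).value)
```

Note that a catch followed by the bee escaping inside the field of view (`Review(True, False)`) is labelled a failure under this encoding, consistent with the definition that a successful predation requires the hornet to leave the view with its prey.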