Introduction
Misophonia, literally ‘hatred of sounds’, is a disorder characterized by strong negative emotional, behavioral, and physiological responses to certain specific sounds that are encountered frequently in day-to-day life (Jastreboff and Jastreboff, 2001; Kumar et al., 2014; Brout et al., 2018; Swedo et al., 2022). In typical cases of misophonia, these ‘trigger sounds’ tend to be sounds of eating, drinking, breathing, and chewing produced by other people. The emotional response to trigger sounds includes feelings of anger, irritation, anxiety, and disgust, accompanied by a strong urge to escape from the situation in which the trigger sounds are produced. Since trigger sounds are common and almost inescapable in social situations, misophonia has debilitating effects on occupational, family, and social life. A person with misophonia might, for example, avoid having meals or sharing a common space with other family members, avoid using public transport, and tend to avoid social situations at their workplace. This social isolation, particularly in severe cases of misophonia, has a high impact on mental health: cases of suicide or suicide attempts have been reported in the academic literature (Siepsiak et al., 2022) and in the media (Nauman, 2017), and up to 20% of people with misophonia indicate that they have had thoughts of suicide (Rouw and Erfanian, 2018). Although the exact prevalence of misophonia is not known, owing to the lack of a comprehensive epidemiological study, a few studies (Wu et al., 2014; Zhou et al., 2017; Naylor et al., 2021) targeting specific populations (i.e., students) have reported a prevalence of between 5 and 20%.
The mechanism of misophonia remains largely unknown. A currently dominant paradigm for understanding misophonia is the ‘auditory processing framework’, in which misophonia is considered a disorder of sound processing (Jastreboff and Jastreboff, 2001; Jastreboff and Jastreboff, 2015). In this framework, trigger sounds cause aberrant processing in the auditory/emotion processing parts of the brain, which subsequently drives the emotional response in those with misophonia (via a currently unknown mechanism). The auditory framework completely ignores the fact that many typical trigger sounds are generated by other people, that the context in which trigger sounds are produced is largely social, and that people with misophonia can be triggered by particular individuals (e.g., close family members or someone familiar) but not by others. In other words, the current auditory-focussed model of misophonia employs a ‘detached’ framework in which sounds are isolated from the social context in which they typically occur. There is now overwhelming evidence that the brain processes social signals differently from non-social signals (for reviews see Adolphs, 2010; Molapour et al., 2021). For example, brain imaging in social cognitive neuroscience has shown that specialized cognitive processes implemented in dedicated brain structures are employed to extract socially relevant information such as faces (Kanwisher et al., 1997), voices (Belin et al., 2000), the body as a whole (de Gelder et al., 2010), and other ‘higher order’ social information such as the mental states of other people (Frith and Frith, 2006). That is, these brain structures become active only in response to sensory stimulation in social situations, and they work differently or are ‘silent’ when dealing with non-social sensory input.
This has important implications for misophonia: emphasis only on the auditory/sound dimension while completely ignoring or delinking the ‘social’ aspect of misophonia may hinder progress in understanding the perceptual/cognitive processes and underlying brain mechanisms.
Using an integrated framework, in which both trigger sounds and their social source are considered together, a different picture of misophonia emerges. Since the trigger sounds, for example someone eating/chewing, are associated with an action (e.g., orofacial movement) of another person, it could be the case that misophonic distress is due to the perceived action of others and not to the sound per se, which is a by-product of that action. In social cognition and neuroscience, it is well known that merely observing the actions of others, or hearing the sounds of those actions, leads to ‘mirroring’ or ‘mimicking’ of the same actions by the perceiver without any intention or awareness to do so (Chartrand and van Baaren, 2009; Heyes, 2011; Chartrand and Lakin, 2013). The mechanism behind mimicry is commonly understood within the framework of a ‘perception-action’ link, which posits that perceiving the action of others automatically activates representations of that action in the perceiver, who in turn executes movements that are congruent with the perceived actions. With respect to brain function, the ‘perception-action’ link is instantiated as communication between the sensory and motor areas of the brain. Given this emphasis on the action of the trigger person, could it be that the perception-action link, which is activated by the sight or sound of an action, is relatively stronger in misophonia? Initial evidence for activation of the perception-action link and ‘mirroring’ of actions in misophonia was provided by a recent study (Kumar et al., 2021) from our group. The study, using functional magnetic resonance imaging (fMRI), demonstrated that in the resting state, when no explicit stimuli are presented, people with misophonia who had eating/chewing sounds as their dominant triggers show stronger connectivity (compared to control subjects) of the auditory and visual cortex to a part of the pre-motor cortex involved in the movement of orofacial muscles (movement of the mouth, lips, tongue, etc.).
An implication of the stronger resting state connectivity is that the orofacial motor cortex may be ‘primed’ to respond strongly to auditory and visual stimulation arising from the (orofacial) actions of others. This interpretation was supported by the study's finding that, in misophonia, activation of the orofacial motor cortex was stronger specifically for trigger sounds (mostly orofacial in nature).
One implication of the evidence from our neuroimaging data is that mimicry should be widely present in the misophonia population. There are anecdotal reports of people with misophonia mimicking the actions of the trigger person, but the effect has not attracted much attention within misophonia research, apart from a couple of reports. In a report of two misophonia subjects (Hadjipavlou et al., 2008), one subject who had eating/chewing sounds as triggers had urges to mimic the sounds by moving their lips and mouth. In another study, Edelstein et al. (2013) reported that 6 out of 11 subjects (55%) had a tendency to mimic the sounds. Within the ‘auditory framework’ mimicry is difficult to explain, whereas the social/action framework suggests that mimicry would be present in people with misophonia.
To address the relation between misophonia and mimicry, we asked more than 600 participants to complete online questionnaires relating to their misophonia severity and their tendency to mimic the sounds/actions of the trigger person. More specifically, we explored the following: (i) whether people with misophonia are more likely to mimic as misophonia severity increases, (ii) whether the act of mimicking relates to a particular set of trigger sounds (i.e., social), and (iii) what effect mimicking has on individuals with misophonia. Our data suggest that the tendency to mimic is associated with misophonia severity and that the act of mimicking provides relief. The current data, along with results from our neuroimaging study, provide support for a social perception/cognition-based model of misophonia.