Mirror neurons and canonical neurons are two classes of visuomotor neurons that are
activated by different visual stimuli (Rizzolatti and Kalaska, 2012). Mirror neurons
respond to a biological effector interacting with an object (Gallese et al., 1996),
suggesting their role in action recognition, while canonical neurons respond to the
presentation of a graspable object (Murata et al., 1997), and are considered crucial
in visuomotor transformation for grasping (Jeannerod, 1995).
In their interesting and thought-provoking “affordance-matching hypothesis,” Bach et
al. (2014) argue that both types of neurons contribute to action understanding. Action
hypotheses are posited to be created by means of object affordances. Affordances are
motor possibilities an object offers (Gibson, 1979). The visual description of an
object's intrinsic features is associated with possible motor acts toward that object.
Canonical neurons are a possible neural implementation of this mechanism. The action
hypothesis thus generated from an object affordance would then be confirmed
by the mirror neuron system. When a match between a predicted action (canonical neurons) and
an actually observed action (mirror neurons) is confirmed, either the action goal
can be predicted based on observed behavior, or behavior can be predicted based on
observed goals (see their Figure 1).
We believe, however, that the proposed separation of hypothesis generation and hypothesis
matching is not in line with the empirical evidence currently available, and that
the division between “interpretation” and “prediction” relies on a cognitivist assumption
that is hard to defend. We suggest that enactivist approaches provide a less problematic
framework for studying action understanding.
Bach and colleagues are not entirely explicit about the nature of the proposed matching
mechanism between affordance and observed action, but we see two options for the proposed
division of labor. In the first and admittedly unlikely option, mirror neurons play
the role of a quizmaster that knows the answers. If the right hypothesis is posited,
all the mirror neuron system has to do is confirm it. In this case, the contribution
of the affordances is superfluous, as mirror neurons have already extracted all that is
needed from the perception of an action (i.e., the quizmaster knows the answer).
Counterevidence for this option exists in the form of mirror neurons that fire in
the absence of an affordance to be matched. The auditory mirror neurons reported by
Kohler et al. (2002) fire upon the presentation of the sound of an action alone (peanut
breaking, paper tearing) without there being an affordance to match, or a prediction
to confirm.
But more importantly, virtually all mirror neuron studies (except Bonini et al., 2014a
and Caggiano et al., 2009) involved actions performed in the extrapersonal space—out
of reach for the monkey. Canonical neurons remain generally silent when an object
is in the monkey's extrapersonal space, suggesting a mainly pragmatic reference frame (i.e., in terms
of possibilities to interact with the object), rather than a metric reference frame
(i.e., in terms of physical distance between the object and the observer; Maranesi
et al., 2014). This means that the bulk of mirror neuron studies report mirror neuron
firing in the absence of canonical neuron firing. This, in turn, means that the major
part of mirror neuron activity cannot rightfully be framed as “affordance matching,”
at least not when canonical neurons are assumed to provide the affordances.
The second and more likely option is that affordance extraction and mirror neuron
firing jointly contribute to action understanding by each generating a hypothesis:
one based on the object, consisting of one or more actions the object affords, and
one about the action the actor is possibly performing (“action classification”; Uithol
et al., 2011). When the two hypotheses match, they are combined and the action is recognized.
However, this means that mirror neuron input is not dependent on the availability
of a to-be-matched affordance (i.e., mirror neuron activity is expected without affordances
available), which is in line with the empirical evidence as highlighted above, but
not predicted by the affordance-matching hypothesis. Here, too, the fact that canonical
neurons fire upon object presentation only in the monkey's peripersonal space would mean
that canonical neuron-based affordances can only be matched within the monkey's peripersonal
space. The only neurons showing canonical properties that could be activated by objects
in the extrapersonal space are a recently discovered class of neurons reported by
Bonini et al. (2014a). These neurons were dubbed “canonical-mirror neurons” as they
show both canonical and mirror properties at the single neuron level. However, the
canonical-mirror response to object presentation in the extrapersonal space cannot
be considered a neural implementation of an affordance, as these neurons do not fire
for the same objects in the peripersonal space. Rather, these neurons seem to be involved
in an object-triggered action prediction (Bonini et al., 2014a), which is indeed in
line with the affordance-matching hypothesis, but emphatically does not generalize
to canonical and mirror neurons in general. Additionally, recent findings (Bonini
et al., 2014b) revealed that some mirror neurons, besides discharging during action
observation, are also active when an action is not performed by an actor. This activation
obviously cannot be interpreted as a match between object affordances and action
kinematics, as the latter are absent.
As a solution, one might detach the hypothesis generation and confirmation processes
from canonical and mirror neurons; the principle of affordance matching is after all
not committed to these classes of neurons. But then we wonder what evidence remains
for framing action understanding as “hypothesis generation and testing.” Why is there
the need to combine the (in this case two) types of information into a unified representation?
We believe that this framing of action understanding as drawing unified and coherent
conclusions about observed actions may have been guided by the (cognitivist) assumption
that cognition is centered around retrieving information. Alternatively, the framework
of enactivism (Varela et al., 1991; Hutto, 2013; Hutto and Myin, 2013) seems to be
much more in line with the complexity of action understanding. Enactivism assumes
that cognition is not for creating representations about external events, but for interacting
with the world. In this framework, action understanding can take many guises, many of
which are best understood as a form of pattern completion: the observer is faced with
an incomplete percept of an action, which is then completed based on perceptual mechanisms,
mirror mechanisms, and even higher associations, e.g., actor-object associations (see
Uithol and Paulus, 2013). Importantly, there is no need to combine the different routes
into a unified representation of the observed action or inferred action goal. If both
object and action information are available, perhaps the classification or prediction
process is faster, easier and better, but the current evidence suggests that unifying
the types of information into a single match is not necessary.
If action understanding is no longer framed as forming a conclusion about an observed
action, but instead in terms of pluriform pattern completion that does not (always) amount
to a unified representation, another assumption of the affordance-matching hypothesis
disappears as well: the difference between interpretation and prediction. Both interpretation
(“classification” in our terminology) and prediction involve completing a pattern
based on an incomplete percept. This means that the information flow cannot be segmented
into “interpretation,” “knowledge,” and “prediction.” Interpretation is not a process
upstream of knowledge, and prediction is not a process downstream from it, nor do
they represent information flows in opposite directions; both notions refer to the
process of sensorimotor action specification.
In all, we believe that the suggestion of the affordance-matching hypothesis that
different sources of information can each contribute to action understanding is an
important one that could open doors to new lines of research. However, the current
evidence does not support the proposed division between hypothesis generation and
hypothesis testing.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial
or financial relationships that could be construed as a potential conflict of interest.