Nonhuman Primate Alarm Calls Then and Now

Against the background of the seminal papers on the vervet monkey alarm call system by Seyfarth, Cheney, and Marler (1980a, 1980b), I provide an overview of context specificity in calling and of call comprehension learning in the genus Chlorocebus, and discuss the degree to which these findings inform the reconstruction of the evolution of speech and language. The alarm calls of vervet monkeys (Chlorocebus pygerythrus) and West African green monkeys (Chlorocebus sabaeus) are astoundingly similar in terms of the structure of calls given in response to their main predator types. Green monkeys also spontaneously produced calls that sounded like ‘eagle alarms’ in response to a novel aerial threat (a drone), supporting the view that both the link between the predator category and the call and the call structure itself are largely hard-wired. In contrast, learning shapes the responses to sounds. As recent field experiments showed, monkeys are able to rapidly attach meaning to a novel sound, and they are also able to factor in contextual information when choosing their responses. These findings corroborate the view of a fundamental dichotomy in the degree of flexibility between the production of calls with a specific structure and the comprehension of calls, and suggest that the emergence of auditory learning abilities preceded the evolution of flexible vocal production. In summary, nonhuman primate communication shares fewer similarities with human speech than many scholars may have hoped for, but we would never have learned so much about primate vocal communication were it not for the breakthrough study by Seyfarth, Cheney, and Marler.

adult vervets. While infants gave eagle alarms to a wide range of birds as well as falling leaves, juveniles were more selective and called in response to different species of raptors, while adults called almost exclusively to the most dangerous raptor, the Martial eagle.
The study stirred the interest and excitement of researchers ranging from ethologists to linguists and philosophers of mind. It constituted what Gregory Radick termed "an ethological riposte" to the ape language projects, where apes were hauled into people's care and extensively trained to use sign language or other symbols (Radick, 2008, p. 11). Radick's book "The Simian Tongue" is a must-read for anyone interested in the history of the "long debate about animal language" (the sub-title of the book). The motivation to understand the origins of language from an evolutionary perspective gave rise to an enormously productive research program on primate communication that continues to flourish to this day. I still have the paper copy of Seyfarth et al.'s Science paper I made when I read it for my Master's thesis in the early 1990s. These modest three pages have not only shaped an entire scientific field, but also deeply affected my own scientific trajectory. In the following, I will take up a few of the key topics associated with this paper, namely (i) context specificity in calling, (ii) call comprehension and functional reference, and (iii) what these findings tell us about the evolution of speech and language.

Context Specificity in Calling
Forty years ago, very little was known about the variation of mammal vocalizations. The production of spectrograms was laborious and time-consuming. With the advent of real-time spectrograms and semi-automated analyses of vocalizations in the mid to late 1980s, it became possible to analyze hundreds or thousands of calls within a reasonable time-frame (see Fischer et al., 2013, for a brief review). In the wake of the vervet alarm call paper, many researchers set out to identify context-specific variation in alarm calls, or in the calls of their study species more generally. There is now ample evidence that the calls of many primate (and other mammal) species vary in relation to the context in which they are given (e.g., Crockford & Boesch, 2003; Gouzoules et al., 1998; Slocombe & Zuberbühler, 2005). Comparing calls given in only two or three different contexts may lead to an overestimation of context specificity, however. For a true assessment of context specificity, it is necessary to assemble systematic tallies of all calls in all contexts (Meise et al., 2011).
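To illustrate what a systematic tally of calls across contexts involves, the following minimal Python sketch counts how often each call type occurs in each context and reports, for every call type, the share of occurrences that fall into its most frequent context. The function name `context_specificity`, the index itself, and all observations are hypothetical and serve only as an illustration; they are not the procedure or the data of Meise et al. (2011) or of any other study cited here.

```python
from collections import Counter, defaultdict


def context_specificity(observations):
    """For each call type, return its most frequent context and the
    proportion of all its occurrences that fall into that context."""
    tallies = defaultdict(Counter)
    for call_type, context in observations:
        tallies[call_type][context] += 1

    result = {}
    for call_type, counts in tallies.items():
        top_context, top_count = counts.most_common(1)[0]
        result[call_type] = (top_context, top_count / sum(counts.values()))
    return result


# Hypothetical (call type, context) records; NOT data from any study.
observations = [
    ("bark", "leopard"), ("bark", "leopard"),
    ("bark", "intergroup"), ("bark", "intergroup"),
    ("chutter", "snake"), ("chutter", "snake"),
    ("chutter", "snake"), ("chutter", "aggression"),
]

specificity = context_specificity(observations)
for call_type, (context, share) in specificity.items():
    print(f"{call_type}: most frequent context = {context}, share = {share:.2f}")
```

In this toy tally, a call type that looks highly specific when only predator contexts are sampled drops to a much lower share once the non-predator contexts are counted as well, which is the overestimation problem described above.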
We set out to determine how acoustically distinct the vervet alarm calls were, using the original recordings made by Tom Struhsaker, Dorothy Cheney, and Robert Seyfarth. Previous quantitative analyses had considered only calls by adult females that had been given in response to snakes and eagles (Owren & Bernacki, 1988). We used calls given in response to the three main predator categories, and also included calls given during within- and between-group aggression (Price et al., 2015). From these contexts, we used those calls that sounded similar to some of the calls given in response to predators. Thus, we considered "chutter" and "rraup"-like calls given in aggressive contexts, as well as "wrrs," also known as threat grunts. Screams were excluded. In addition, we compared "bark"-like calls recorded from South African male vervets that had been uttered in response to real and stuffed toy models of leopards, and during within- and between-group encounters (Price et al., 2015).
Female alarm calls could be very well distinguished. Using a discriminant function analysis (DFA), 98.7% were correctly assigned to the respective context. When calls recorded during aggressive interactions were included, the average correct classification dropped to 71.4%. The misclassification mainly concerned calls given during aggression and the aerial predator context. In females, terrestrial alarms retained the highest classification results (Table 1). The pattern of misclassification corresponds to the results of a cluster analysis, an approach that does not presume membership in a specific context when grouping calls. The cluster analysis identified a distinct cluster comprising mainly the terrestrial alarms, and a set of three overlapping clusters revealing graded variation of multi-element calls (see Price et al., 2015, for details). Such multi-element calls were also observed during within- and between-group aggression. Male alarm calls could also be well distinguished: 93.2% were assigned to the correct predator class. Yet, barks by male vervet monkeys given in response to leopards and during intergroup aggression showed substantial acoustic overlap, resulting in 74.9% correct classification (Price et al., 2015).
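As a reading aid for the classification results above, the following minimal Python sketch shows how per-context and overall percentages of correct classification can be read off a classification (confusion) matrix. The function name `classification_rates` and all counts are hypothetical; they are not the values in Table 1 or in Price et al. (2015), and the sketch makes no assumption about how the DFA itself was fitted or validated.

```python
def classification_rates(confusion):
    """confusion[true_context][assigned_context] = number of calls.

    Returns (per-context percent correct, overall percent correct).
    """
    per_context = {}
    correct = 0
    total = 0
    for true_context, row in confusion.items():
        n_calls = sum(row.values())
        hits = row.get(true_context, 0)  # calls assigned to their own context
        per_context[true_context] = 100.0 * hits / n_calls
        correct += hits
        total += n_calls
    return per_context, 100.0 * correct / total


# Hypothetical counts for illustration only; NOT data from the cited studies.
confusion = {
    "terrestrial": {"terrestrial": 18, "aerial": 1, "aggression": 1},
    "aerial": {"terrestrial": 1, "aerial": 12, "aggression": 7},
    "aggression": {"terrestrial": 2, "aerial": 6, "aggression": 12},
}

per_context, overall = classification_rates(confusion)
for context, rate in per_context.items():
    print(f"{context}: {rate:.1f}% correct")
print(f"overall: {overall:.1f}% correct")
```

In this invented matrix, most errors fall in the off-diagonal cells shared by aerial alarms and aggression calls, while terrestrial alarms remain well separated. A single overall percentage hides this pattern, which is why the matrix itself is worth inspecting.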
What do these findings tell us about the mechanisms underlying call production? On the one hand, similar sounding calls may reflect similar underlying motivational states; on the other hand, subjects may experience the different situations very differently, but may be constrained in the way that they are able to express themselves, such that different emotional states converge onto the same motor pattern generators. Similarities between calls produced in response to snakes or during intergroup encounters may also point to a shared function, namely to recruit others for group defense or in intragroup agonistic encounters. It is also conceivable that males tend to use barks in both alarm contexts and territorial defense because these calls provide information about a caller's stamina, and males may benefit from advertising their quality across contexts. Advertising one's stamina may also be beneficial for predator deterrence ("perception advertisement"; Caro, 1995; Tilson & Norton, 1981).
Importantly, context-specific variation in calls may also be a result of natural selection, when there is a high premium for unambiguous calling or when there are strong developmental or evolutionary constraints preventing greater variation. In itself, the production of context-specific calls is not sufficient for identifying 'precursors' to semantic communication as found in human language. Instead, it is necessary to identify the mechanisms that give rise to context-specific signaling, and to check whether these are similar to those characterizing speech. Among these, the ability to establish arbitrary relationships between the signifier and the signified is a key criterion. This ability rests on vocal production learning and the ability (and motivation) to develop a conventionalized communication system (Fischer, 2017).
To assess the potential for some form of conventionalized communication, we studied the degree of flexibility in the alarm call system in this genus. More specifically, we initiated a study of the alarm call system of West African green monkeys, Chlorocebus sabaeus. The last common ancestor of green monkeys and vervets lived around 2.1 million years ago (Perelman et al., 2011; but see Warren et al., 2015). The green monkeys were initially presented with snake and leopard models, as well as a model of an eagle perched on a tree. The monkeys responded with alarm calls, vigilance, and escape responses to the snake and leopard models, while largely ignoring the eagle model. Moreover, we had not observed them giving alarm calls to aerial predators before the study either (in 2019, we witnessed such calling for the first time). We therefore decided to present a novel aerial threat to the monkeys and flew a drone over them to assess their vocal responses (Wegdell et al., 2019).
In response to the drone, the animals produced clearly discernible calls, and a number of subjects ran into cover (Wegdell et al., 2019). We conducted an acoustic analysis of the calls given in response to the drone as well as to the leopard and snake models recorded in the previous study. For female subjects, 80.0% of the calls were correctly assigned to the three contexts in which they were given. Calls given in response to the drone were clearly distinct from calls given in the other two contexts, with 95.2% correct classification. For male subjects, the overall correct classification was lower (71.2%), but, as in females, male drone alarms could be well distinguished from the other two alarm call categories (92.6% correct classification). In conclusion, the calls given by green monkeys in response to a novel flying object differed from those given to the snake and leopard models.
We then compared the alarm calls of the green monkeys and the vervets (Figure 1). Of particular interest was the question of how the calls given in response to the drone compared to those given by the vervets to eagles. There was very little variation between the species, compared to the variation between contexts. Despite the similarity of the overall pattern, female green monkey alarm calls were less distinct from each other (80.0% correct classification) than the alarm calls of female vervet monkeys (93.3% correct classification). Similarly, the alarm calls of male green monkeys were less distinct (71.2% correct classification) than those of male vervet monkeys (81.3% correct classification). Interestingly, in the compound assessment of the similarity across species and contexts, the aerial alarms had the highest similarity values between the two species for both males and females, which may point to a high selection pressure on the unambiguous recognition of aerial alarms (Wegdell et al., 2019). The assessment of the aerial alarms in this genus revealed a somewhat paradoxical pattern: on the one hand, the aerial alarms appeared to be the most highly conserved call type, with the least variation between East African vervets and West African green monkeys (Wegdell et al., 2019). On the other hand, the classification matrix within female vervet monkey alarm and aggression calls showed the highest degree of misclassification between aerial alarms and calls given during intra- and inter-group aggression (Price et al., 2015). In male vervets, in contrast, calls given in response to leopards graded into those that were given during inter-group aggression. Thus, before firm conclusions about call specificity can be drawn, more systematic collections of different call types (or variants thereof) across contexts need to be compiled, including information about the rates of occurrence.
Given the recipients' ability for pragmatic inference (Seyfarth & Cheney, 2016), additional contextual information may alleviate the recipients' difficulties in disambiguating calls given in either aggressive or alarm contexts (see below).
In summary, the link between the perception of an aerial threat and the utterance of a specific call appears to be largely hard-wired. As such, the results corroborate the findings presented in the famous Science paper and elsewhere (Seyfarth & Cheney, 1980; Seyfarth et al., 1980a), which showed that infants are likely to spontaneously give aerial alarms to all sorts of aerial objects. Notably, they do so although adults do not give alarm calls to falling leaves or doves; thus, young subjects do not copy their behavior from adults. Instead, they learn over time that neither falling leaves nor doves constitute dangerous threats and cease to alarm call. The overall similarity of the various call types furthermore supports the view that the structure of these calls is highly conserved between the two species.

Call Comprehension and Functional Reference
The playback experiments in the original Science study, as well as the many studies conducted in the ensuing years, provided ample evidence that primates are able to "attach meaning" to calls in their environment, in the sense that they use calls to predict upcoming events or infer ongoing ones (Fischer & Price, 2017). A few studies investigated the ontogeny of primate responses to calls, and found that adult-like responses to both species-specific and hetero-specific calls gradually developed within the first six months of an infant's life (Fischer et al., 2000; Hauser, 1989; Seyfarth & Cheney, 1986). Yet, this slow development may reflect the difficulties in learning what prompts alarm calling of other animals rather than the infants' general ability to attach meaning to a novel sound. Indeed, Barbary macaque, Macaca sylvanus, infants begin to respond more strongly to maternal calls than to other familiar females' calls at the age of 10 weeks (Fischer, 2004).
To probe how rapidly adult subjects would be able to associate a novel sound with the object that caused it, we took advantage of the drone experiments described above. After exposure to the flying drone, we played back the sound of the drone over the following days (Wegdell et al., 2019). If the monkeys rapidly made the association between the sound and the sight of the drone, they should respond with increased vigilance and scan the sky after the presentation of the sound of the drone compared to control sounds. Our experiments showed that this was indeed the case: The green monkeys were able to instantly attach meaning to a novel sound and retain this knowledge (Wegdell et al., 2019). A separate study showed that the monkeys factor in contextual information when choosing their responses, similar to the vervet monkeys that consider their own location before they decide whether they run out of the tree or further up (Seyfarth et al., 1980a). In conjunction, these findings support the view of a fundamental dichotomy in the control over the acoustic characteristics of calls vs. the comprehension of calls and suggest that the emergence of auditory learning abilities preceded the evolution of flexible vocal production (Fischer & Hage, 2019).
The potential for learning and the resulting ability to attach meaning to all sorts of sounds in the environment is the crucial precondition for what has become known as 'functionally referential communication'. To briefly recapitulate, once it became clear that monkey calls were not truly referential in the sense that they designated objects, events, or ideas, Marler et al. (1992) coined the term 'functional reference', stressing that the alarm call system functioned similarly to referential communication, while the underlying mechanisms were, at least in terms of signal production, very different. The key criteria for functional reference are (i) that signal production is context-specific, that is, the calls are typically given in response to a specific stimulus, the 'referent', and (ii) that these calls elicit specific responses in listeners even in the absence of the supposed referent. In other words, the responses are stimulus-independent (Macedonia & Evans, 1993; Wheeler & Fischer, 2012). This brings us back to the question of context-specificity, and particularly the question of how we sensibly distinguish one context from another. If categories are too broad, then there will be little context-specificity; if contexts are split into finer and finer sub-units, then calls will be highly specific, but the communication inefficient (see debate in Scarantino & Clay, 2015; Wheeler & Fischer, 2015).
Although no study to date has observed the symbolic (arbitrary, conventionalized) use of calls by nonhuman primates, the idea that functionally referential signals are in some way more similar to human language than other types of calls remains pervasive in the animal communication literature. Yet, there is no fundamental difference in responses to drone sounds compared to, say, mating calls. For instance, female Barbary macaques produce conspicuous mating calls during copulations, which are individually distinct. Males pick up on variation in calling in relation to mating success and respond more strongly to calls produced when the male ejaculated compared to when he did not (Pfefferle et al., 2008). Although the context specificity is not very high, listeners pick up on systematic variation and produce adaptive responses.
Following the linguist Paul Grice, primate vocal signals have 'natural meaning' (Grice, 1957), in the sense that they are reliably associated with specific events or behaviors, similar to the observation that smoke indicates a fire, and dark clouds predict rain (see Arnold & Bar-On, this issue). Such natural meaning is explicitly contrasted to the symbolic meaning of human words. Signals with natural meaning mean 'x' only in that they indicate the likelihood of the occurrence of 'x' because of a natural spatial or temporal association with 'x' (Grice, 1957). Correspondingly, Terrence Deacon has argued that functionally referential signals are best seen as indexical signals (Deacon, 1997). In light of these analyses, Brandon Wheeler and I therefore argued that "the concept of functional reference, while historically important for the field, has outlived its usefulness and become a red herring in the pursuit of the links between primate communication and human language" (Wheeler & Fischer, 2012, p. 195). Importantly, functional reference does not appear to have greater explanatory power than classic accounts of animal signaling (Maynard Smith & Harper, 2003).

Insights into the Evolution of Speech and Language?
Shedding light on language origins is an endeavor with enormous seductive allure, and it is thus no wonder that so many people are motivated to identify precursors to language in the animal kingdom. Once it became clear that the 'semantic' communication of primates had little in common with that of human speech and that the similarities between human and primate vocalizations are rather found in the realm of emotion expression, scholars turned to the question of the production and comprehension of syntax (Arnold & Zuberbühler, 2006; Fitch & Hauser, 2004), vocal learning (Crockford et al., 2004), turn-taking (Chow et al., 2015), functional flexibility, and play (Langley et al., 2019), to name a few examples. Following Imre Lakatos, one may thus conclude that the study of language evolution is characterized by a series of progressive problem shifts, rather than a linear series of conjectures and refutations (Lakatos & Feyerabend, 1999).
I believe that progress in this field crucially depends on two issues: firstly, it is essential to distinguish phenomenological similarities from true homologues. For this, we must consider the neural and cognitive mechanisms that support a certain phenomenon. For instance, in the field of vocal learning, different routes may lead to greater similarity in calling within a group compared to between groups. Subjects may learn from success that group-typical call variants are more effective in eliciting the desired response and therefore use these calls more frequently, or they may store the group-typical variant in auditory memory and shape their sound production accordingly. Importantly, both of these adjustments can, in principle, be explained by changes at the level of call usage, instead of modifications of call structure. It has long been known that nonhuman primates have a certain degree of control over whether they call or not and, within certain limitations, which call they use (Seyfarth & Cheney, 1997). Over the last years, we have also achieved a better understanding of the neural mechanisms underpinning flexibility in call usage (for a review, see Fischer & Hage, 2019).
As we have seen for the vervet monkey alarm calls, the link between a certain signal (aerial alarm) and the broader context in which it is given (aerial objects and threats) may be a result of natural selection, while experience narrows down the usage of calls to those stimuli that indeed constitute threats. Thus, careful analyses of the neural circuits or the structural identity of the cognitive building blocks that support the trait in question are necessary before conclusions about the evolution of language can be drawn (Ackermann et al., 2014;Fischer & Hage, 2019;Hage & Nieder, 2016;Jürgens, 1992). Secondly, it is essential to adhere to stringent comparative analyses. For instance, if we find that a distantly related primate species exhibits some trait x, it is normally premature to conclude that this trait x is a precursor to speech or constitutes the substrate from which speech evolved, unless we have solid comparative data on when this trait evolved and that it is indeed shared in the relevant taxonomic branches.
Despite my somewhat critical stance on the similarities of nonhuman primate communication and human speech, I am still awed by the original paper. We would not know half of what we know now about primate vocal communication, were it not for this outstanding contribution. And I might have become a marine biologist, had I not gotten my hands on "How Monkeys See the World" (Cheney & Seyfarth, 1990) and decided to study monkey communication and cognition instead. The hopes of finding greater similarities between primate communication and human speech were squashed, however, and I believe that the vocal communication of nonhuman primates is more similar to that of dogs than to human speech (see also Cheney & Seyfarth, 1998). Nevertheless, revisiting the original paper is highly recommended, particularly for younger scholars. Reading this paper also tells us much about how the scientific communication style has changed over the decades. All methods details were presented in three terse sentences in the "References and Notes" section, and references and statistical analyses were kept to a bare minimum. What would today be presented in an extended "Electronic Supplementary Information" section was instead published in a 'companion paper' in the journal Animal Behaviour (Seyfarth et al., 1980b). This paper also expanded the discussion of the findings. In summary, the study of the vervet monkey alarm call system remains one of the most important contributions in our field, and it is incredibly sad that Dorothy Cheney did not live to see the publication of this volume, in honor of this outstanding work.