Evolving Views on Cognition in Animal Vocal Communication: Contributions from Scream Research

The last four decades have seen major advances in the study of the cognitive bases of animal vocal communication. The conceptual delineation of senders and receivers has led to a focus on the cognitive processes involved in call production and usage, and those concerning call perception. We review selected relevant literature and discuss how recent thought—specifically, the recognition of a distinction between acoustic variation within versus between call types—expands focus from two to four key questions, specifically, the identification of factors that influence: (1) the usage of different call types, (2) the particular acoustic structure of a vocalization, (3) receivers’ responses to different call types, and (4) receivers’ responses to acoustic variation within a call type. We present the findings of a case study examining the relationship between emotional arousal and the acoustics of scream classes of rhesus macaques (Macaca mulatta) and discuss how the findings relate to broader issues in animal vocal communication and cognition. Screams are phylogenetically widespread, generally highly emotional, and in some taxa, convey complex information about social interactions, making them a promising subject for future research addressing contemporary questions in the field.

Robert Seyfarth and Dorothy Cheney's early postdoctoral collaborations with Peter Marler, beginning with their seminal articles (Seyfarth et al., 1980a, b) on the alarm calls of vervet monkeys (Chlorocebus pygerythrus), which showed that these calls elicit appropriate responses from receivers even in the absence of predators, had an essentially transcendent influence on the study of animal, and particularly nonhuman primate, vocal communication. The period leading up to their work together was striking in that monkey and ape vocal communication was thought to have only a limited, and basically ancillary, role in the complex social systems that were being discovered and documented in the burgeoning field and laboratory research of the time. For example, in an influential review of the topic, Lancaster (1968) wrote that: "… vocalizations do not carry the major burden of meaning in most social interactions, but function instead either to call visual attention to the signaler or to emphasize or enhance the effect of visual or tactile signals. In other words, a blind monkey would be greatly handicapped in his social interactions whereas a deaf one would probably be able to function almost normally" (p. 442). Views would change considerably after 1980.

Terminology
The history and sometimes contentious debate surrounding cognitive terms in the animal communication literature is extensive and, indeed, daunting to summarize (Cheney & Seyfarth, 1990). For the purposes of this paper (the primary objective of which is to outline an emerging framework for organizing research questions, rather than to provide a taxonomy of cognitive processes, per se), we present a brief overview of relevant terms and concepts. That monkeys respond to conspecific predator alarm calls even in the absence of predators (Seyfarth et al., 1980a, b) suggests that the calls convey information about the type of threat. "Information" in this sense means that a signal disambiguates (or reduces the uncertainty about) a particular event or condition of the world for a receiver, potentially influencing its subsequent behavior. This perspective entails the assumption that receivers possess cognitive capacities that allow for processing this information but makes no specific claim as to the precise nature of these capacities. Some monkey vocalizations, such as anti-predator alarm calls, have been proposed to be

Senders and Receivers
Communication is an intrinsically social event. Minimally, communication has taken place when a sender emits a signal that is detected by a receiver whose behavior might consequently change. In a pair of pivotal papers, Seyfarth and Cheney delineated a critical distinction between senders and receivers (Seyfarth & Cheney, 2003a, b). Responding to what they perceived as the perpetuation of a logically false dichotomy between emotional and externally referential signaling, their central argument was that vocalizations can be emotional on the part of senders and nonetheless convey external information, as the latter depends on receiver cognition, specifically on whether and how receivers conjure a mental representation of the external referent solely from hearing a vocalization. To illustrate this point, Seyfarth and Cheney drew on a point made by Premack (1972): a signal that (hypothetically) occurs entirely as a result of the emotional state of a vocalizer, and does not derive from mental representations of the external event that generated those emotions, can be used by receivers to obtain information about the presence of that external event, as long as the signal is reliably and exclusively correlated with the external stimulus. Thus, in principle, the production of vervet monkey alarms could be motivated by "three flavors of anxiety or arousal: leopard anxiety, eagle anxiety, and snake anxiety" (Dennett, 1983, p. 346) and this would not be incompatible with the view of these vocalizations as externally referential. As Seyfarth and Cheney note: "The affective and referential properties of signals are … logically distinct, at least in animal communication, because the former depends on mechanisms of call production in the signaler, whereas the latter depends on the listener's ability to extract information from events in its environment" (Seyfarth & Cheney, 2003b, p. 154).
Thus, Seyfarth and Cheney (2003a, b) delineated two theoretically independent research questions that each have received much attention in the ensuing years: (1) what are the roles of emotion and other cognitive processes in signal production by senders, and (2) what is the role of information about, and mental representation of, external objects and events in receiver behavior? Indeed, in the following decades, research on external reference largely focused on receivers rather than senders, while attention to senders has, until recent years, largely focused on the emotional underpinnings of vocalizations. This literature is briefly reviewed below.

Alarm Calls, External Reference, and Affect-Induction
The cognitive bases of the behavioral responses of vervet monkeys to playbacks in the experiments by Seyfarth et al. (1980a, b), subsequently shown in many other species, have been the focus of much discussion (e.g., Cheney & Seyfarth, 1990; Gouzoules, 2005; Gouzoules & Gouzoules, 2006; Gouzoules et al., 1985, 1998; Owren & Rendall, 2001; Scarantino & Clay, 2015; Seyfarth & Cheney, 2003a, 2017; Seyfarth et al., 2005; Townsend & Manser, 2013; Wheeler & Fischer, 2015; Zuberbühler, 2003; Zuberbühler et al., 1999). Do leopard alarm calls elicit in a vervet a mental representation of a large, spotted cat with sharp claws and fangs? Perhaps it is a vaguer representation of some large quadruped, or simply one of terrestrial danger more generally? Precise empirical discrimination among these possibilities is not possible through the traditional playback experiment paradigm, but some conclusions that apply to all of these possible scenarios appear reasonable. First, the call evokes in the receiver a mental representation of some object, agent, and/or event that is external to the sender. Second, this mental representation is categorical, in the sense that receivers classify the referent of the "leopard" alarm as distinct from the referents of other alarm call types (Zuberbühler et al., 1999). It is plausible that this mental representation involves a visual component (i.e., a mental image of something), since receivers respond to alarm calls as if they have seen a predator (for direct evidence of this in birds, see Suzuki, 2018).
In principle, functionally referential communication need not involve mental representations, and some have argued other cognitive hypotheses are more parsimonious (Owren & Rendall, 2001; Rendall et al., 2009). One such alternative explanation is that receivers respond to calls via stimulus-response reflexes resulting in direct activation of motor neurons (e.g., Lloyd, 1989). Such an account would, however, predict that responses to calls would be consistent and fixed, whereas monkeys respond to conspecific calls in a variable, context-dependent fashion (e.g., Price & Fischer, 2014; Price et al., 2015). Incidentally, this observation suggests that receivers somehow take context into account when responding (or not) to externally referential signals, sparking debate on whether responses to purportedly functionally referential signals differ meaningfully from responses to other stimuli (Scarantino & Clay, 2015; Wheeler & Fischer, 2012). This observation is not incompatible with hypotheses about mental representation, but it is interesting and indicates that external reference is not sufficient to explain at least some cases of receiver behavior, since receivers perceive and process not only referential calls but other aspects of the social and natural environments. Another alternative account posited to explain instances of functional reference is that signals function by manipulating the receiver's emotional or other affective state (Owren & Rendall, 1997, 2001; Rendall et al., 2009). Under this "affect-induction model," the acoustic features of calls are hypothesized to play a crucial role in that they directly affect the receiver's internal state, e.g., characteristics of the sounds that irritate or excite receivers (Owren & Rendall, 2001). Affect-induction likely plays a role in some instances of animal communication (discussed further below).
In principle, affect-induction could function in tandem with evocation of mental representations of external stimuli, but advocates of this model contend that it can fully account for many instances of vocal communication (Owren & Rendall, 1997, 2001; for review, see Gouzoules, 2005). However, a set of habituation-dishabituation experiments conducted by Zuberbühler et al. (1999) with Diana monkeys (Cercopithecus diana) strongly suggests that mental representation is involved in receivers' responses to alarm calls in this species. Monkeys habituated to leopard alarm calls did not dishabituate to a leopard's growl, even though the alarm call and the growl are acoustically distant. However, they did dishabituate to an eagle alarm call. These results contradict the view that acoustic properties and induced affective responses are primary determinants of receivers' responses to calls, and instead suggest that predator-specific calls evoke cognitive representations of different kinds of threats (Gouzoules, 2005; Zuberbühler, 2003; Zuberbühler et al., 1999). The mainstream view in the field of animal communication is that, in most cases, receiver responses to functionally referential calls probably entail mental representation of external referents (Gouzoules, 2005; Gouzoules et al., 1985; Seyfarth & Cheney, 2003a, b; Wheeler & Fischer, 2012; Zuberbühler, 2003; though see Rendall et al., 2009).
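The logic of the habituation-dishabituation design can be made explicit with a small sketch. The Python below is illustrative only: the `Stimulus` class, the `acoustic_class` labels, and the two prediction functions are hypothetical simplifications introduced here, not part of the original experiments. It contrasts what a purely acoustic account and a representational account would each predict for a test stimulus played to monkeys already habituated to leopard alarm calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    name: str
    acoustic_class: str  # hypothetical label standing in for acoustic similarity
    referent: str        # predator with which the sound is associated


LEOPARD_ALARM = Stimulus("leopard alarm call", "alarm_A", "leopard")
EAGLE_ALARM = Stimulus("eagle alarm call", "alarm_B", "eagle")
LEOPARD_GROWL = Stimulus("leopard growl", "growl", "leopard")


def dishabituates_acoustic(habituated: Stimulus, test: Stimulus) -> bool:
    """Acoustic account: any acoustically novel sound renews the response."""
    return habituated.acoustic_class != test.acoustic_class


def dishabituates_referential(habituated: Stimulus, test: Stimulus) -> bool:
    """Representational account: only a change of referent renews the response."""
    return habituated.referent != test.referent


for test in (LEOPARD_GROWL, EAGLE_ALARM):
    print(
        test.name,
        "| acoustic account:", dishabituates_acoustic(LEOPARD_ALARM, test),
        "| representational account:", dishabituates_referential(LEOPARD_ALARM, test),
    )
```

Under these assumptions the two accounts diverge only for the leopard growl: the acoustic account predicts a renewed response, whereas the representational account predicts none, which is the pattern reported for Diana monkeys.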

Representation of Individuals, Kin Groups, and Dominance Interactions
Following on the groundbreaking research on vervet alarm calls, other types of vocalizations were found to convey externally referential information concerning social dimensions of monkey and ape behavior. These findings, reviewed briefly below, suggest that animal vocalizations have the capacity to evoke mental representations far more complex than simple one-to-one, call-to-object relationships. First, receivers derive information from vocalizations that permits recognition, and mental representation, of individual senders as well as, in some species, the kin groups to which individuals belong (Seyfarth et al., 2005), although there has been debate about the degree to which different call types provide sufficient information to evoke these representations (Fugate et al., 2008; Rendall et al., 1996, 1998). Chacma baboon (Papio ursinus) societies, like those of several other catarrhine monkeys, are structured by a matrilineal hierarchy system in which females are ranked within and between matrilines. Rank challenges (aggressive behavior by a lower-ranking individual and/or submission behavior by a higher-ranking individual) can occur between or within matrilines; these sometimes lead to dominance rank reversals. Between-matriline rank challenges should be of greater interest to group members as they represent greater potential upheavals of the social structure. Bergman et al. (2003) played sequences of female dominance grunts and "submission" screams simulating interactions congruent with the hierarchy (dominant grunt of a higher-ranking individual, submissive scream of a lower-ranking individual) and incongruent rank challenges (the reverse sequence) within and between matrilines. Listeners looked longer in response to incongruous, between-matriline sequences than to other combinations, even when the rank distance between the two vocalizers in the within-matriline sequence was great.
To show this pattern, baboons must not only recognize groupmates from their vocalizations alone, but also hold some representation of the boundaries between, and hierarchical structure of, the matrilines to which groupmates belong, and form expectations based on these representations.
Like baboons, rhesus macaques live in social groups with matrilineal hierarchies. During agonistic interactions, attacked or threatened victims often scream; rather than merely signaling submission (more formally signaled using bared-teeth facial displays; de Waal & Luttrell, 1985), rhesus screams recruit aid, primarily from matrilineal kin (Bernstein & Ehardt, 1985). Gouzoules et al. (1984) provided evidence that rhesus screams fall into several acoustically distinguishable classes, the usage of which correlates with the intensity of agonism received and the dominance rank of the opponent (see also Gouzoules, 2005; Gouzoules et al., 1985, 1998). Interventions by matrilineal kin may reduce the likelihood of harm to the victim and/or protect the rank of the matriline against challenges from lower-ranking individuals (Kaplan et al., 1987). In playback experiments, mothers responded most strongly to the scream class most associated with potential risk of injury, but second-most strongly to the class associated with a threat from a lower-ranking opponent, even though the latter was rarely associated with physical danger (Gouzoules et al., 1984). Gouzoules et al. (1986) also found that the sex of juveniles and the degree of relatedness between the playback target and vocalizer influenced responses to these screams. Playback of screams recorded from immature rhesus monkeys consistently elicited a response, not only from their mothers, but also from other adult female matrilineal relatives. Responses from these more distant relatives occurred in the absence of any cues from the immatures' mothers or any other contextual features that might have been originally available during an agonistic encounter. The pattern of female response to playback of immatures' screams closely resembled that of female aid to juvenile relatives described by Kaplan (1978) and suggested that scream vocalizations alone can function as a principal mechanism in recruitment.
Moreover, the patterning of female response to different scream types provides additional support to the contention that screams convey detailed information about external events.
These findings suggest that screams evoke in receivers a mental representation of specific aspects of external events; furthermore, they indicate that receivers respond to instances of agonism associated with a threat to the matriline's place in the hierarchy. The social event that a rhesus macaque or baboon represents when responding to scream classes or call sequences may include the identity of the sender, kin and rank relationships, and different degrees of aggression. Do monkeys hold dichotomous mental categories for "higher-ranking" versus "lower-ranking," or do they mentally represent these as differences of degree? In support of the latter view, scream class usage is more consistent when the rank difference between the sender and the opponent is relatively great (Gouzoules et al., 1998). Specifically, usage of different scream classes might be underlain by a kind of fuzzy logic in which senders represent rank differences (and perhaps agonistic intensity) not as discrete categories separated by hard boundaries, but rather as graded, soft categories or perhaps continua (Gouzoules et al., 1998).
Vocalizations of many species may also convey information to receivers about the motivational state of the sender, i.e., internal reference (e.g., Gouzoules et al., 1985; Morton, 1977). However, most receivers probably do not attribute knowledge or emotional states, per se, to senders; doing so is highly advanced and probably exceeds the cognitive capabilities of most taxa (Cheney & Seyfarth, 2008; Seyfarth & Cheney, 2003a, b). Receivers instead use calls to simulate or anticipate a sender's likely subsequent behavior. Regardless, it is clear that calls that are not externally referential can nonetheless evoke complex mental representations in receivers. This has raised questions as to whether the cognitive bases of external reference are fundamentally different from those involved in receivers' responses to other kinds of calls and even to stimuli unrelated to communication (Wheeler & Fischer, 2012), a matter that remains open to empirical investigation.
In summary, research since 1980 has identified several possible cognitive capacities underlying receiver responses to calls, including categorical and hierarchical mental representation of external phenomena and of individuals and kin groups, induced emotional and other affective states, and stimulus-response reflexes. These are generally not mutually exclusive, and in many cases, mental representation and affect-induction are likely to play a role simultaneously.

Emotion
Dating back to Darwin (1872), there has been recognition that emotion plays a role in vocal production in animals (e.g., Briefer, 2012; Filippi, 2016; Fischer, 2017; Fischer & Price, 2016; Owren et al., 2011; Schamberg et al., 2018; Wheeler & Fischer, 2012). A view endorsed by some is that of animal vocalizations as largely involuntary outputs of emotional and other affective states (e.g., Owren et al., 2011; Panksepp, 2011). Indeed, evidence from brain stimulation as well as lesion studies suggests that vocalizations in some species are subcortically controlled, suggesting they might be closely tied to affect (Fichtel et al., 2001; Jürgens, 2009; reviewed in Owren et al., 2011). Humans, too, frequently emit involuntary, emotional vocalizations (sometimes referred to as "raw" affect bursts; Hawk et al., 2009; Schröder, 2003). Seyfarth and Cheney (2003a, b) noted that externally referential vocalizations in nonhuman primates could in principle be purely emotionally and involuntarily produced, but argued against this view, observing that human speech is often underlain by an intention to communicate and, simultaneously, an emotional state, with the latter manifesting in predictable acoustic changes to the voice (e.g., Scherer, 2003). They also noted the wide array of factors that influence the usage of calls, including the composition of the audience ("audience effects"; Fichtel & Manser, 2010). For example, chimpanzees appear to modify their recruitment screams based not only on the intensity of aggression, as is the case for rhesus macaques (Gouzoules et al., 1984), but also on the rank composition of the audience: chimps exaggerate screams when there is a listener present who outranks the aggressor (Slocombe & Zuberbühler, 2007).
Of course, differences in a sender's emotional state in the presence, versus in the absence, of a particular audience could possibly account for these effects (Owren et al., 2011), but as one considers evidence for more extended context-dependency in call usage, the idea that this is underpinned by subtle differences in sender affect becomes less parsimonious, and thus invoking the involvement of other cognitive processes has been deemed more reasonable (Seyfarth & Cheney, 2017). Such efforts are discussed further below.
The predominant approach to emotion in the field of animal vocal production, adapted from the dimensional model (Mendl et al., 2010; Russell, 2003), recognizes the existence of at least two dimensions that characterize emotional states: arousal (level of activation, ranging from placid to alert) and valence (ranging from highly negative to highly positive) (Briefer, 2012; for reviews of emotion concepts in the field of vocal communication, see: Altenmüller et al., 2013; Juslin, 2013). While affect and emotion appear unable to wholly account for senders' vocal production, research across taxa, especially in the last decade, has revealed consistent relationships between emotional states and acoustic properties of vocalizations (Altenmüller et al., 2013; Briefer, 2012; Zimmermann et al., 2013). Emotional arousal appears to generally result in increases in fundamental frequency (F0; the oscillation rate of the vocal folds, commonly perceived as pitch) (e.g., Briefer, Maigrot et al., 2015; Briefer, Tettamanti et al., 2015; Filippi et al., 2017; Rendall, 2003; Scherer, 2003; Szipl et al., 2017; for a possible exception, see Ordóñez-Gómez et al., 2019), as well as in variability of F0 (e.g., Bayart et al., 1990; Lingle et al., 2012; Rendall, 2003; Sugiura, 2007; Yamaguchi et al., 2010) and in noisiness (e.g., Liao et al., 2018; Puppe et al., 2005; Siebert et al., 2011; Stoeger et al., 2011; though see Blumstein & Chi, 2012; Linhart et al., 2015; Szipl et al., 2017). Emotional valence is associated with vocal duration in several species, with shorter vocalizations associated with more positive emotions and contexts (Briefer, 2012; Briefer, Maigrot et al., 2015; Briefer, Vizier et al., 2019; Fichtel et al., 2001; Friel et al., 2019; Taylor et al., 2009). These changes in the voice likely stem from the physiological effects of emotional states on the vocal apparatus, e.g., arousal tensing the laryngeal muscles thereby increasing the oscillation rate of the vocal folds.
These cross-species trends are thought to represent evolutionary homology and are attributed to phylogenetic conservation of the mechanisms of vocal production as well as physiological arousal (Briefer, 2012; Filippi et al., 2017; Zimmermann et al., 2013).
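Because F0 recurs throughout this literature as the acoustic correlate of arousal, it may help to see how it can be estimated from a waveform. The following is a minimal sketch, not a method drawn from any of the studies cited: it assumes a clean, synthetic tone and uses a plain autocorrelation search, and the function names, sample rate, and frequency bounds are illustrative choices. Real screams are noisy and often lack a clear F0, so dedicated bioacoustics software with more robust estimators would be needed in practice.

```python
import math


def synth_tone(f0_hz, sample_rate=8000, n_samples=800):
    """Generate a pure sine tone as a stand-in for a tonal call segment."""
    return [math.sin(2 * math.pi * f0_hz * n / sample_rate) for n in range(n_samples)]


def estimate_f0(signal, sample_rate=8000, f_min=100.0, f_max=2000.0):
    """Estimate F0 as sample_rate / lag, where lag maximizes the autocorrelation.

    The search is restricted to lags corresponding to f_min..f_max (assumed bounds).
    """
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(len(signal) - 1, int(sample_rate / f_min))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


low_arousal = estimate_f0(synth_tone(400.0))   # hypothetical lower-F0 call
high_arousal = estimate_f0(synth_tone(500.0))  # hypothetical higher-F0 call
print(low_arousal, high_arousal)
```

The two synthetic tones stand in for calls given at lower and higher arousal; the estimator recovers the higher F0 for the second, which is the direction of change the studies above report.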

Intentionality
Recent contributions to the area of sender cognition have explored the potential role of intentionality (Berthet et al., 2018; Crockford et al., 2017; Deshpande et al., 2018; Fischer & Price, 2016; Schel et al., 2013; Scott-Phillips, 2015a, b; Sievers & Gruber, 2016; Townsend et al., 2017). Authors have adapted Dennett's (e.g., 1983) taxonomy of orders of intentionality to delineate different levels of social cognition on the part of senders (Fischer & Price, 2016; Schel et al., 2013; Townsend et al., 2017). Within this framework, "zero-order intentionality" entails no mental understanding of the outcome or function of one's behavior; the view of a vocalization as purely emotional expression or otherwise involuntary would fall within this category. In "first-order intentionality," senders use signals with some understanding that doing so brings about a response in receivers. Senders at this level do not necessarily understand that their signals change receivers' mental states (e.g., awareness of a predator); communication exhibiting first-order intentionality might thus be best described as behavior induction. Mental state attribution is, however, a defining feature of "second-order intentionality," in which a sender understands the function of communication in terms of its effects on a receiver's mental state. Finally, in "third-order intentionality" a sender understands the function of a signal in terms of its effects on the receiver's perception of the sender's mental state; this and higher levels of intentional communication are thought to be unique to humans (Fischer & Price, 2016; Scott-Phillips, 2015a).
The essential point here is that there exists a spectrum of possible cognitive processes underlying signal usage, ranging from involuntary and emotional/affective signals (zero-order intentionality) to full-blown, human-like mental state attribution. The question is where different cases of vocalization by animals lie along this spectrum. Chimpanzees use "alert hoos" to signal potential but not impending danger such as the presence of a venomous snake in their path (Goodall, 1986). Recent research has shown that, upon encountering a researcher-planted artificial but life-like snake, wild chimpanzees not only emit alert hoos but target them selectively toward ignorant receivers, monitor receivers' reactions, and continue calling until the "snake" has been noticed (Crockford et al., 2012; Schel et al., 2013). This finding has been interpreted as evidence of at least first-order intentionality (Crockford et al., 2017; Schel et al., 2013), consistent with earlier evidence that captive chimpanzees used vocalizations flexibly to manipulate a human's behavior (Hopkins et al., 2007; Leavens et al., 2004). Thus, a sender understands that the behavior of the receiver is contingent upon the occurrence of the vocalizations, and attends to the receiver to assess whether the vocalizations are having their expected effect. The observation that senders are sensitive to whether the receiver is already informed or not (i.e., has already seen the snake or heard another individual's alert hoos) is perhaps suggestive of second-order intentionality, that is, an understanding on the part of senders that the alert hoos function not only to adjust receivers' behavior but also to inform them of the present danger (Seyfarth & Cheney, 2012). Definitive evidence for this interpretation is lacking, but given experimental corroboration that chimpanzees understand that seeing leads to knowing (reviewed in Call & Tomasello, 2008), it is not out of the question.
Evidence of second-order intentional signaling is lacking for most other nonhuman species (Cheney & Seyfarth, 2008; Seyfarth & Cheney, 2003b), but first-order intentionality may be a viable working hypothesis for many instances of animal communication (Townsend et al., 2017). For example, rhesus macaques have shown some of the behaviors associated with first-order intentionality in their gestural communication toward human caretakers (Canteloup et al., 2015), and it is plausible that this might extend to call usage: screaming macaques have been reported to scan the area, perhaps monitoring the reactions of potential aiders ("show-looking"; de Waal et al., 1976). In summary, although emotion is clearly involved in many instances of vocal signal production, other cognitive processes appear to be important as well.

Acoustic Variation Within and Between Call Types
Alongside research into the more complex cognitive processes at play in senders, the field has seen recent advances in thinking about the patterning of acoustic variation within and between call types. Researchers have recognized that many species' call types, including vervet monkey alarms, are not totally discrete, stereotyped acoustic categories; they often exhibit within-type variation and grade into one another to some degree, with some vocalizations exhibiting intermediate acoustic structures (Anikin et al., 2018; Price et al., 2015). Based on this acoustic variation and gradation, contemporary literature around primate vocal communication portrays call types as clusters of vocalizations within a multidimensional acoustic space, separated from one another by fuzzy rather than hard boundaries (Price et al., 2015; Wadewitz et al., 2015). Although this might seem like prima facie evidence against categorical representation of call types, the fact that call types are acoustically graded does not suggest that they are not represented as mental categories by receivers and/or senders; the many playback experiments reviewed above suggest categorical representation by receivers, while the question of how senders mentally represent their own call types remains open to investigation.
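The idea of call types as clusters with fuzzy boundaries in an acoustic space can be illustrated with a toy calculation of graded category membership. This is a sketch under stated assumptions, not an analysis from the cited studies: the two call types, their centroids, the two acoustic features, the per-feature scales, and the Gaussian-style weighting are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical centroids in a 2-D acoustic space: (duration in s, peak frequency in kHz).
CENTROIDS = {"type_A": (0.20, 2.0), "type_B": (0.60, 6.0)}
# Assumed per-feature scales so that both axes contribute comparably to distance.
SCALES = (0.2, 2.0)


def soft_membership(call, centroids=CENTROIDS, scales=SCALES, sharpness=1.0):
    """Return graded membership of a call in each call type (values sum to 1).

    Membership falls off smoothly with scaled squared distance from each
    centroid, so calls between clusters belong partly to both types.
    """
    weights = {}
    for label, centre in centroids.items():
        d2 = sum(((x - c) / s) ** 2 for x, c, s in zip(call, centre, scales))
        weights[label] = math.exp(-sharpness * d2)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


prototypical = soft_membership((0.20, 2.0))  # sits on the type_A centroid
intermediate = soft_membership((0.40, 4.0))  # midway between the two centroids
print(prototypical)
print(intermediate)
```

A call at a centroid belongs almost entirely to one type, while the midway call is split about evenly between the two, which is what "fuzzy rather than hard boundaries" amounts to in this simplified picture. Graded membership of this kind is a property of the acoustics and, as noted above, does not by itself rule out categorical representation by receivers.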
The recently emerging consensus that most animal call types exhibit acoustic variation has raised the question of how to distinguish, on the one hand, the factors that determine whether an animal vocalizes and which call type it emits from, on the other, the factors that determine the precise acoustic structure of that vocalization. This distinction, traditionally posed as "vocal usage" versus "vocal production," respectively, is not new. The novel contribution to this conversation from recent research on the patterning of acoustic variation within and between call types is the recognition that substantial within-type variation and between-type gradations are more ubiquitous across animals than previously thought.
Synthesizing recent advances in sender cognition and in acoustic patterning, some authors have proposed that, in general, the particular acoustic structure of a vocalization (i.e., within-type variation) reflects the sender's emotional arousal whereas call usage might reflect additional cognitive processes (Schamberg et al., 2018; Wadewitz et al., 2015). In support of this, a recent study with common marmosets (Callithrix jacchus) showed that within-type acoustic structure was associated with sender heart rate (a reliable measure of arousal), while call type was associated with the presence/absence and distance of a conspecific but not heart rate (Liao et al., 2018). Other studies have demonstrated a relationship between within-type acoustic structure and emotional valence in addition to or instead of arousal (Briefer, 2012; Briefer, Maigrot et al., 2015; Briefer, Vizier et al., 2019; Friel et al., 2019; Maigrot et al., 2018). Many recent studies on the vocal communication of emotion have at least implicitly, and sometimes explicitly, recognized the necessity of assessing the significance of variation within call types (e.g., Briefer, 2018; Briefer, Maigrot et al., 2015; Briefer, Vizier et al., 2019; Faragó et al., 2017; Friel et al., 2019; Linhart et al., 2015; Maigrot et al., 2018). In addition to arousal, individual variation serving to convey the identity of the sender probably largely lies within call types, rather than or in addition to patterns of call usage (Rendall, 2003; Rendall et al., 1998).

Acoustic Variation Between and Within Call Types: Semi-Separate Channels?
The distinction between the factors determining call usage versus those shaping within-type acoustic structure represents an important step forward in the field of animal vocal communication and warrants revisiting an old question: the extent to which different kinds of information might be simultaneously available from animal vocalizations. In their 1980 paper on vervet alarms, Seyfarth et al. (1980b) suggested that emotional and external information might be simultaneously accessible from vocalizations, writing, "It … seems appropriate to interpret arousal-related properties of alarm calls as ancillary to more specific call features, supplementing and enriching the meaning of calls rather than serving as a primary basis for meaning" (p. 1091). Twenty years later, research on meerkats (Suricata suricatta) provided direct support for this idea. Like vervet monkeys, meerkats produce distinct alarm calls in response to terrestrial predators, avian predators, and snakes (the snake alarm also being associated with other stimuli), and receivers respond appropriately (Manser, 2001; Manser et al., 2001). In addition to predator type, the alarms vary acoustically depending on the urgency of the threat (e.g., the proximity of the predator), with high-urgency alarms associated with a noisier acoustic structure and low-urgency alarms appearing more tonal (Manser, 2001); receivers respond not only to the predator type indicated by the call but also the urgency, with high-urgency calls eliciting more rapid responses and longer vigilance. The acoustic parameters important for distinguishing the predator types are different from those that correlate with urgency.
Following the assumption that high-urgency situations evoke high-arousal emotional states in senders, these findings were interpreted as evidence that receivers could access information relating to the external predator type and the emotional state of the sender simultaneously, through separate acoustic channels (Manser et al., 2002; Seyfarth & Cheney, 2003a), though the authors were careful to avoid specific claims about the cognitive processes involved. Later research from this group suggested that these two channels develop on different timescales, with the acoustic features associated with urgency ontogenetically preceding those that correlate with predator type (Hollén & Manser, 2007).
Although it was recognized early on that different acoustic parameters might convey emotional and external information, it was long unclear whether there might be a general pattern by which these two sets of acoustic parameters might be segregated. Recent thinking about acoustic variation within and between call types, and the differences in the cognitive mechanisms underlying call usage and vocal production, point to a speculative but intriguing possibility. Perhaps, like vervet monkeys (Price et al., 2015), meerkats possess acoustically variable, graded, fuzzy, but discriminable alarm call types, each conveying information about a different category of external phenomena. The usage of these call types could be underlain by a variety of possible cognitive processes including, but not limited to, emotional state and first-order intentionality. Acoustic variation within each of these three call types, in contrast, probably correlates more directly with the emotional arousal of the sender (and therefore the urgency of the threat). If that is the case, perhaps acoustic variation between versus within call types might comprise semi-separate channels functioning to convey information about the predator type, and urgency, respectively.
This hypothesis could apply not only to meerkats but also to many of the species exhibiting externally referential call types. Even in species that have not shown evidence of external reference, it is plausible that acoustic variation between and within call types might convey functionally distinct categories of information to receivers. The general trend seems to be that within-type variation correlates with senders' emotional arousal and valence across species, and thus serves as a consistently available source of information about sender emotion and/or, potentially, any external phenomena that correlate reliably with the sender's emotional state. Within-type variation can, as discussed earlier, also convey the identity of the sender, insofar as the acoustic structure of a call varies by individual. The information available from the usage of different call types, in contrast, varies extensively depending on the species and the contexts in which calls occur, and can relate not only to external phenomena but also to the probability of subsequent sender behavior. Thus, the information conveyed by these channels may overlap, hence our designation of them as "semi-separate."

Evolution of Acoustic Variation Between and Within Call Types
The evidence reviewed above warrants a brief discussion of implications for general processes in vocal evolution, although such speculation, of course, requires caution. Specifically, it seems plausible that evolutionary forces might have different roles in shaping acoustic variation within versus between call types. On one hand, acoustic differences between call types seem responsive to selection pressures, or at least capable of rapid evolution, as demonstrated by differences in vocal repertoires across species (Dunn & Smaers, 2018; Gouzoules & Gouzoules, 2000; though see Hammerschmidt & Fischer, 2019). For example, four closely related macaque species (Macaca spp.) exhibit clear differences in the acoustic qualities of screams used in the same agonistic context (Gouzoules & Gouzoules, 2000). On the other hand, some emotion-related acoustic variation within call types appears to be tied to the physiological and anatomical mechanisms of vocal production; for example, the correlation between emotional arousal and vocal F0 comes about as a natural consequence of the local effects on the laryngeal muscles of the generalized, body-wide arousal response. A hypothetical evolutionary change in the acoustic correlates of arousal within a call type might, in theory, require a change in either the generalized arousal response itself or basic vocal anatomy, changes that could conceivably carry additional, negative fitness consequences. In addition, it is plausible that the relationships between emotion and acoustic variation within call types might be under stabilizing selection, due to benefits for senders associated with the function of conveying emotional information (Altenmüller et al., 2013).
These pressures, alone or in conjunction, might have constrained emotion-related within-type acoustic variation over evolutionary time, explaining why the relationships between emotional arousal and within-type acoustic structure appear conserved across such a broad range of taxa (Briefer, 2012; Filippi et al., 2017; Zimmermann et al., 2013). Within-type acoustic variation might therefore comprise a potential source of "honest" information about the sender's emotional state (see Fitch & Hauser, 2006). This hypothesis generates testable predictions: if it is the case, then the acoustic correlates of emotions should be generally consistent not only across species, but also across contexts and call types within species.

From Two Questions to Four
To summarize, the field of animal vocal communication appears to be moving toward a view of acoustic variation between versus within call types as differing with respect to ultimate functions as well as cognitive mechanisms in both senders and receivers. We suggest that this distinction is analogous to an extension of Seyfarth's and Cheney's (2003b) efforts to delineate the boundary between senders and receivers as objects of empirical study. Building on the two broad questions laid out by these authors, namely (1) what are the roles of emotion and other cognitive processes in signal production by senders, and (2) what is the role of information about, and mental representation of, external objects and events in receiver behavior, we see recent directions in the field as pointing to four questions centering on the factors that contribute to: (1) call usage by senders, (2) the particular within-type acoustic structure of a given call, (3) receivers' responses to different types of calls, and (4) receivers' responses to acoustic variation within call types (Figure 1). All four of these questions can and should be addressed at both the proximate and ultimate levels. The evidence reviewed thus far collectively points to some of the ways that the cognitive processes involved in these four categories of behavior within instances of communication might differ (Figure 1). To summarize briefly, on the sender side, the long-held view of emotion as important for vocal usage and production has been borne out by much research, but it is no longer tenable to view all calling behavior as purely emotional or involuntary; first-order intentionality seems likely to play a role in many taxa as well. Often, emotion probably works in tandem with cognitive processes relating to first-order intentionality to influence calling behavior.
On the other hand, within-type variation might generally relate more reliably to senders' emotional states, as well as inter-individual differences in vocal anatomy and production mechanisms. On the receiver side, as reviewed above, responses to different call types can be underlain by mental representation, affect-induction, stimulus-response reflexes, or some combination of these. Receivers probably also represent the identity of the sender, conveyed through within-type acoustic variation. How receivers might respond to emotional information available from variation within call types is less clear. It is unlikely that most animals understand this information by attributing emotional states to senders (Cheney & Seyfarth, 2008; Seyfarth & Cheney, 2003a, b), but that does not mean that the emotional nature of within-type acoustic variation is irrelevant from the standpoint of receiver cognition. Emotional contagion, the process by which one individual's emotional state spreads to others through automatic perception-action processes (Preston & de Waal, 2002), could allow the emotional arousal of vocalizers to directly affect the emotional arousal of receivers, causing them to respond more urgently (Briefer, 2018). Thus, the affect-induction model (Owren & Rendall, 1997, 2001) might be especially well suited to explain receiver responses to within-type acoustic variation. A non-mutually exclusive possibility is that receivers might use within-type acoustic variation to predict the probable subsequent behavior of the sender, or to mentally represent additional external phenomena of emotional salience to senders, e.g., the proximity of a threat (Manser et al., 2001, 2002). This recent shift in the framing of animal vocal communication research not only advances thinking about the findings of past research but also holds the potential to generate novel questions.
Of the four questions delineated above, receivers' responses to acoustic variation within call types has received the least empirical attention. Few studies have examined whether receivers are even sensitive to emotion-related within-type variation, let alone how, except in humans (e.g., Wood et al., 2017). The hypothesis that emotional contagion is involved in this process seems promising as a jumping-off point for future research (Briefer, 2018). Neuroscience research holds the potential to elucidate the brain networks that process emotional information from calls, and how these might differ from those that process externally referential components (Rauschecker, 2013). The other three questions also contain no shortage of interesting areas for future research. Additional topics worthy of further study include the interactions between cognitive mechanisms of call usage, perception of different calls, within-type acoustic structures, and perception of within-type variation, and the evolution of these different mechanisms. For example, how do the potential differences in cognitive bases of call usage versus within-type acoustic structure make available different kinds of information to receivers; how have these different kinds of information resulted in different selection pressures shaping the cognitive processes receivers use to perceive each kind of acoustic variation; and are these different kinds of acoustic variation under different evolutionary forces? These comprise promising future directions for the field.

Contributions from Scream Research: Rhesus Macaque Screams as a Case Study
Primate screams seem a promising call type for future research in the four areas outlined above, especially regarding the roles of emotion and other cognitive processes in call production and perception. Screams almost invariably occur in high-arousal contexts and can thus be viewed as highly emotional on the part of senders, yet chimpanzee screams show audience effects that are difficult to explain through sender emotion alone (Slocombe & Zuberbühler, 2007). On the receiver side, screams are inherently emotionally evocative, with human listeners showing emotional neural activation in response to screams and other sounds that share acoustic properties with them (Arnal et al., 2015; Belin & Zatorre, 2015). At the same time, the screams of rhesus macaques, chacma baboons, and chimpanzees appear to evoke mental representations of the identity of the vocalizer, the rank of the opponent, and/or the degree of agonistic intensity experienced (Bergman et al., 2003; Gouzoules, 2005; Gouzoules et al., 1984, 1985; Slocombe et al., 2010), and human listeners perceive acoustic variation among con- and heterospecific screams as containing cues to identity (Engelberg & Gouzoules, 2018; Fugate et al., 2008) and emotional arousal. Research across taxa addressing how emotion and other cognitive processes map onto the usage and perception of screams, versus acoustic variation among screams, has the potential to contribute directly to the recent directions reviewed in this paper.
The vocal repertoire of rhesus macaques contains multiple scream classes, with acoustic variation both among and within these classes, and therefore seems like an especially promising model in which to explore these issues. Below, we take the opportunity to further flesh out the organizational framework described above by applying it to rhesus macaque screams, sharing the results of a preliminary case study beginning to examine how emotional arousal relates to acoustic variation within versus between scream classes in this species.
As reviewed above, rhesus macaque screams function to recruit aid during agonistic interactions, and fall into acoustically distinct but graded scream classes, the usage of which correlates with the rank of the opponent and the intensity of aggression received; receivers respond differently depending on their relationship to the sender and on the scream classes used, suggesting the screams evoke a mental representation of aspects of the interaction playing out (Gouzoules et al., 1984, 1985). Most research into rhesus macaque screams has focused on the usage and perception of, and acoustic variation among, different scream classes, whereas the cognitive processes underlying production and perception of acoustic variation within scream classes have received relatively little attention. Although Gouzoules et al. (1984) challenged the idea that arousal could account entirely for differences among scream classes, they noted the existence of acoustic variation within scream classes (see also Le Prell et al., 2002); following the recent directions laid out within the present paper, it is plausible that arousal plays a greater role in that variation than it does in scream class usage. The question of how this variation relates to individual identity has received attention (Fugate et al., 2008; Rendall et al., 1998), but the relationship between emotional states and acoustic variation within scream classes has remained open.
To investigate this, we (Schwartz, J. W., & Gouzoules, H.) recorded and analyzed the acoustic properties of rhesus macaque screams emitted during naturally occurring social interactions. Detailed methods are available online in the Supplementary Materials. We attributed varying levels of emotional arousal to vocalizers depending on the degree of agonistic intensity received, ranging in intensity from no visible threat, to threats with and without physical contact (e.g., a lunge forward with or without a grab, respectively), to high-intensity chase and attack interactions. We recorded multiple screams of a given scream class from bouts associated with each of these different contexts; often multiple scream classes occurred within a single bout, likely relating to dynamic changes in the nature of an interaction as it played out. We then compared the emotional arousal of a sender, operationally defined according to the intensity of the preceding threat, to the acoustic properties of screams, while accounting for scream class. Under the hypothesis that within-class acoustic variation correlates more strongly with emotional arousal than between-class acoustic variation or scream class usage, we predicted that several acoustic correlates of emotional arousal in mammals, including parameters related to F0, energy distribution, tonality, and scream duration (e.g., Briefer, 2012), would correlate with agonistic intensity, irrespective of variation between scream classes.
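As an illustration only, the logic of comparing arousal to acoustics "while accounting for scream class" can be sketched in Python. This is not the analysis reported in the Supplementary Materials: the ordinal intensity coding, the field names (`scream_class`, `context`, `mean_f0_hz`), the made-up F0 values, and the choice of class-mean centering followed by a Spearman rank correlation are all assumptions introduced here for clarity.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical ordinal coding of agonistic intensity (an assumption, not the authors' scheme).
INTENSITY = {"no_threat": 0, "threat_no_contact": 1, "threat_contact": 2, "chase_attack": 3}


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def center_within_class(screams):
    """Subtract each scream class's mean F0 so only within-class variation remains."""
    by_class = defaultdict(list)
    for s in screams:
        by_class[s["scream_class"]].append(s["mean_f0_hz"])
    class_mean = {c: sum(v) / len(v) for c, v in by_class.items()}
    return [s["mean_f0_hz"] - class_mean[s["scream_class"]] for s in screams]


# Made-up toy values for illustration; these are NOT data from the study.
contexts = ["no_threat", "threat_no_contact", "threat_contact", "chase_attack"]
screams = [
    {"scream_class": cls, "context": ctx, "mean_f0_hz": base + 100.0 * INTENSITY[ctx]}
    for cls, base in (("tonal", 2000.0), ("pulsed", 3000.0))
    for ctx in contexts
]

levels = [INTENSITY[s["context"]] for s in screams]
rho_raw = spearman(levels, [s["mean_f0_hz"] for s in screams])
rho_within = spearman(levels, center_within_class(screams))
print(f"raw rho = {rho_raw:.3f}, within-class rho = {rho_within:.3f}")
```

In these toy values F0 rises with intensity inside each class, so the class-centered correlation is stronger than the raw one, which is diluted by the between-class offset in F0. A full analysis would also need to handle repeated screams from the same individual and bout, which this sketch ignores.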
Detailed results are available online in the Supplementary Materials. Based on numbers of screams recorded, we limited analysis to two distinct scream classes: tonal screams (N = 178; clearly identifiable F0 contour, and a duration lasting 0.25-1 s) and pulsed screams (N = 184; < 0.25 s in duration, ranging in tonality from noisy and cough-like to tonal and chirp-like). Note that tonal and pulsed screams are just two of the scream classes exhibited by rhesus macaques, and might not be representative of screams as a whole in this species, especially since they elicit weaker responses than noisy and arched screams (corresponding to severe aggression and rank challenge, respectively; Gouzoules et al., 1984, 1986). We found a significant main effect of the intensity of the preceding agonism on mean F0 and start F0 of screams irrespective of scream class (Figure 2). Additionally, minimum F0 and the frequency value of the upper limit of the first quartile of energy (dfa25) both correlated with agonism intensity among pulsed screams, showing higher values in high-arousal attack and chase contexts, though not among tonal screams (Figure 2). Insofar as agonistic intensity correlates with the arousal of the sender, these acoustic changes are probably attributable to increased tension in the vocal folds resulting in greater overall F0, and match broader trends across mammals (Briefer, 2012). Thus, although arousal appears unable to account for the acoustic differences among scream classes (Gouzoules et al., 1984), it does correlate significantly with acoustic variation within tonal and pulsed scream classes, in a fashion consistent with other species and with known mechanisms of vocal emotion expression.
Furthermore, counter to the notion that arousal underlies acoustic gradations between scream classes, the acoustic parameters that correlated with arousal in the present study (those related to F0) are not those that were originally described as distinguishing the scream classes from one another (Gouzoules et al., 1984); temporal characteristics and those related to spectral noise would better fit that criterion, yet these did not exhibit significant arousal-dependent variation within our sample.
Figure 2. Acoustic variation within scream classes, and relationship to context, with lowest arousal on the left and highest arousal on the right. Horizontal lines denote significant pairwise differences for all screams regardless of class (solid black line) or for pulsed screams only (dotted line).
It is important to reiterate two limitations of this preliminary investigation. First, our conclusions are limited to tonal and pulsed screams, which were originally reported as less significant for receivers (Gouzoules et al., 1984, 1985). Second, it might be overly simplistic to reduce an entire agonistic interaction to a single threat and a single associated arousal level. Agonistic interactions are dynamic, potentially escalating or de-escalating in perceived intensity (and arousal) over the course of a bout of screaming; we accounted for this when escalations were obvious (when a low-intensity threat was followed by a higher-intensity threat; see online Supplementary Materials), but some changes in senders' arousal states over the course of an interaction probably remain unaccounted for.
Having stated these limitations, the findings of this preliminary case study permit two tentative conclusions with respect to broader issues outlined in this review regarding the cognitive mechanisms of call usage and vocal production, and call function. First, the cognitive processes determining the particular acoustic structure of a tonal or pulsed scream (i.e., within-class variation) appear to differ from those determining scream class usage and between-class acoustic differences, with emotional arousal clearly playing a role in the former but less clearly in the latter. It remains unclear what additional cognitive mechanisms might be involved in the usage of different scream classes. Previous writing has suggested that senders hold mental representations of the rank differences between themselves and opponents (H. Gouzoules et al., 1998). In addition, we have observed a tendency among screaming rhesus macaques to scan the area, perhaps monitoring the reactions of potential aiders (Schwartz, J. W., & Gouzoules, H., personal observation; for a similar observation in crab-eating macaques, Macaca fascicularis, see "show-looking" in de Waal et al., 1976), which if true, would be considered evidence supporting first-order intentionality (though, on its own, insufficient to conclude such, Schel et al., 2013).
Second, the acoustic parameters showing arousal-dependent variation while accounting for scream class (primarily related to F0) seem to be different from those that were originally reported to differentiate scream classes (e.g., duration, tonality; Gouzoules et al., 1984), consistent with the hypothesis that acoustic variation between and within scream classes functions as semi-separate channels for different types of information, processed by receivers through different cognitive processes. More direct evidence for this hypothesis is still lacking, but some tentative speculations are possible. As discussed above, different scream classes convey information about the rank difference between the vocalizer and the opponent and about the intensity of aggression involved; cognitive mechanisms involved in responses to different scream classes probably involve mental representation of kin and rank relationships and the details of an incident of aggression. Simultaneously, the present results suggest F0 variation within tonal and pulsed scream classes might convey further information about the intensity of aggression, the internal state of the vocalizer, and/or the sender's likely subsequent behavior. Given the cross-species emotional significance of vocal F0 (Briefer, 2012) and the emotional salience of F0 for human listeners (Filippi et al., 2017; Kelly et al., 2017; Sauter et al., 2010), including in scream perception, it seems plausible that rhesus macaque listeners might respond to F0 variation within these scream classes via emotional contagion (Briefer, 2018), although this has not been tested. Note that F0 is not always perceivable in rhesus macaque screams, especially the class of noisy screams exhibited in the most intense contexts (Gouzoules et al., 1984).
Furthermore, the distinction between these two possible categories of information is not strict, since the usage of scream classes depends in part on the same variable we used as a proxy for arousal (the intensity of agonism received by the vocalizer; Gouzoules et al., 1984), but it might still be significant in terms of biological function and/or cognitive mechanisms. Future research into rhesus macaques' responses to different scream classes versus to within-class variants, and the role of arousal in the acoustic properties of other scream classes (noisy and arched screams), has the potential to shed light on these questions.

Conclusions
The 40 years since the initial vervet playback studies by Seyfarth et al. (1980a, b) have seen major advances in scientific understanding of animal communication. A critical framework for these advances has been the delineation of senders and receivers and the cognitive processes underlying the behavior of each (Seyfarth & Cheney, 2003a, b). Recently, the field has also begun to distinguish between- and within-type acoustic variation, effectively dividing two broad questions, namely what factors determine (1) call usage and production by senders and (2) receivers' responses to calls, into four: what factors determine (1) call usage, (2) within-type acoustic structure, (3) receivers' responses to different call types, and (4) receivers' responses to within-type acoustic variation? Screams, which are phylogenetically widespread, emotional, and in some cases convey complex information, hold particular promise for addressing contemporary questions in the field of animal communication. Ever-present in this field is the view that animal communication may provide a "window on the minds of animals" (Griffin, 1976), which served as a guiding principle for the influential initial work of Seyfarth et al. (1980a, b), and much research since. This principle will doubtless remain a cornerstone as the field continues to delineate and address novel questions.