“Bear-ly” Learning: Limits of Abstraction in Black Bear Cognition

We presented two American black bears (Ursus americanus) with a serial list learning memory task, and one of the bears with a matching-to-sample task. After extended training, both bears demonstrated some success with the memory task but failed to generalize the overarching rule of the task to novel stimuli. Matching to sample proved even more difficult for our bear to learn. We conclude that, despite previous success in training bears to respond to natural categories, quantity discriminations, and other related tasks, bears may possess a cognitive limitation with regard to learning abstract rules. Future tests using different procedures are necessary to determine whether this is a limit of bears’ cognitive capacities or a limitation of the current tasks as presented. Future tests should present a larger number of varying stimuli. Ideally, bears of various species should be tested on these tasks to demonstrate species as well as individual differences.

discrimination tasks using a touch-screen computer interface. We have shown that black bears are able to discriminate quantities, even among moving sets of dots (Vonk & Beran, 2012), to learn to select the ambiguous stimulus when paired with a non-reinforced stimulus in the ambiguous cue paradigm (McGuire, Vonk, & Johnson-Ulrich, 2017), to perform natural category discriminations even at the abstract level of selecting animal versus non-animal photographs (Vonk et al., 2012), and to demonstrate correspondence between real world objects and their images (Johnson-Ulrich et al., 2016). Results from tasks assessing a more abstract construct of a social relationship (i.e., mother-offspring images versus images of other types of relationships) were more equivocal, with the bears eventually learning the discrimination across four sets of photographs depicting various animal species, but failing to show consistent transfer between sets. In addition, the bears showed some degree of transfer to the control images depicting large and small foods (Vonk & Johnson-Ulrich, 2014). Thus, we have demonstrated rapid learning equivalent to that of great apes also tested in similar tasks (e.g., quantity discrimination, Vonk et al., 2014; ambiguous cue paradigm, McGuire et al., 2017; natural category discrimination, Vonk, Jett, Mosteller, & Galvan, 2013; Vonk & Johnson-Ulrich, 2014; Vonk & MacDonald, 2002, 2004), but we have not shown that bears can learn truly abstract constructs that are not tied to observable perceptual features, such as demonstrating relational learning or causal reasoning. Here, we present data from attempts to train bears to perform a matching-to-sample (MTS) task, and a serial list memory task. The results indicated that bears could learn which stimuli predicted reward and could generalize such learning to similar images, but that they struggled with abstracting a general rule that is independent of specific stimuli.
MTS tasks represent the classic method for assessing relational learning in nonhumans. MTS tasks can be adapted to assess learning of identity (choose the comparison stimulus that exactly matches the sample), oddity (choose the comparison stimulus that does not match the sample), conceptual similarity (choose the comparison stimulus that belongs to the same category as the sample, or that involves a functional relationship with the sample), or second-order relations (choose the pair of images that demonstrates the same relationship that is demonstrated by the pair of images in the sample). Bears have not yet demonstrated the ability to match objects based on identity, oddity, or second-order relations. It is unclear whether this lack of data represents an absence of opportunity to demonstrate such an ability (i.e., no experimental tests exist) or whether researchers have not published such data because of null findings. Thus, we present here results of attempts to train one black bear to demonstrate identity matching. In our task, the sample image appeared alone until it was selected, and then two comparison images appeared, one of which perfectly matched the sample. Notably, on our MTS tasks, the bears could not simply attend to which stimuli were associated with reinforcement, as all stimuli were presented with equal opportunity for reinforcement (i.e., served as the correct option on the same number of trials within a session). Reinforcement on a given trial could be predicted only based on which sample was presented.
Previous research has indicated that animals will demonstrate better transfer on MTS tasks if they are presented with a large number of stimuli as both sample and comparison images during training. However, training may take longer with a larger set of stimuli (Wright, Cook, Rivera, Sands, & Delius, 1988). We used a set of only four images and never obtained above chance performance from our bears. However, only one bear was tested sufficiently often in the task (approximately 80 sessions of 24 trials each) to qualify as a "failure to learn." Notably, we have also experienced difficulty in obtaining above chance performance on the same MTS task working with adult chimpanzees and gorillas (unpublished data), although we had previous success training a young gorilla and several orangutans using categorical, rather than identity, MTS tasks (Vonk, 2002, 2003, 2013, 2014).
At the same time that we were training our bears to perform MTS tasks, we were also testing them in a serial list memory task. The task was initially developed to encourage both the bears and the chimpanzees that we were testing concurrently (Vonk & Mosteller, 2013) to deviate from side biases when interacting with the touch-screen. Thus, three test stimuli were presented in nine different locations on the touch-screen during the training phase of the task. During testing, the images were presented along with six distractor images and the bears were rewarded for selecting the previously presented stimuli and not for selecting the distractors. Ultimately, our goal was to assess the pattern of errors to determine whether bears were more likely to select non-studied stimuli belonging to the same category as the studied items, or whether they might instead select images that were categorically distinct but shared a perceptual feature, such as color, with the studied items. However, the bears never reached a level of performance that would have allowed us to manipulate the test stimuli in this manner. One bear never reached criterion on this task. Another bear met criterion on one training set of stimuli but failed to demonstrate transfer to novel stimuli.
As in the MTS task, we presented the bears with a single set of stimuli at a time rather than training them with a wide set of stimuli. As with the MTS task, this decision may have inhibited the bears from extracting a more general rule about the nature of the task. We conclude that one bear was able to learn which stimuli were associated with reward, but was not able to extract the general rule of the task, which was to always select whichever stimuli had been presented on the preceding study trials. We present the data from these training tasks in the hope that they will encourage other researchers to further probe the limits of conceptual understanding in this, and in other, bear species.
Experiment 1: Matching

Method
Subject. One adult male American black bear housed at a small privately owned zoo in the southern U.S. participated in this experiment. The bear, Brutus, was housed with his mother and two siblings and had free access to food and water during training.
Materials. A 19" capacitive touch-screen mounted to a rolling steel cart was presented to the chain link divider between the bear enclosure and an area where humans could present the tasks from a Panasonic Toughbook cf18 laptop. Brutus was fed sugar-free wafer cookies and dried fruit and nuts as rewards for correct responses. The experimenter monitored his responses from the laptop and could not see Brutus' face or direct her gaze to the images on the touch-screen during the trials. This prevented inadvertent cues to the bear.
Four images (500 X 300 MP) were arbitrarily selected for use as stimuli: lily pads, a dome house, a young panda bear, and an image of three man-made painted objects.
Procedure. Brutus was presented with 86 sessions consisting of 24 trials in which each of four sample stimuli was presented six times within a session, in random order, paired with the other stimuli as comparison images. Thus, each image served as a sample and as the correct comparison for that sample on six trials, and as the incorrect comparison on six trials (twice for each of the other sample stimuli, once on the left and once on the right side of the screen). The correct stimulus was presented equally often on the left and right side of the screen across trials, but no more than three times consecutively on the same side of the screen.
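The counterbalancing described above can be illustrated with a short sketch. This is not the software used in the study; the stimulus labels, function names, and the shuffle-and-reject approach to limiting side repetitions are all assumptions made for illustration.

```python
import random
from itertools import permutations

# Hypothetical labels for the four images described in Materials.
STIMULI = ["lily pads", "dome house", "panda", "painted objects"]


def build_trials(stimuli):
    """Return the 24 (sample, foil, correct_side) trials for one session."""
    trials = []
    for sample, foil in permutations(stimuli, 2):
        # Each sample/foil pairing occurs twice: match on the left, match on the right.
        for side in ("left", "right"):
            trials.append((sample, foil, side))
    return trials


def longest_side_run(trials):
    """Length of the longest run of consecutive trials with the match on one side."""
    longest = run = 0
    previous = None
    for _, _, side in trials:
        run = run + 1 if side == previous else 1
        previous = side
        longest = max(longest, run)
    return longest


def make_session(stimuli=STIMULI, max_run=3, rng=None):
    """Shuffle the trials until the match never sits on one side more than max_run times in a row."""
    rng = rng or random.Random()
    trials = build_trials(stimuli)
    while True:
        rng.shuffle(trials)
        if longest_side_run(trials) <= max_run:
            return list(trials)


session = make_session(rng=random.Random(0))
print(len(session))  # 24
```

Under these assumptions every image is the sample on six trials and the foil on six trials, and the match appears 12 times on each side, so neither a stimulus preference nor a side preference is differentially reinforced.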
Brutus had previously been trained to use his nose to touch an image on the touch screen (Vonk et al., 2012). When he touched the sample, the two comparison images appeared on the screen directly below the sample (a simultaneous MTS procedure). The sample could no longer activate the screen when touched once the comparison images appeared. After a delay of 750 ms, one of the comparison images could be selected. If Brutus chose the stimulus that matched the sample, the computer beeped, the screen turned white and he was handed a small food reward. If he selected the stimulus that did not match the sample, the computer made an unpleasant buzzing sound, the screen turned black and the next trial was immediately presented.
Brutus participated in approximately four test sessions a day, three mornings a week, over a period of six months. Sessions were administered back to back with a short ITI for set-up (approximately 1 min). He participated in this test concurrently with training on a two-alternative forced-choice task (Vonk et al., 2012). This testing strategy was similar to that implemented with chimpanzees (Vonk et al., 2013; Vonk & Mosteller, 2013), orangutans (Vonk & MacDonald, 2004) and a gorilla (Vonk & MacDonald, 2002) tested in similar tasks, allowing for comparisons between species.

Results
The data were examined via histograms and determined to be normally distributed. Brutus never reached above chance levels of performance consistently. Performance was quite variable, as can be seen in Figure 1. When we compared his performance across sessions to chance with a one-sample t-test, the results were not significant, t(70) = -0.485, p = .63. However, Brutus did exhibit a significant side bias, t(70) = 2.36, p = .02, touching the right side more than the left side of the screen across trials. Although this bias was statistically significant, it was not a large preference (M = 13.13, SD = 4.03). We also examined Brutus' choices of particular stimuli. Each of the four images was presented equally often across trials as correct and incorrect comparison images. Thus, of the 24 trials, each image would be chosen six times within a session by chance. We compared Brutus' choices of each stimulus to this level of chance with one-sample t-tests. He chose the water lilies (t(70) = 4.02, p < .001) and the dome house (t(70) = 2.28, p = .03) at levels above chance. He chose the manmade objects (t(70) = -3.60, p = .001) and the panda bear (t(70) = -2.79, p = .007) at levels below chance. None of the images were selected or not selected at particularly high rates, however (M's from 5.14 to 6.86), so it appeared that Brutus' choices were fairly random.
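As an arithmetic check, the side-bias statistic can be recomputed from the reported summary values. The sketch below assumes that 71 sessions were analyzed (inferred from the 70 degrees of freedom) and that chance corresponds to 12 right-side choices per 24-trial session; neither value is stated explicitly in the text, and the function name is illustrative.

```python
import math


def one_sample_t(mean, sd, n, mu):
    """t statistic for a one-sample t-test computed from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu) / standard_error


n_sessions = 71    # assumption: df = 70 implies 71 sessions
chance_right = 12  # assumption: half of the 24 trials per session

t_side = one_sample_t(mean=13.13, sd=4.03, n=n_sessions, mu=chance_right)
print(round(t_side, 2))  # 2.36
```

Under these assumptions the recomputed value agrees with the reported t(70) = 2.36, which also illustrates how a mean only about one trial above chance can reach significance when the number of sessions is large.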

Discussion
Previous studies in our lab used a delayed MTS procedure in which the sample disappeared once selected, and was not present when the subject chose between the comparison images (Vonk, 2002, 2003, 2013, 2014). In the current study, the sample appeared on the screen along with the comparison images at the time of choice. This procedure, although seemingly easier because the subject need not retain a representation of the sample in memory, may confuse animals that view the display of three stimuli as a configuration rather than three unique stimuli, two of which match each other (Katz, Bodily, & Wright, 2008; Wright, 1993). Using only four stimuli within repeated sessions also created the situation in which each stimulus could be rewarded 25% of the time. Thus, responding with side biases, or randomly, might guarantee that a subject receives rewards about 50% of the time without learning the general rule. Indeed, Brutus' choices appeared to be fairly random. Given success by various animals in matching experiments using a larger range of stimuli (e.g., Wright et al., 1998), we suspect that Brutus would have been successful using alternative approaches as well had we more time to continue his testing. However, it is notable that he failed to learn this task despite receiving many more sessions than he needed for the many two-alternative forced-choice discriminations we presented to him (Vonk & Beran, 2012; Vonk et al., 2012; Vonk & Johnson-Ulrich, 2014). In those tasks, individual stimuli maintained consistent relationships with reward, which is not the case in MTS tasks, in which the relationship between stimulus and reward is contingent upon which other stimulus was presented as a sample.
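The claim that side biases or stimulus preferences yield reward on about half of the trials can be checked by enumerating a counterbalanced session. The sketch below is illustrative only: the stimulus labels and the two response policies are assumptions, and the trial list simply reproduces the counterbalancing described in the Procedure.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labels for the four images.
STIMULI = ["lily pads", "dome house", "panda", "painted objects"]

# Every trial in one counterbalanced session: (sample, foil, side of the match).
TRIALS = [
    (sample, foil, side)
    for sample, foil in permutations(STIMULI, 2)
    for side in ("left", "right")
]


def expected_correct(policy):
    """Expected proportion correct, given each trial's probability of choosing the match."""
    return sum(policy(trial) for trial in TRIALS) / len(TRIALS)


def always_right(trial):
    """Pure side bias: the right-hand comparison is always chosen."""
    _, _, side = trial
    return Fraction(1) if side == "right" else Fraction(0)


def prefer_image(favourite):
    """Stimulus preference: choose the favourite image when shown, otherwise guess."""
    def policy(trial):
        sample, foil, _ = trial
        if sample == favourite:  # the favourite is the match, so the choice is correct
            return Fraction(1)
        if foil == favourite:    # the favourite is the foil, so the choice is wrong
            return Fraction(0)
        return Fraction(1, 2)    # favourite absent: random guess
    return policy


print(expected_correct(always_right))               # 1/2
print(expected_correct(prefer_image("lily pads")))  # 1/2
```

Both policies earn reward on exactly half of the trials in this enumeration, so under this design neither strategy is penalized relative to random responding.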

Experiment 2: List Memory
This task was designed to assess the bears' ability to select previously presented stimuli belonging to a single category from distractor images belonging to two different categories. Similar to Experiment 1, the images that were correct depended upon their presentation in the previous trials, rather than on any features specific to the stimuli themselves. Two bears participated in this training.

Method
Subjects. Brutus participated in this task as well, along with his brother, Dusty. The bears were housed together along with a female sibling and their mother, but were separated for testing. Testing occurred concurrently with the MTS training and the two-alternative forced-choice task (Vonk et al., 2012). The bears typically participated in two types of test on a test day to maintain their interest and motivation.
Materials. The test apparatus and environment were the same as in Experiment 1. Five sets of three images (200 X 300 MP), each set drawn from a single category, were used in this experiment: three characters from Shrek, three characters from SpongeBob, three images of sports cars, three images from Spiderman, and three images of computer motherboards.
Procedure. The bears participated in one to three consecutive sessions a day, up to three mornings a week, over a period of approximately 21 months. Testing was somewhat sporadic given that the bears were participating simultaneously in other tasks (Vonk & Beran, 2012; Vonk et al., 2012) such that they did not receive the memory test on every test session or even every week of testing.
The bears were initially trained on the task of touching a single image on the screen. If they touched anywhere else on the screen other than on the image itself within a set period of time (which was gradually reduced to two seconds), a buzzer sounded and the trial was considered incorrect. If they selected the image on the screen, a pleasant tone sounded and their touches were reinforced with a small food reward, and the next image was presented at another random location on the screen. Initially the bears were presented with only the study trials, with each session consisting of 27 trials. Three images appeared in each of the nine locations on the screen once each across all study trials in random order. Only a single item appeared on a screen during each study trial. Once the bears consistently selected the image within 2 s without touching any other part of the screen for all 27 trials, the test phase was introduced.
On the test screen, the three studied target images appeared in random locations on the screen. Six distractor images from two other themed sets of images filled the other locations on the screen. Thus, all nine items appeared simultaneously on the screen during the test. Three test trials were presented within each session in which a bear was rewarded if he touched one of the studied items and not rewarded if he touched a non-studied item. Correct responses were paired with the pleasant tone and food reinforcement while incorrect responses were paired with unpleasant buzzer sounds and no reinforcement. Images disappeared when selected. After three selections, the session ended. The length of the target/study list was limited to three items because all target items and their related distractors had to appear simultaneously on the 19" test screen at the end of the study trials.
The bears were presented with homogeneous sets of images, with the distractor images selected from two other homogeneous sets, until they met a criterion of three correct choices on the test screen for three consecutive sessions. That is, sets contained items belonging to a single theme: Spiderman, SpongeBob, Shrek, motherboards, or sports cars. These training items were chosen because they did not include categories that were being used in categorization tests that the bears were working on simultaneously (Vonk et al., 2012). Brutus began training with the Motherboard images, with Shrek and Spiderman images as distractors. Dusty began training with the Shrek images as target images, and sports cars and Spiderman images as distractors. After meeting criterion on this set of images, he was trained with Spiderman as target images and SpongeBob and sports cars as distractor images. One of the previous sets of distractors became the target items in order to tease apart use of the general task rule and previous reinforcement history in motivating his responses. Transfer would be considered achieved if Dusty selected all three Spiderman images on his first test session with the second set of stimuli.
Once the bears were responding consistently, they were given reinforcement for approximately every third response to the study items rather than for every correct response. No strict schedule for reinforcement was used. The experimenter used her judgement to maintain motivation on a given session. The bears were rewarded with small items such as raisins and dried cranberries on study trials and with a larger treat (e.g., sugar-free wafer cookies) for every correct response on the test screen. This was done because responses during study trials were simply orienting responses to ensure that the bears attended to the test items. However, testing trials required the bears to make a discrimination between previously studied and unstudied items. Thus, in order to emphasize the goal or 'rule' of the task, during the test trials, bears were rewarded for each correct choice.

Results
The data were examined via histograms and determined to be normally distributed. Thus, the number of correct responses on a given session was analyzed using one-sample t-tests with chance set to .33, given that one third of the available options on a given session were studied/target items. Although Brutus never reached criterion, overall, his performance was above chance, t(97) = 4.81, p < .001 (Figure 2). Dusty's performance on the first set of stimuli (Shrek, Dusty1 in Figure 2) was also above chance, t(58) = 10.01, p < .001. However, in addition to failing to show transfer, Dusty was not above chance overall after a similar number of sessions (N = 58) with the Spiderman stimuli (Dusty2 in Figure 2), t(57) = 0.198, p = .843.
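To make the chance level concrete, the distribution of correct choices under purely random responding can be computed exactly. The sketch below assumes that a bear picks three of the nine on-screen images uniformly at random without replacement (images disappeared when selected) and that sessions are independent; both are simplifying assumptions, and the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N_ITEMS, N_TARGETS, N_PICKS = 9, 3, 3


def p_correct(k):
    """Hypergeometric probability of exactly k studied items among the three picks."""
    ways = comb(N_TARGETS, k) * comb(N_ITEMS - N_TARGETS, N_PICKS - k)
    return Fraction(ways, comb(N_ITEMS, N_PICKS))


distribution = {k: p_correct(k) for k in range(N_PICKS + 1)}

# Expected number of correct choices per session under random responding.
expected = sum(k * p for k, p in distribution.items())

# Probability of a perfect session, and of three perfect sessions in a row.
p_perfect = distribution[3]
p_criterion = p_perfect ** 3

print(expected)     # 1
print(p_perfect)    # 1/84
print(p_criterion)  # 1/592704
```

Under these assumptions random responding yields one correct choice per session on average, corresponding to one third of the three selections and to the chance line at 1 in Figure 2, and meeting criterion by chance on any given run of three sessions would be very unlikely.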
When examining the responses made by each of the bears, it is obvious that Brutus selectively touched images in particular spatial locations more frequently than by chance alone. This observation is confirmed with chi square tests indicating a non-random distribution of choices to spatial locations for all three choices during his test sessions, χ²s = 189.71, 78.65, and 31.57, all p's < .001. His first test response was primarily to the location on the bottom left corner of the screen (58% of the time), and to the middle bottom location on his second response (32% of the time), and on his third response, to the location on the bottom right side of the screen (31% of the time). He also selected the middle left location more often than the other locations across all three responses. This pattern of responding shows that he preferred to select the items along the bottom row of the screen, which probably required the least amount of effort to touch given that the touchscreen angled slightly away from the enclosure at the top of the screen. It should be noted, however, that Brutus was able to touch all locations on the screen easily during the study trials. Latencies to respond did not appear to vary significantly as a function of spatial location during the study trials. This strategy was reasonably beneficial, as he responded above chance overall, as noted above, and, thus, received rewards more often than he would have by responding randomly.
Dusty clearly adopted a different strategy, attending more to the content of the photos than to their locations. With the first choice of Shrek test images, Dusty also appeared to prefer the bottom left, middle left, and middle bottom locations on the screen, χ² = 26.01, p = .001. However, his responses were evenly distributed spatially on his subsequent responses. With the Spiderman images, he did show a tendency to prefer certain locations on the second and third responses, χ²s = 16.17 and 15.55, p's = .04 and .05, but not on his first choice. On the second choice, he was most likely to touch the middle left and middle center locations. For his third choice, he was most likely to choose the middle right and middle bottom locations.
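The location analyses above are goodness-of-fit tests against a uniform distribution over the nine screen positions. The sketch below shows the form of that computation; the counts are invented for illustration and are not the bears' data, and the use of 8 degrees of freedom (nine locations minus one) is an assumption about how the tests were run.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts per location."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# Critical value of the chi-square distribution for df = 8 at alpha = .05.
CRITICAL_05_DF8 = 15.507

# Invented first-choice counts for a 3 x 3 grid, listed row by row from the
# top left to the bottom right (index 6 is the bottom left corner).
hypothetical_counts = [3, 3, 2, 8, 4, 3, 57, 12, 6]

statistic = chi_square_uniform(hypothetical_counts)
print(statistic > CRITICAL_05_DF8)  # True
```

A statistic above the critical value indicates that choices were not spread evenly across locations; perfectly even counts would give a statistic of zero.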

Discussion
Although Dusty's choices showed some similarity to Brutus' preferences for spatial locations, Dusty was clearly more motivated to touch particular images based on content, at least with regard to the Shrek images. It is surprising that he was not able to learn to touch the Spiderman images in the same number of sessions. This difference in responding suggests that responses were partially driven by the intrinsic qualities of the stimuli rather than their reinforcement contingencies, and that Dusty did not extract the general rules of the task. It is important to address individual differences in response tendencies and motivations when working with small sample sizes (see Vonk & Povinelli, 2011).
Dusty was able to learn to select the previously studied images with one set of stimuli (Shrek) but not with the other (Spiderman). With the second set of images, his most frequent responses were to select the SpongeBob images, which were novel in this phase. The other distractor images were those of sports cars, which had also appeared as distractors in the previous set. Thus, he may have learned that choosing sports cars was not rewarded, and he may have been drawn to novelty rather than to the Spiderman images, which had previously not been reinforced during testing, although they also appeared during study trials. Given that both the target images and half of the distractor images had a history of non-reinforcement, Dusty's strategy may have been to select the novel images. Although avoiding previously non-reinforced stimuli shows some degree of learning, this also means that Dusty did not learn to associate the presentation of images during the study phase with reinforcement during the test screen.
Ideally, we should have presented all novel images at testing. However, we had wished to disentangle strategies underlying performance and to pit the strategy of attending to prior reinforcement (or lack thereof) against the strategy of attending to what was studied on the given test session, such that we could use the task to go on and test more interesting constructs, such as false memory (e.g., Vonk & Mosteller, 2013). By pitting two potential strategies against each other, we were able to see which mechanism best explained Dusty's pattern of responding.
Brutus also seemed to fail to extract the general rule of the task, as he did not demonstrate a consistent pattern of learning over almost 100 sessions, but, rather, elected to use a strategy of selecting the images that were probably the least effortful to reach. In other work, we have shown that bears do have long-term memory for the reinforcement history of particular images. In fact, Brutus was able to remember which images had previously been rewarded after a period of approximately two years in a two-alternative forced-choice paradigm with natural category stimuli (Vonk et al., 2012). Dungl, Schratter, and Huber (2008) demonstrated the ability of giant pandas to remember the reinforcement history of elliptical shapes after a period of six months to one year. Perdue and colleagues (Perdue, Snyder, Pratte, Marr, & Maple, 2009; Perdue, Snyder, Zhihe, Marr, & Maple, 2011) have also demonstrated impressive memory capacity in giant pandas in a series of spatial memory tasks. Thus, it is clear that the bears' failure in our memory task is not due to a failure of memory per se but is a failure of our task itself, or a failure of the bears to abstract a general rule for test performance.

General Discussion
In both of the tasks presented here, we used a very limited set of stimuli over repeated trials in an attempt to train bears to learn general task rules, either an MTS or a serial memory task procedure. Based on previous work (Wright et al., 1998), and our own lack of success, it is clear that it would be preferable to present animals with a much larger set of stimuli such that the only invariant aspect of the task is the key relationship between task rules (e.g., always choose whatever image matches the sample, always choose at test what you have seen at study). It is possible that, given the optimal experimental design, bears would demonstrate the ability to abstract these general rules. In fact, we have little doubt that they would, given their success in many other tasks (reviewed in Vonk & Leete, 2017), and the previous success of animals such as pigeons (Berryman, Cumming, & Nevin, 1963) and rats (Leising, Wolf, & Ruprecht, 2013) in MTS tasks. Furthermore, Keen et al. (2014) showed that grizzly bears can learn to make different behavioral responses based on a contingency rule, and we have also demonstrated, with a different black bear, that bears can learn a form of conditional discrimination (McGuire, 2018; Vonk et al., unpublished data). Thus, it is not our intent to suggest that bears or apes are incapable of learning matching tasks. Rather, we suspect that over-presenting identity matching trials may hinder animals from extracting a more conceptual understanding of the task (see also Vonk et al., 2012 for a similar argument regarding concrete versus abstract natural category discriminations). Thus, we hope that presenting data reflecting poor performance in this task will encourage other researchers to avoid similar testing procedures and to apply other methods to the same and closely related species.
It is possible that our testing strategy of presenting more than one experimental task to the bears on a given test day may have produced interference between tasks. However, we do not believe this explanation to be likely given that bears were successful on other tasks during the same training period (e.g., Vonk & Beran, 2012; Vonk et al., 2012). We have had similar success with another bear (e.g., Johnson-Ulrich et al., 2016) and with various apes (Vonk, 2002, 2003, 2013, 2014; Vonk et al., 2002, 2004, 2013) that also participated in multiple tasks within a session. However, this strategy may have compounded challenges in selecting ideal test stimuli that did not share features with previously presented and/or reinforced stimuli.
Our goal is to inspire other researchers who have the means to continue such work to probe the limits of conceptual understanding in bears, and other less studied species. Doing so will help us understand the selection pressures responsible for advanced cognitive abilities, such as abstraction. If bears are shown to possess such capacities, then it cannot be the case that sociality is the key factor responsible for such traits (Humphrey, 1976; Jolly, 1966). Rather, it is likely that foraging complexity (Byrne, 1997), brain size (Sol, Duncan, Blackburn, Cassey, & Lefebvre, 2005; Sol, Szekely, Liker, & Lefebvre, 2007), and perhaps unstable environments (Sol, 2009) are most critical to the emergence of advanced cognition (see also Benson-Amram, Dantzer, Stricker, Swanson, & Holekamp, 2016; Holekamp, Dantzer, Stricker, Yoshida, & Benson-Amram, 2015). Although we have previously tested orangutans on similar tasks, their success may not speak as strongly to the role of evolving for a social lifestyle as orangutans share an evolutionary history with mostly social primate relatives.
It is also important to recognize that cognitive abilities expressed in wild animals may not always express themselves in captivity for a variety of reasons. For one, animals may be tested in ways that do not make use of their natural responses, or that do not present stimuli in the appropriate sensory modality. Furthermore, animals may not have the same motivations to seek food rewards as they would in natural environments. Our subjects are never food-adjusted and seem to find the tasks enriching regardless of the ratio of reward per response. We have also previously found that spatial memory tasks may not demonstrate bears' true abilities, presumably because of a lack of cost associated with foraging in smaller spaces (Vonk et al., 2015; Zamisch & Vonk, 2012). Thus, researchers must be clever to design tasks that motivate animals to learn the intended contingencies. What we have shown is that bears are motivated to engage with computer tasks to receive rewards, and other researchers have shown that other species of bear (sun bears, Helarctos malayanus) may also prefer computer testing to other enrichment activities (Perdue, 2016). Thus, our studies of bears performing computer tasks have paved the way for others to provide similar types of enrichment to other captive bears.
Ideally, analogous cognitive tests will one day be conducted with bears of a variety of species to better determine the importance of factors such as foraging complexity in shaping complex cognition. Benson-Amram and colleagues (2016) conducted a large-scale extractive foraging task with the largest number of carnivores tested in a single study. In their study, bears were among the most successful participants. We expect black bears and grizzly bears to outperform other bear species, such as polar bears and giant pandas, in problem-solving tasks based on their variable diet, need for extractive foraging, and large home ranges. In addition, grizzly bears have the largest relative brain size of all the bears, and indeed of carnivores in general (Gittleman, 1986). A generalist diet may be the most important factor in predicting problem-solving success. In addition, we are currently conducting tests with various bear and feline species to assess their ability to solve a multi-access box problem (Johnson-Ulrich, unpublished data). We are just beginning to probe the limits of bears' conceptual abilities and hope that the current set of preliminary data aids somewhat in that respect.

Figure 1. Average number of correct trials across blocks of four sessions with standard error of the mean. The dotted line indicates chance performance.

Figure 2. Average number of correct responses across blocks of three sessions with error bars representing standard error of the mean. The line at 1 indicates chance performance.