Finding the signal in the noise: How machine learning can help us perceive, understand, and protect other species

5.31.2023

Sara Keen, Senior Research Scientist, Behavioral Ecology and AI

As we say on the Earth Species Project (ESP) homepage: “More than 8 million species share our planet. We only understand the language of one.” 

As an acoustic engineer and behavioral ecologist, I’m fascinated by this problem. What makes animal communication so much harder to interpret than human languages? Even if we could understand other species, why might that be important?

I recently joined ESP following a career as a field and lab biologist because I think that the latest developments in machine learning will help us to address these questions. The answers have the potential to transform our understanding of the world and may just bring about a paradigm shift in how we view our relationship with nature.

A world of invisible information

When we walk outside, we are swimming in a sea of information and signals from other species – many of them imperceptible to us. Sitting on a park bench, you’re likely to be surrounded by birdsong, insect pheromones, volatile compounds from plants and insects, and UV light patterns reflecting off flowers. This meshwork of information is both incredibly dense and nearly undetectable to humans.

From an evolutionary standpoint, this is very elegant. By having other species “on mute,” we can coexist in the same physical space and barely disrupt the conversations of others. Biologists call this resource partitioning; it’s as though each species has its own “radio station” where it can broadcast with little interference from other channels.

This muting is largely a product of our own perceptual limits. As humans, we simply can’t hear a bat’s echolocation clicks above the 20 kHz upper limit of our ears, and our noses struggle to recognize most of the chemical compounds produced by insects. Filtering out that background allows us to focus on things that are crucial to survival, like easily hearing other people.

Fortunately, there are specialized tools that help us overcome these perceptual barriers. While doing field work in Kenya, I spent a lot of time watching groups of starlings gather on the savanna. These birds were intriguing because they were cooperative – they lived in commune-like social groups where they shared food and raised each other’s offspring – and also talkative, calling loudly to greet one another. Given the context clues, it seemed like these birds used the calls to recognize each other, but to me every noise they made sounded almost the same. Using a high-quality recording rig and acoustic analysis techniques, it was possible to precisely measure parts of their calls that my ears could never discern. This allowed my collaborators and me to realize that the starlings used certain calls to indicate that they were part of the same cooperative group, almost like a secret handshake or password.
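For readers curious what “acoustic analysis” looks like in practice, here is a minimal sketch (not the actual pipeline from the starling study) of turning a recorded call into a handful of numbers – spectral centroid, bandwidth, and MFCCs – so that differences too subtle for our ears become measurable. The file path is hypothetical.

```python
import librosa
import numpy as np

def call_features(wav_path: str) -> np.ndarray:
    """Summarize one recorded call as a small feature vector (illustrative only)."""
    y, sr = librosa.load(wav_path, sr=None)                     # keep the native sample rate
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)    # spectral "brightness" over time
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)  # how spread out the energy is
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)          # coarse spectral shape
    # Average over time so each call becomes a single point we can compare.
    return np.concatenate([centroid.mean(axis=1),
                           bandwidth.mean(axis=1),
                           mfcc.mean(axis=1)])

# Calls from the same cooperative group should land closer together in this
# feature space than calls from different groups.
```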

Superb starlings. Image courtesy of Mpala Research Center.

That leads us to the harder problem: Even when humans can perceive animal signals, we usually can’t discern the meaning embedded within them. 

Deciphering meaning

Most of our current understanding of animal communication comes from decades of careful measurements and observations by biologists. These studies have been crucial to opening our eyes to the wildly different ways that other species communicate.

Honeybees are a fun example. When a bee finds a particularly juicy flower patch – one with lots of nectar – it heads home and tells its hive exactly where it is. But bees can’t explain this with words; instead they use a special dance to give instructions. As others gather around to watch, the dancer uses specific movements to convey how far away the flowers are and the precise angle at which other bees should fly as they leave the hive. Compared to our wordy human explanations of directions, the amount of information conveyed in a simple dance is remarkable.
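To make that information content concrete, here is a toy decoder under two well-known but greatly simplified assumptions: the duration of the waggle run scales roughly with distance, and the run’s angle from vertical on the comb gives the flight bearing relative to the sun. The calibration constant below is purely illustrative.

```python
def decode_waggle_dance(waggle_seconds: float,
                        angle_from_vertical_deg: float,
                        sun_azimuth_deg: float) -> tuple[float, float]:
    """Return an (approximate distance in meters, compass bearing in degrees)."""
    distance_m = waggle_seconds * 1000.0                  # toy calibration: 1 s of waggling ≈ 1 km
    bearing_deg = (sun_azimuth_deg + angle_from_vertical_deg) % 360
    return distance_m, bearing_deg

# A 2-second waggle run angled 40° right of vertical, with the sun at an
# azimuth of 180°, points nestmates roughly 2 km away on a bearing of ~220°.
print(decode_waggle_dance(2.0, 40.0, 180.0))
```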

However, the process of interpreting other animals’ signals can be slow. Aristotle famously described honeybee dances in his writings, but we didn’t understand their purpose until Karl von Frisch decoded their meaning in the mid-20th century. That means the time from first recorded observation to complete understanding was well over 2,000 years!

Image courtesy of Kai Wenzel

Ultimately, deciphering other species’ signals is a pattern recognition problem. While humans are experts at finding patterns in some contexts, most of us are not great at finding patterns in whale song, for example. Our odds drop even further when the signals fall outside our perceptual range entirely. And it’s just not practical to build specialized tools for every study, as we did for the Kenyan starlings.

Fortunately, computers are very, very good at pattern recognition. Many machine learning (ML) algorithms do precisely this: find hard-to-discern patterns and signals in a massive amount of noise. 

Machine learning: Extending our ability to perceive and decode

At ESP, we’ve gathered an interdisciplinary team of researchers who are creating a suite of ML algorithms for animal communication, frequently incorporating cutting-edge techniques that have emerged in the current AI boom. One key tool we are using is the foundation model: a large, general-purpose model trained on huge amounts of data – in our case, audio recordings of animals. AVES, a self-supervised foundation model for animal vocalizations created by Masato Hagiwara, is helping us gain insight into the underlying signal structure in several species being studied by our collaborators.
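As a rough schematic of how such a model is used (the exact AVES interface differs; see Hagiwara’s AVES repository for the real loading code), the encoder turns a waveform into a sequence of frame-level embeddings, which can be pooled into a single vector per call. The loading and resampling calls below are standard torchaudio; the encoder itself is treated as a black box.

```python
import torch
import torchaudio

def embed_call(wav_path: str, encoder: torch.nn.Module) -> torch.Tensor:
    """Turn one audio file into a single embedding vector (schematic only)."""
    waveform, sr = torchaudio.load(wav_path)                         # [channels, samples]
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)  # match the encoder's expected rate
    waveform = waveform.mean(dim=0, keepdim=True)                    # mix down to mono
    with torch.no_grad():
        frames = encoder(waveform)                                   # assumed output: [1, time, dim]
    return frames.mean(dim=1).squeeze(0)                             # average over time: one vector per call
```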

Jaclyn Aubin, a beluga whale researcher at the University of Windsor, is interested in the specialized contact calls that belugas use to identify one another. Through her collaboration with ESP, we’ve applied AVES to measure acoustic differences between calls. This could help Aubin quantify the contact call repertoire of belugas in the St. Lawrence River and define the social structure of this endangered population.
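Here is a minimal sketch of what “measuring acoustic differences” can look like once each call has an embedding (using the schematic embed_call above; the file names and the encoder are hypothetical): cosine similarity between embedding vectors gives a simple, model-based measure of how alike two calls sound.

```python
import torch
import torch.nn.functional as F

call_paths = ["call_001.wav", "call_002.wav", "call_003.wav"]       # hypothetical recordings
embeddings = torch.stack([embed_call(p, encoder) for p in call_paths])

# Pairwise cosine similarity: entry [i, j] says how alike calls i and j sound
# to the model, from -1 (very different) to 1 (effectively identical).
normed = F.normalize(embeddings, dim=1)
similarity = normed @ normed.T
print(similarity)
```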

Point cloud illustrating the distribution of different beluga whale contact calls. Each call is represented by a single point, with colors indicating distinct call types. Points that sit closer together represent calls that are more acoustically similar.
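A figure like the one above can be reproduced, in spirit, with a standard dimensionality-reduction step (the exact method behind the figure isn’t specified here; UMAP is one common choice): project the high-dimensional call embeddings down to two dimensions so that similar calls land near each other.

```python
import numpy as np
import umap  # pip install umap-learn

# `embeddings` is an (n_calls, dim) array of call embeddings, e.g. from embed_call above.
coords = umap.UMAP(n_components=2, n_neighbors=15, random_state=0).fit_transform(
    np.asarray(embeddings)
)
# Each row of `coords` is one call's (x, y) position in the point cloud;
# coloring points by call type (labeled by the researcher or found by
# clustering) produces a picture like the one described above.
```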


ESP is also collaborating with crow researchers at the University of Leon to investigate whether certain calls correspond to particular behaviors. Their research team fits crows with specialized microphones that record all of the sounds made by a particular individual. We’ve applied the AVES model to automatically detect vocalizations in these long audio streams, which has dramatically accelerated the pace of data analysis.
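As a generic illustration of that detection step (a sketch, not ESP’s actual pipeline), one common pattern is to slide a short window across the long on-bird recording, embed each window, and keep the windows that a classifier flags as vocalizations. Both the encoder and the classifier are assumed to be pretrained; only the scanning logic is shown.

```python
import torch
import torchaudio

def find_vocalizations(wav_path, encoder, classifier,
                       win_s=1.0, hop_s=0.5, threshold=0.5):
    """Return (start_s, end_s, score) tuples for windows likely to contain a call."""
    waveform, sr = torchaudio.load(wav_path)
    waveform = waveform.mean(dim=0)                       # mix down to mono, shape [samples]
    win, hop = int(win_s * sr), int(hop_s * sr)
    detections = []
    for start in range(0, waveform.numel() - win, hop):
        chunk = waveform[start:start + win].unsqueeze(0)  # [1, samples]
        with torch.no_grad():
            emb = encoder(chunk).mean(dim=1)              # pooled embedding for this window
            score = torch.sigmoid(classifier(emb)).item() # probability the window holds a call
        if score > threshold:
            detections.append((start / sr, (start + win) / sr, score))
    return detections
```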

Critically, ESP still needs the expertise of scientists who work closely with animals to inform our research approach. Biologists, ecologists, and other domain experts will be the guideposts that point us in the right direction. This ethos has informed ESP’s technical roadmap, and having a tight feedback loop between ML developers and animal behavior experts is essential to accomplishing our goals. 

Looking forward

Machine learning has already transformed our understanding of human language, and we expect a similar outcome for animal signals. A deeper understanding of animal communication will foster a stronger connection with the world around us. Moreover, it will give us new insights into how human activity affects other species, which will be hugely informative in devising more effective conservation strategies.

For now, despite our long coexistence with other species, we still catch only a fraction of what other animals are saying. Imagine how our world will change when we begin to understand more.



