Listening for Frogs at Scale: How FrogID Evaluated NatureLM-Audio on Real-World Data

8.13.2025

Dainty Green Tree Frog. Photo: Gunter Schmida / Courtesy of Australian Museum

Key Takeaways:

FrogID, the largest frog-focused citizen science project in the world, led by the Australian Museum, recently evaluated NatureLM-audio for use in processing and analyzing hundreds of thousands of frog recordings.
FrogID evaluated NatureLM-audio on five core identification tasks and found strong performance on several of them, even without fine-tuning.
NatureLM-audio achieved near-perfect scores on Frog vs. Not-a-Frog classification, which the FrogID team plans to incorporate into its pre-processing pipeline. Around 9% of submissions are non-target taxa, such as insects and birds, so pre-processing with NatureLM-audio could save FrogID 300+ hours of manual validation a year.
Despite not being trained explicitly on Australian frogs, NatureLM-audio had 82% accuracy in identifying species from a sample of the top 5 most commonly recorded species in FrogID when there was only one frog species calling.
NatureLM-audio demonstrated emergent generalization to tasks it was not explicitly trained for – such as detecting human speech or distinguishing frogs from non-frogs. This flexibility points to broader potential across new bioacoustics tasks, species, and conservation efforts.
NatureLM-audio is open-sourced on HuggingFace and will soon have a lightweight, no-code UI to make it even easier to get started.

Of all the world’s major animal groups, amphibians are experiencing some of the most concerning and severe population declines, disappearing around 200 times faster than the natural extinction rate. Living double lives in water and on land, frogs are especially sensitive to environmental change: they absorb pollutants through their permeable skin and respond to shifts in both habitats. Because frogs are often among the first to sound the alarm, their rhythmic croaks and calls are vital signals that help researchers monitor the health of an ecosystem.

Sound of the Whirring Tree Frog

Because each frog species has a unique call, audio recordings are often the most accurate and least invasive way to identify them. The best way to keep a pulse on frog ecosystems is to listen carefully, patiently, and often, which is exactly the goal of FrogID, the flagship citizen science project from the Australian Museum.

Using the FrogID app to record frog calls. Photo: Jodi Rowley/Australian Museum

FrogID is a free mobile app from the Australian Museum that lets anyone in Australia record 20- to 60-second audio clips when they think they hear frogs calling in the wild, then submit them for verification by a team of frog call experts. Since launching in 2017, FrogID has grown into the largest frog-focused citizen science project in the world, with over 1.3 million recordings of frogs. The scale of this data has been invaluable for researchers like Dr. Jodi Rowley, Curator, Amphibian and Reptile Conservation Biology at the Australian Museum and UNSW, and lead scientist of FrogID. She studies Australia's unique frog diversity and distribution, and what frogs tell us about biodiversity and the health of ecosystems more broadly.

How It Works

FrogID Validator. Photo: Nadiah Roslan/Australian Museum

Dr Jodi Rowley. Photo: Devise/Australian Museum

Every frog call submitted by citizen scientists is verified manually, one by one, by FrogID scientists, a process that is time-consuming and requires deep expertise.

As FrogID scaled up, the team began to explore how machine learning could help them identify and match frog calls more efficiently. Beyond speeding up validation, they saw potential for real-time identification in the app of which frog a user just heard (think “Shazam for frogs”), as well as a need to analyze large-scale passive acoustic monitoring (PAM) data.

That’s when FrogID came across NatureLM-audio, our audio-language foundation model designed specifically for bioacoustics. Although NatureLM-audio hasn’t been specifically fine-tuned for identifying Australian frogs, the FrogID team was curious about its potential for their use cases. FrogID team member Julia Tan led an exploration of NatureLM-audio’s applications, running an evaluation of five core tasks on their frog dataset.

“The model’s ability to differentiate broadly between frog vs. bird vs. insect was particularly impressive, as well as the human speech detection capabilities (given that the model was not explicitly trained on this task)” - Julia Tan, Scientific Officer, Australian Museum

Evaluation Tasks & Results

Frog vs. Not-a-Frog

Fig. 1: Evaluation dataset of 140 examples: 100 frogs (20 from each of the FrogID Top 5 species), 20 birds, and 20 insects.

The first classification task focused on detecting whether an audio clip contains a frog call, as opposed to common misidentifications such as birds and insects. The FrogID team preprocessed the raw audio before running it through NatureLM-audio, and the model achieved near-perfect scores across all metrics. This is especially impactful because, while FrogID has one of the world’s largest collections of frog recordings, it does not have an equally broad dataset of non-frog sounds, which had made “not-a-frog” classification challenging. Because NatureLM-audio was trained on a diverse range of species and environments, it can perform this task out of the box, enabling FrogID to filter out non-frog recordings and focus expert validation where it’s needed most.
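
To make the filtering step concrete, here is a minimal Python sketch of how a Frog vs. Not-a-Frog answer might be turned into a routing decision. It assumes, hypothetically, that NatureLM-audio is prompted with a yes/no question (for example, “Is there a frog calling in this recording?”) and replies in free text; the model call itself is left out, and `is_frog`, `route`, and `Queues` are illustrative names, not part of NatureLM-audio or FrogID’s actual pipeline.

```python
from dataclasses import dataclass, field


def is_frog(answer: str) -> bool:
    """Interpret a free-text yes/no answer (hypothetical prompt format)."""
    return answer.strip().lower().startswith("yes")


@dataclass
class Queues:
    expert_review: list[str] = field(default_factory=list)  # clips the model says contain a frog
    set_aside: list[str] = field(default_factory=list)      # clips the model says do not


def route(answers: dict[str, str]) -> Queues:
    """Split recording IDs into queues based on the model's answer for each clip."""
    queues = Queues()
    for recording_id, answer in answers.items():
        target = queues.expert_review if is_frog(answer) else queues.set_aside
        target.append(recording_id)
    return queues


# Usage with made-up answers (not real model output):
queues = route({"rec-001": "Yes, a frog is calling.", "rec-002": "No, this sounds like an insect."})
print(queues.expert_review)  # ['rec-001']
print(queues.set_aside)      # ['rec-002']
```

Keeping rejected clips in a separate queue, rather than deleting them, leaves room for occasional spot checks of the model’s decisions.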

Identifying the Focal Species

Fig. 2: Evaluation dataset of 100 examples: 20 from each of the FrogID Top 5 species.

Bird Misclassification Example (Ground Truth: Eastern Dwarf Tree Frog; Prediction: Torresian Crow)

Identifying which frog was just heard is at the core of FrogID’s work. Despite not being trained specifically on frogs, NatureLM-audio performed fairly well, correctly identifying 82 of the 100 clips. Of the 18 errors, seven were predicted as birds and ten as non-Australian frog species, which are reasonable misclassifications given the close acoustic similarities of the calls. This suggests that adding constraints, such as limiting predictions to Australian frogs, could significantly improve accuracy for FrogID’s use case. If fine-tuning makes the model reliable enough, it could streamline manual validation by letting expert validators confirm the model’s predictions rather than identify each call from scratch.
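
One way to apply the “Australian frogs only” constraint is as a post-processing step. The sketch below assumes, hypothetically, that several candidate labels with scores can be obtained for a clip (NatureLM-audio generates text, so how ranked candidates would be produced, or whether the constraint would instead go into the prompt, is an open design question). The allowlist holds only species named in this post as placeholders, and the scores are invented.

```python
from typing import Optional

# Placeholder allowlist using species named in this post; a real list would
# cover all Australian frog species that FrogID validates.
AUSTRALIAN_FROGS = {"Eastern Dwarf Tree Frog", "Dainty Green Tree Frog", "Whirring Tree Frog"}


def constrain_prediction(ranked: list[tuple[str, float]], allowed: set[str]) -> Optional[str]:
    """Return the highest-scoring candidate label that is on the allowlist, or None."""
    for label, _score in sorted(ranked, key=lambda pair: pair[1], reverse=True):
        if label in allowed:
            return label
    return None  # nothing plausible: leave the clip for an expert


# Hypothetical candidates and scores, echoing the crow misclassification above:
candidates = [("Torresian Crow", 0.6), ("Eastern Dwarf Tree Frog", 0.3), ("American Bullfrog", 0.1)]
print(constrain_prediction(candidates, AUSTRALIAN_FROGS))  # Eastern Dwarf Tree Frog
```

Returning `None` instead of forcing a label keeps unconfident clips in the manual validation path.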

Identifying Multiple Species

Fig. 3: Evaluation dataset of 80 chorus examples: 20 each containing one, two, three, and four species.

Audio submissions that contain multiple, overlapping callers and species make validation much more challenging, as these initial results show. Although NatureLM-audio struggled to identify every species present, it could still be useful for estimating how many different species are calling. The FrogID team could then use this count as a proxy for complexity to help prioritize which recordings the expert validators should review first.
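
Using a species count as a triage signal could look like the following sketch. The `Submission` type and its `estimated_species_count` field are hypothetical stand-ins for whatever count the model returns. The post does not say whether complex recordings should be reviewed first or last, so the direction is a parameter.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    recording_id: str
    estimated_species_count: int  # hypothetical: the model's estimate of how many species are calling


def prioritize(submissions: list[Submission], most_complex_first: bool = True) -> list[Submission]:
    """Order the review queue by estimated species count.

    sorted() is stable (also with reverse=True), so submissions with equal
    counts keep their original order.
    """
    return sorted(submissions, key=lambda s: s.estimated_species_count, reverse=most_complex_first)


queue = [Submission("rec-a", 1), Submission("rec-b", 4), Submission("rec-c", 2)]
print([s.recording_id for s in prioritize(queue)])  # ['rec-b', 'rec-c', 'rec-a']
```

Sending the simplest recordings first instead (`most_complex_first=False`) would clear the queue faster; sending the busiest choruses first puts the most expert-dependent work in front of validators.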

Detecting a Target Species

Fig. 4: Evaluation dataset of 180 examples: 80 cane toad examples (20 each from recordings with one, two, three, and four species) and 100 not-a-cane-toad examples (20 from each of the FrogID Top 5 species).

This task represents an important conservation use case: tracking a specific target species such as the Cane Toad (Rhinella marina), an invasive species spreading rapidly across Australia. Closely monitoring its presence is key to effective management and control. In this initial run, NatureLM-audio showed moderate success in identifying cane toad calls, even when other species were calling in the same recording. Further fine-tuning could improve its performance on specific target species.
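
Because the cane toad set is split by chorus size, a natural way to read the results is detection rate per number of species calling. The sketch below assumes, hypothetically, that the model’s answer is free text and that a detection counts whenever the common or scientific name appears in it; the toy data are invented, not FrogID results.

```python
from collections import defaultdict


def mentions_cane_toad(answer: str) -> bool:
    """Hypothetical check of a free-text answer for the common or scientific name."""
    text = answer.lower()
    return "cane toad" in text or "rhinella marina" in text


def detection_rate_by_chorus_size(results: list[tuple[int, str]]) -> dict[int, float]:
    """results: (species calling in the clip, model answer) for clips that contain a cane toad.

    Returns the fraction of clips in which the toad was detected, per chorus size.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for n_species, answer in results:
        totals[n_species] += 1
        hits[n_species] += int(mentions_cane_toad(answer))
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy answers, not real model output:
toy = [(1, "Cane toad calling."), (1, "Rhinella marina"), (2, "A tree frog"), (2, "Cane Toad and another frog")]
print(detection_rate_by_chorus_size(toy))  # {1: 1.0, 2: 0.5}
```

The 100 not-a-cane-toad clips would be scored separately, as the share of clips where the toad was (wrongly) mentioned.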

Identifying Human Speech

Fig. 5: Evaluation dataset of 120 examples: 20 containing human speech (positive) and 100 without human speech (negative).

After source separation was applied, NatureLM-audio performed well at detecting human speech within audio submissions, a task it was never explicitly trained for. Without source separation, the model missed 14 of the 20 human speech samples; after isolating the sources in the audio, it missed just two. Since FrogID would also want to filter out submissions containing human speech, this could serve as a helpful automatic filter for unusable submissions, similar to the Frog vs. Not-a-Frog classifier. It could even serve as an in-app check that gently prompts the user to re-record the call without speaking, encouraging higher-quality submissions.
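
The miss counts above translate directly into recall on the 20 speech clips. The short calculation below uses only the figures reported here; the post does not report false positives on the 100 non-speech clips, so precision cannot be computed from these numbers.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of clips containing speech that the model flagged."""
    return true_positives / (true_positives + false_negatives)


SPEECH_CLIPS = 20  # positive examples in the evaluation set (Fig. 5)
before = recall(SPEECH_CLIPS - 14, 14)  # 14 missed without source separation
after = recall(SPEECH_CLIPS - 2, 2)     # 2 missed after source separation
print(f"recall: {before:.0%} -> {after:.0%}")  # recall: 30% -> 90%
```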

Next Steps

Based on this evaluation, FrogID is currently looking at ways to incorporate NatureLM-audio into a pre-processing pipeline that filters user submissions so that only audio containing frogs reaches the validators. Non-frog submissions currently make up just under 10% of the database, so pre-processing with NatureLM-audio could save FrogID over 300 hours of manual validation a year, freeing up valuable time and resources for the team. Flagging and filtering out audio clips that contain human speech could further help the team prioritize the most relevant submissions.
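
The time-saving estimate depends on three quantities: annual submission volume, the share of non-frog clips, and the average expert time per review. The post gives only the non-frog share (around 9%) and the outcome (over 300 hours), so the sketch below leaves all three as inputs rather than guessing FrogID’s figures. It also assumes, optimistically, that every non-frog clip is caught and needs no expert time at all.

```python
def hours_saved_per_year(submissions_per_year: int, non_frog_fraction: float,
                         minutes_per_review: float) -> float:
    """Expert hours no longer spent reviewing non-frog clips.

    Assumes every non-frog clip is caught by the filter and needs no expert
    time, so the result is an upper bound.
    """
    return submissions_per_year * non_frog_fraction * minutes_per_review / 60
```

Plugging in FrogID’s own submission volume and average review time would reproduce, or refine, the 300-hour figure.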

The next step will be to see how the model performs with additional filters that narrow the search space for species identification. For example, adding geographic context from GPS data, along with information such as timestamps, could improve NatureLM-audio’s predictions. If these additions raise accuracy to a reliable enough level, the FrogID team could use NatureLM-audio directly to tackle their ultimate challenge: large-scale species detection.
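
A contextual filter of this kind might drop candidate species that are implausible for where and when a clip was recorded. The sketch below is hypothetical throughout: it assumes GPS coordinates have already been mapped to a region code upstream, that per-species range and calling-month metadata exist, and that filtering happens on the model’s candidate labels after inference (the post does not say how the context would be supplied). Species names and metadata are placeholders, not real ranges or calling seasons.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SpeciesContext:
    regions: frozenset[str]         # hypothetical region codes where the species is known to occur
    calling_months: frozenset[int]  # hypothetical months (1-12) when it is known to call


def plausible_species(candidates: list[str], context: dict[str, SpeciesContext],
                      region: str, recorded_at: datetime) -> list[str]:
    """Keep only candidates consistent with where and when the clip was recorded.

    Candidates with no context entry are dropped here; a real pipeline might
    instead flag them for expert review.
    """
    kept = []
    for name in candidates:
        info = context.get(name)
        if info is not None and region in info.regions and recorded_at.month in info.calling_months:
            kept.append(name)
    return kept


# Toy metadata with placeholder names:
context = {
    "Species A": SpeciesContext(frozenset({"R1"}), frozenset({9, 10, 11})),
    "Species B": SpeciesContext(frozenset({"R2"}), frozenset(range(1, 13))),
}
print(plausible_species(["Species A", "Species B", "Species C"], context, "R1", datetime(2025, 10, 5)))
# ['Species A']
```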

“If [NatureLM-audio] could produce a checklist identifying all the frog species present in a submission and none that aren’t, that would crack what is in my mind the biggest challenge in the biodiversity species detection space. The species checklist is the holy grail!” – Dr. Jodi Rowley, Curator, Amphibian and Reptile Conservation Biology and Lead Scientist of FrogID, Australian Museum and UNSW
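
The “all the species present and none that aren’t” criterion maps onto two standard set metrics: recall (nothing missed) and precision (nothing extra). A perfect checklist has both equal to 1, which is the same as an exact set match. The sketch below scores one submission; species names are placeholders, and the handling of empty sets is an assumed convention.

```python
def checklist_scores(predicted: set[str], present: set[str]) -> dict[str, float]:
    """Compare a predicted species checklist with the verified one for a submission."""
    correct = len(predicted & present)
    # Assumed conventions for empty sets: no predictions means no false positives,
    # and nothing present means nothing could be missed.
    precision = correct / len(predicted) if predicted else 1.0
    recall = correct / len(present) if present else 1.0
    return {"precision": precision, "recall": recall, "exact_match": float(predicted == present)}


print(checklist_scores({"Species A", "Species B"}, {"Species A", "Species C"}))
# {'precision': 0.5, 'recall': 0.5, 'exact_match': 0.0}
```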

FrogID’s initial evaluation of NatureLM-audio has opened the door to future collaboration with Earth Species Project to collect more frog data and improve the model for Australian frogs. These improvements would help build a stronger, more comprehensive version of NatureLM-audio, ultimately giving FrogID an even more powerful and accurate tool for its conservation efforts.

Explore NatureLM-audio Yourself!

We’re encouraged by the early results showing NatureLM-audio’s ability to generalize to species it hasn’t seen before, and even more excited by its potential to support real-world conservation projects like FrogID. NatureLM-audio is open-sourced and available on HuggingFace. We’d love to see more researchers, conservationists, and curious citizen scientists try it out on their datasets and share what they learn. We’ll also be releasing a lightweight demo of a no-code UI soon to make it even easier to explore the model – sign up here to stay in the loop with our updates.

Acknowledgements

We’d like to thank Dr. Jodi Rowley and Julia Tan from the Australian Museum for sharing their time, expertise, and enthusiasm as they explored NatureLM-audio’s potential for frog conservation. This work was made possible by a donation from CAF America, and we look forward to continuing our collaborations with the FrogID team.