Ecologists find computer vision models’ blind spots in retrieving wildlife images
Try taking a picture of each of North America’s roughly 11,000 tree species, and you’ll have a mere fraction of the millions of photos within nature image datasets. These massive collections of snapshots — ranging from butterflies to humpback whales — are a great research tool for ecologists because they provide evidence of organisms’ unique behaviors, rare conditions, migration patterns, and responses to pollution and other forms of climate change.
While comprehensive, nature image datasets aren’t yet as useful as they could be. It’s time-consuming to search these databases and retrieve the images most relevant to your hypothesis. You’d be better off with an automated research assistant — or perhaps artificial intelligence systems called multimodal vision language models (VLMs). They’re trained on both text and images, making it easier for them to pinpoint finer details, like the specific trees in the background of a photo.
But just how well can VLMs assist nature researchers with image retrieval? A team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), University College London, iNaturalist, and elsewhere designed a performance test to find out. Each VLM’s task: locate and reorganize the most relevant results within the team’s “INQUIRE” dataset, composed of 5 million wildlife pictures and 250 search prompts from ecologists and other biodiversity experts.
Looking for that special frog
In these evaluations, the researchers found that larger, more advanced VLMs, which are trained on far more data, can sometimes get researchers the results they want to see. The models performed reasonably well on straightforward queries about visual content, like identifying debris on a reef, but struggled significantly with queries requiring expert knowledge, like identifying specific biological conditions or behaviors. For example, VLMs somewhat easily uncovered examples of jellyfish on the beach, but struggled with more technical prompts like “axanthism in a green frog,” a condition that limits a frog’s ability to make its skin yellow.
Their findings indicate that the models need much more domain-specific training data to process difficult queries. MIT PhD student Edward Vendrow, a CSAIL affiliate who co-led work on the dataset in a new paper, believes that with exposure to more informative data, the VLMs could one day be great research assistants. “We want to build retrieval systems that find the exact results scientists seek when monitoring biodiversity and analyzing climate change,” says Vendrow. “Multimodal models don’t quite understand more complex scientific language yet, but we believe that INQUIRE will be an important benchmark for tracking how they improve in comprehending scientific terminology and ultimately helping researchers automatically find the exact images they need.”
The team’s experiments illustrated that larger models tended to be more effective for both simpler and more intricate searches due to their expansive training data. They first used the INQUIRE dataset to test if VLMs could narrow a pool of 5 million images to the top 100 most-relevant results (also known as “ranking”). For straightforward search queries like “a reef with manmade structures and debris,” relatively large models like “SigLIP” found matching images, while smaller-sized CLIP models struggled. According to Vendrow, larger VLMs are “only starting to be useful” at ranking tougher queries.
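The ranking step described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the researchers' code: it assumes each image and each query has already been turned into an embedding vector by some model, and the names `cosine`, `rank_top_k`, and the toy vectors are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_top_k(query_vec, image_vecs, k=100):
    """Return the ids of the k images most similar to the query.

    image_vecs maps an image id to its (assumed, precomputed) embedding.
    """
    scored = ((cosine(query_vec, vec), image_id)
              for image_id, vec in image_vecs.items())
    return [image_id for _, image_id in heapq.nlargest(k, scored)]


# Toy two-dimensional embeddings, made up for illustration only.
toy_images = {
    "reef_debris": [0.9, 0.1],
    "reef_clean": [0.7, 0.3],
    "frog": [0.1, 0.9],
}
toy_query = [1.0, 0.0]
top_two = rank_top_k(toy_query, toy_images, k=2)
```

With a pool of millions of images, a real system would likely use a vector index rather than scoring every image in a loop; the sketch only shows the idea of keeping the highest-scoring results.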
Vendrow and his colleagues also evaluated how well multimodal models could re-rank those 100 results, reorganizing which images were most pertinent to a search. In these tests, even huge LLMs trained on more curated data, like GPT-4o, struggled: GPT-4o’s precision score was only 59.6 percent, and that was the highest score achieved by any model.
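To make the idea of re-ranking and scoring concrete, here is a minimal sketch. The article does not say how the precision score is computed, so the precision-at-k measure below is an assumption chosen for illustration, and `rerank`, `precision_at_k`, and the toy scores are hypothetical stand-ins for a real model's judgments.

```python
def rerank(candidates, relevance_score):
    """Reorder candidates so the highest-scoring ones come first.

    relevance_score is any callable returning a number for an image id;
    in practice it would be a model's judgment, here it is a stand-in.
    """
    return sorted(candidates, key=relevance_score, reverse=True)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the first k results that are labeled relevant."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    hits = sum(1 for image_id in top if image_id in relevant_ids)
    return hits / len(top)


# Toy example: four candidates from a first-pass ranking.
first_pass = ["a", "b", "c", "d"]
toy_scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}  # made-up scores
relevant = {"b", "d"}  # made-up expert labels

reranked = rerank(first_pass, toy_scores.get)
before = precision_at_k(first_pass, relevant, k=2)
after = precision_at_k(reranked, relevant, k=2)
```

In this toy case, re-ranking moves both relevant images into the top two, so the score rises from 0.5 to 1.0.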
The researchers presented these results at the Conference on Neural Information Processing Systems (NeurIPS) earlier this month.
Inquiring for INQUIRE
The INQUIRE dataset includes search queries based on discussions with ecologists, biologists, oceanographers, and other experts about the types of images they’d look for, including animals’ unique physical conditions and behaviors. A team of annotators then spent 180 hours searching the iNaturalist dataset with these prompts, carefully combing through roughly 200,000 results to label 33,000 matches that fit the prompts.
For instance, the annotators used queries like “a hermit crab using plastic waste as its shell” and “a California condor tagged with a green ‘26’” to identify the subsets of the larger image dataset that depict these specific, rare events.
Then, the researchers used the same search queries to see how well VLMs could retrieve iNaturalist images. The annotators’ labels revealed when the models struggled to understand scientists’ keywords, as their results included images previously tagged as irrelevant to the search. For example, VLMs’ results for “redwood trees with fire scars” sometimes included images of trees without any markings.
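The check described above, comparing a model's results against the annotators' labels, might look something like the following sketch. The data layout (a dictionary from image id to a relevant/irrelevant flag) and the function name `find_false_positives` are assumptions made for illustration, not the format used by INQUIRE.

```python
def find_false_positives(retrieved_ids, labels):
    """Return retrieved images that annotators marked as irrelevant.

    labels maps an image id to True (matches the query) or
    False (reviewed and judged irrelevant). Images that were never
    reviewed are absent from labels and are not counted here.
    """
    return [image_id for image_id in retrieved_ids
            if labels.get(image_id) is False]


# Made-up labels for a query such as "redwood trees with fire scars".
toy_labels = {"img1": True, "img2": False, "img3": False}
toy_results = ["img1", "img3", "img9"]  # "img9" was never reviewed
mistakes = find_false_positives(toy_results, toy_labels)
```

Here the model's list contains one image, `img3`, that annotators had already judged irrelevant, which is the kind of error the labels bring to light.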
“This is careful curation of data, with a focus on capturing real examples of scientific inquiries across research areas in ecology and environmental science,” says Sara Beery, the Homer A. Burnell Career Development Assistant Professor at MIT, CSAIL principal investigator, and co-senior author of the work. “It’s proved vital to expanding our understanding of the current capabilities of VLMs in these potentially impactful scientific settings. It has also outlined gaps in current research that we can now work to address, particularly for complex compositional queries, technical terminology, and the fine-grained, subtle differences that delineate categories of interest for our collaborators.”
“Our findings imply that some vision models are already precise enough to aid wildlife scientists with retrieving some images, but many tasks are still too difficult for even the largest, best-performing models,” says Vendrow. “Although INQUIRE is focused on ecology and biodiversity monitoring, the wide variety of its queries means that VLMs that perform well on INQUIRE are likely to excel at analyzing large image collections in other observation-intensive fields.”
Inquiring minds want to see
Taking their project further, the researchers are working with iNaturalist to develop a query system to better help scientists and other curious minds find the images they actually want to see. Their working demo allows users to filter searches by species, enabling quicker discovery of relevant results like, say, the diverse eye colors of cats. Vendrow and co-lead author Omiros Pantazis, who recently received his PhD from University College London, also aim to improve the re-ranking system by augmenting current models to provide better results.
University of Pittsburgh Associate Professor Justin Kitzes highlights INQUIRE’s ability to uncover secondary data. “Biodiversity datasets are rapidly becoming too large for any individual scientist to review,” says Kitzes, who wasn’t involved in the research. “This paper draws attention to a difficult and unsolved problem, which is how to effectively search through such data with questions that go beyond simply ‘who is here’ to ask instead about individual characteristics, behavior, and species interactions. Being able to efficiently and accurately uncover these more complex phenomena in biodiversity image data will be critical to fundamental science and real-world impacts in ecology and conservation.”
Vendrow, Pantazis, and Beery wrote the paper with iNaturalist software engineer Alexander Shepard, University College London professors Gabriel Brostow and Kate Jones, University of Edinburgh associate professor and co-senior author Oisin Mac Aodha, and University of Massachusetts at Amherst Assistant Professor Grant Van Horn, who served as co-senior author. Their work was supported, in part, by the Generative AI Laboratory at the University of Edinburgh, the U.S. National Science Foundation/Natural Sciences and Engineering Research Council of Canada Global Center on AI and Biodiversity Change, a Royal Society Research Grant, and the Biome Health Project funded by the World Wildlife Fund United Kingdom.