AI pareidolia: Can machines spot faces in inanimate objects?

In 1994, Florida jewelry designer Diana Duyser discovered what she believed to be the Virgin Mary’s image in a grilled cheese sandwich, which she preserved and later auctioned for $28,000. But how much do we really understand about pareidolia, the phenomenon of seeing faces and patterns in objects when they aren’t really there? 

A new study from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) delves into this phenomenon, introducing an extensive, human-labeled dataset of 5,000 pareidolic images, far surpassing previous collections. Using this dataset, the team discovered several surprising results about the differences between human and machine perception, and how the ability to see faces in a slice of toast might have saved your distant relatives’ lives.

“Face pareidolia has long fascinated psychologists, but it’s been largely unexplored in the computer vision community,” says Mark Hamilton, MIT PhD student in electrical engineering and computer science, CSAIL affiliate, and lead researcher on the work. “We wanted to create a resource that could help us understand how both humans and AI systems process these illusory faces.”

So what did all of these fake faces reveal? For one, AI models don’t seem to recognize pareidolic faces like we do. Surprisingly, the team found that it wasn’t until they trained algorithms to recognize animal faces that they became significantly better at detecting pareidolic faces. This unexpected connection hints at a possible evolutionary link between our ability to spot animal faces — crucial for survival — and our tendency to see faces in inanimate objects. “A result like this seems to suggest that pareidolia might not arise from human social behavior, but from something deeper: like quickly spotting a lurking tiger, or identifying which way a deer is looking so our primordial ancestors could hunt,” says Hamilton.

Image: a row of five photos of animal faces above five photos of inanimate objects that look like faces.

Another intriguing discovery is what the researchers call the “Goldilocks Zone of Pareidolia,” a class of images where pareidolia is most likely to occur. “There’s a specific range of visual complexity where both humans and machines are most likely to perceive faces in non-face objects,” says William T. Freeman, MIT professor of electrical engineering and computer science and principal investigator of the project. “Too simple, and there’s not enough detail to form a face. Too complex, and it becomes visual noise.”

To uncover this, the team developed an equation that models how people and algorithms detect illusory faces. When analyzing this equation, they found a clear “pareidolic peak” where the likelihood of seeing faces is highest, corresponding to images that have “just the right amount” of complexity. This predicted “Goldilocks zone” was then validated in tests with both real human subjects and AI face detection systems.
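The study’s actual equation and fitted parameters aren’t reproduced in this article, but the qualitative behavior it describes can be illustrated with a toy model. The sketch below assumes, purely for illustration, a bell-shaped relationship between a made-up visual-complexity score and the probability of perceiving a face; the peak location, width, and functional form are all assumptions, not the paper’s results.

```python
import numpy as np

# Toy illustration only: the study's actual equation is not reproduced here.
# We assume a bell-shaped curve in which the probability of perceiving an
# illusory face peaks at intermediate visual complexity.
def p_face(c, c_peak=0.5, width=0.15):
    """Hypothetical probability of seeing a face at visual complexity c (0 to 1)."""
    return np.exp(-((c - c_peak) ** 2) / (2 * width ** 2))

complexities = np.linspace(0.0, 1.0, 101)
probs = p_face(complexities)
print(f"Pareidolic peak at complexity of about {complexities[np.argmax(probs)]:.2f}")
# Very low complexity ("too simple") and very high complexity ("too complex")
# both give low probabilities, matching the described Goldilocks zone.
```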

Image: three photos of clouds above three photos of a fruit tart. The leftmost photo of each is “Too Simple” to perceive a face; the middle photo is “Just Right,” and the last photo is “Too Complex.”

This new dataset, “Faces in Things,” dwarfs those of previous studies, which typically used only 20 to 30 stimuli. This scale allowed the researchers to explore how state-of-the-art face detection algorithms behave after fine-tuning on pareidolic faces. Not only could these algorithms be adapted to detect such faces, they could also act as a silicon stand-in for our own brains, allowing the team to ask and answer questions about the origins of pareidolic face detection that are impossible to pose to humans.
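The paper’s specific detectors and training recipe aren’t described in this article, so the sketch below is only a rough illustration of what fine-tuning a face detector on pareidolic bounding boxes might look like, using torchvision’s Faster R-CNN and a single synthetic training example as stand-ins.

```python
# Rough sketch, not the paper's setup: fine-tune an off-the-shelf detector so its
# single foreground class is "face" and feed it pareidolic bounding boxes.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)  # background + face

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

# One synthetic image and one labeled face box, standing in for the real dataset.
images = [torch.rand(3, 256, 256)]
targets = [{"boxes": torch.tensor([[30.0, 40.0, 180.0, 200.0]]), "labels": torch.tensor([1])}]

loss_dict = model(images, targets)   # detection models return a loss dict in train mode
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```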

To build this dataset, the team curated approximately 20,000 candidate images from the LAION-5B dataset, which were then meticulously labeled and judged by human annotators. This process involved drawing bounding boxes around perceived faces and answering detailed questions about each face, such as the perceived emotion, age, and whether the face was accidental or intentional. “Gathering and annotating thousands of images was a monumental task,” says Hamilton. “Much of the dataset owes its existence to my mom,” a retired banker, “who spent countless hours lovingly labeling images for our analysis.”
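The dataset’s released annotation format isn’t spelled out in this article, but a single labeled example might look roughly like the record below; the field names and values are hypothetical, not the actual schema.

```python
# Hypothetical annotation record for one "Faces in Things" image.
# Field names and values are illustrative only, not the dataset's actual schema.
annotation = {
    "image_url": "https://example.com/toaster.jpg",  # placeholder URL
    "faces": [
        {
            "box_xywh": [112, 64, 180, 150],  # bounding box drawn around the perceived face
            "perceived_emotion": "surprised",
            "perceived_age": "adult",
            "accidental": True,               # an illusory face, not an intentionally designed one
        }
    ],
}
print(len(annotation["faces"]), "pareidolic face(s) labeled in this image")
```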

The study also has potential applications in improving face detection systems by reducing false positives, which could have implications for fields like self-driving cars, human-computer interaction, and robotics. The dataset and models could also help areas like product design, where understanding and controlling pareidolia could create better products. “Imagine being able to automatically tweak the design of a car or a child’s toy so it looks friendlier, or ensuring a medical device doesn’t inadvertently appear threatening,” says Hamilton.

“It’s fascinating how humans instinctively interpret inanimate objects with human-like traits. For instance, when you glance at an electrical socket, you might immediately envision it singing, and you can even imagine how it would ‘move its lips.’ Algorithms, however, don’t naturally recognize these cartoonish faces in the same way we do,” says Hamilton. “This raises intriguing questions: What accounts for this difference between human perception and algorithmic interpretation? Is pareidolia beneficial or detrimental? Why don’t algorithms experience this effect as we do? These questions sparked our investigation, as this classic psychological phenomenon in humans had not been thoroughly explored in algorithms.”

As the researchers prepare to share their dataset with the scientific community, they’re already looking ahead. Future work may involve training vision-language models to understand and describe pareidolic faces, potentially leading to AI systems that can engage with visual stimuli in more human-like ways.

“This is a delightful paper! It is fun to read and it makes me think. Hamilton et al. propose a tantalizing question: Why do we see faces in things?” says Pietro Perona, the Allen E. Puckett Professor of Electrical Engineering at Caltech, who was not involved in the work. “As they point out, learning from examples, including animal faces, goes only half-way to explaining the phenomenon. I bet that thinking about this question will teach us something important about how our visual system generalizes beyond the training it receives through life.”

Hamilton and Freeman’s co-authors include Simon Stent, staff research scientist at the Toyota Research Institute; Ruth Rosenholtz, principal research scientist in the Department of Brain and Cognitive Sciences, NVIDIA research scientist, and former CSAIL member; and CSAIL affiliates postdoc Vasha DuTell, Anne Harrington MEng ’23, and Research Scientist Jennifer Corbett. Their work was supported, in part, by the National Science Foundation and the CSAIL MEnTorEd Opportunities in Research (METEOR) Fellowship, and was sponsored by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator. The MIT SuperCloud and Lincoln Laboratory Supercomputing Center provided high-performance computing resources for the researchers’ experiments.

This work is being presented this week at the European Conference on Computer Vision.

Helping robots zero in on the objects that matter

Imagine having to straighten up a messy kitchen, starting with a counter littered with sauce packets. If your goal is to wipe the counter clean, you might sweep up the packets as a group. If, however, you wanted to first pick out the mustard packets before throwing the rest away, you would sort more discriminately, by sauce type. And if, among the mustards, you had a hankering for Grey Poupon, finding this specific brand would entail a more careful search.

MIT engineers have developed a method that enables robots to make similarly intuitive, task-relevant decisions.

The team’s new approach, named Clio, enables a robot to identify the parts of a scene that matter, given the tasks at hand. With Clio, a robot takes in a list of tasks described in natural language and, based on those tasks, it then determines the level of granularity required to interpret its surroundings and “remember” only the parts of a scene that are relevant.

In real experiments ranging from a cluttered cubicle to a five-story building on MIT’s campus, the team used Clio to automatically segment a scene at different levels of granularity, based on a set of tasks specified in natural-language prompts such as “move rack of magazines” and “get first aid kit.”

The team also ran Clio in real-time on a quadruped robot. As the robot explored an office building, Clio identified and mapped only those parts of the scene that related to the robot’s tasks (such as retrieving a dog toy while ignoring piles of office supplies), allowing the robot to grasp the objects of interest.

Clio is named after the Greek muse of history, for its ability to identify and remember only the elements that matter for a given task. The researchers envision that Clio would be useful in many situations and environments in which a robot would have to quickly survey and make sense of its surroundings in the context of its given task.

“Search and rescue is the motivating application for this work, but Clio can also power domestic robots and robots working on a factory floor alongside humans,” says Luca Carlone, associate professor in MIT’s Department of Aeronautics and Astronautics (AeroAstro), principal investigator in the Laboratory for Information and Decision Systems (LIDS), and director of the MIT SPARK Laboratory. “It’s really about helping the robot understand the environment and what it has to remember in order to carry out its mission.”

The team details their results in a study appearing today in the journal Robotics and Automation Letters. Carlone’s co-authors include members of the SPARK Lab: Dominic Maggio, Yun Chang, Nathan Hughes, and Lukas Schmid; and members of MIT Lincoln Laboratory: Matthew Trang, Dan Griffith, Carlyn Dougherty, and Eric Cristofalo.

Open fields

Huge advances in the fields of computer vision and natural language processing have enabled robots to identify objects in their surroundings. But until recently, robots were only able to do so in “closed-set” scenarios, where they are programmed to work in a carefully curated and controlled environment, with a finite number of objects that the robot has been pretrained to recognize.

In recent years, researchers have taken a more “open” approach to enable robots to recognize objects in more realistic settings. In the field of open-set recognition, researchers have leveraged deep-learning tools to build neural networks that can process billions of images from the internet, along with each image’s associated text (such as a friend’s Facebook picture of a dog, captioned “Meet my new puppy!”).

After training on millions of image-text pairs, a neural network learns to identify the segments in a scene that are characteristic of certain terms, such as a dog. A robot can then apply that neural network to spot a dog in a totally new scene.
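The article doesn’t name the specific model such systems use, but a CLIP-style image-text network captures the idea. The sketch below assumes the Hugging Face transformers library and the openai/clip-vit-base-patch32 checkpoint, and scores one cropped scene segment against a few candidate descriptions.

```python
# Sketch of open-set recognition with a CLIP-style image-text model.
# The specific model is an assumption; the article does not name one.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

segment = Image.open("scene_segment.png")  # a cropped region of the robot's view (placeholder path)
labels = ["a photo of a dog", "a photo of a chair", "a photo of a houseplant"]

inputs = processor(text=labels, images=segment, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.2f}")  # a high "dog" score means the segment matches that term
```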

But a challenge remains: how to parse a scene in a way that is useful and relevant to a particular task.

“Typical methods will pick some arbitrary, fixed level of granularity for determining how to fuse segments of a scene into what you can consider as one ‘object,’” Maggio says. “However, the granularity of what you call an ‘object’ is actually related to what the robot has to do. If that granularity is fixed without considering the tasks, then the robot may end up with a map that isn’t useful for its tasks.”

Information bottleneck

With Clio, the MIT team aimed to enable robots to interpret their surroundings with a level of granularity that can be automatically tuned to the tasks at hand.

For instance, given a task of moving a stack of books to a shelf, the robot should be able to determine that the entire stack of books is the task-relevant object. Likewise, if the task were to move only the green book from the rest of the stack, the robot should distinguish the green book as a single target object and disregard the rest of the scene — including the other books in the stack.

The team’s approach combines state-of-the-art computer vision and large language models comprising neural networks that make connections among millions of open-source images and semantic text. They also incorporate mapping tools that automatically split an image into many small segments, which can be fed into the neural network to determine if certain segments are semantically similar. The researchers then leverage an idea from classic information theory called the “information bottleneck,” which they use to compress a number of image segments in a way that picks out and stores segments that are semantically most relevant to a given task.

“For example, say there is a pile of books in the scene and my task is just to get the green book. In that case we push all this information about the scene through this bottleneck and end up with a cluster of segments that represent the green book,” Maggio explains. “All the other segments that are not relevant just get grouped in a cluster which we can simply remove. And we’re left with an object at the right granularity that is needed to support my task.”
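This is not Clio’s actual information-bottleneck formulation, but the core idea can be sketched with a toy similarity filter: keep only the segments whose embeddings align with the task description and drop the rest. The embedding function below is a crude bag-of-words stand-in; a real system would use a vision-language encoder.

```python
import numpy as np

# Toy sketch of task-driven filtering, not Clio's actual information-bottleneck
# algorithm. A real system would embed segments and the task with a
# vision-language model; a crude bag-of-words embedding stands in here.
def embed(text, vocab):
    words = set(text.lower().split())
    v = np.array([1.0 if w in words else 0.0 for w in vocab])
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def relevant_segments(segments, task, threshold=0.3):
    """Keep segments whose similarity to the task description clears the threshold."""
    vocab = sorted({w for s in segments + [task] for w in s.lower().split()})
    task_vec = embed(task, vocab)
    return [s for s in segments if float(embed(s, vocab) @ task_vec) > threshold]

segments = ["green book", "red book", "coffee mug", "stack of papers"]
print(relevant_segments(segments, task="get the green book"))
# -> ['green book', 'red book'] with this toy embedding; a real encoder would
#    rank the green book highest and let the irrelevant clusters be discarded.
```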

The researchers demonstrated Clio in different real-world environments.

“What we thought would be a really no-nonsense experiment would be to run Clio in my apartment, where I didn’t do any cleaning beforehand,” Maggio says.

The team drew up a list of natural-language tasks, such as “move pile of clothes,” and then applied Clio to images of Maggio’s cluttered apartment. In these cases, Clio was able to quickly segment scenes of the apartment and feed the segments through the information bottleneck algorithm to identify those segments that made up the pile of clothes.

They also ran Clio on Boston Dynamics’ quadruped robot, Spot. They gave the robot a list of tasks to complete, and as the robot explored and mapped the inside of an office building, Clio ran in real time on an onboard computer mounted to Spot, picking out segments in the mapped scenes that visually related to the given task. The method generated an overlay map showing just the target objects, which the robot then used to approach the identified objects and physically complete the task.

“Running Clio in real-time was a big accomplishment for the team,” Maggio says. “A lot of prior work can take several hours to run.”

Going forward, the team plans to adapt Clio to be able to handle higher-level tasks and build upon recent advances in photorealistic visual scene representations.

“We’re still giving Clio tasks that are somewhat specific, like ‘find deck of cards,’” Maggio says. “For search and rescue, you need to give it more high-level tasks, like ‘find survivors,’ or ‘get power back on.’ So, we want to get to a more human-level understanding of how to accomplish more complex tasks.”

This research was supported, in part, by the U.S. National Science Foundation, the Swiss National Science Foundation, MIT Lincoln Laboratory, the U.S. Office of Naval Research, and the U.S. Army Research Lab Distributed and Collaborative Intelligent Systems and Technology Collaborative Research Alliance.

Where flood policy helps most — and where it could do more

Flooding, including the devastation caused recently by Hurricane Helene, is responsible for $5 billion in annual damages in the U.S. That’s more than any other type of weather-related extreme event.

To address the problem, the federal government instituted a program in 1990 that helps reduce flood insurance costs in communities enacting measures to better handle flooding. If, say, a town preserves open space as a buffer against coastal flooding, or develops better stormwater management, area policyholders get discounts on their premiums. Studies show the program works well: It has reduced overall flood damage in participating communities.

However, a new study led by an MIT researcher shows that the effects of the program differ greatly from place to place. For instance, higher-population communities, which likely have more means to introduce flood defenses, benefit more than smaller communities, to the tune of about $4,000 per insured household.

“When we evaluate it, the effects of the same policy vary widely among different types of communities,” says study co-author Lidia Cano Pecharromán, a PhD candidate in MIT’s Department of Urban Studies and Planning.

Referring to climate and environmental justice concerns, she adds: “It’s important to understand not just if a policy is effective, but who is benefitting, so that we can make necessary adjustments and reach all the targets we want to reach.”

The paper, “Exposing Disparities in Flood Adaptation for Equitable Future Interventions in the USA,” is published today in Nature Communications. The authors are Cano Pecharromán and ChangHoon Hahn, an associate research scholar at Princeton University.

Able to afford help

The program in question was developed by the Federal Emergency Management Agency (FEMA), which has a division, the Flood Insurance Mitigation Administration, focusing on this issue. In 1990, FEMA initiated the National Flood Insurance Program’s Community Rating System, which incentivizes communities to enact measures that help prevent or reduce flooding.

Communities can engage in a broad set of related activities, including floodplain mapping, preservation of open spaces, stormwater management activities, creating flood warning systems, or even developing public information and participation programs. In exchange, area residents receive a discount on their flood insurance premium rates.

To conduct the study, the researchers examined 2.5 million flood insurance claims filed with FEMA since then. They also examined U.S. Census Bureau data to analyze demographic and economic data about communities, and incorporated flood risk data from the First Street Foundation.

By comparing over 1,500 communities in the FEMA program, the researchers were able to quantify its different relative effects — depending on community characteristics such as population, race, income or flood risk. For instance, higher-income communities seem better able to make more flood-control and mitigation investments, earning better FEMA ratings and, ultimately, enacting more effective measures.

“You see some positive effects for low-income communities, but as the risks go up, these disappear, while only high-income communities continue seeing these positive effects,” says Cano Pecharromán. “They are likely able to afford measures that handle higher risk indices for flooding.”

Similarly, the researchers found, communities with higher overall levels of education fare better from the flood-insurance program, with about $2,000 more in savings per individual policy than communities with lower levels of education. One way or another, communities with more assets in the first place — size, wealth, education — are better able to deploy or hire the civic and technical expertise necessary to enact more best practices against flood damage.

And even among lower-income communities in the program, communities with less population diversity see greater effectiveness from their flood program activities, realizing a gain of about $6,000 per household compared to communities where racial and ethnic minorities are predominant.

“These are substantial effects, and we should consider these things when making decisions and reviewing if our climate adaptation policies work,” Cano Pecharromán says.

An even larger number of communities is not in the FEMA program at all. The study identified 14,729 unique U.S. communities with flood issues, many of which likely lack the capacity to engage with flooding in the way that even the lower-ranked communities within the FEMA program have, by at least taking some action so far.

“If we are able to consider all the communities that are not in the program because they can’t afford to do the basics, we would likely see that the effects are even larger among different communities,” Cano Pecharromán says.

Getting communities started

To make the program more effective for more people, Cano Pecharromán suggests that the federal government should consider how to help communities enact flood-control and mitigation measures in the first place.

“When we set out these kinds of policies, we need to consider how certain types of communities might need help with implementation,” she says.

Methodologically, the researchers arrived at their conclusions using an advanced statistical approach that Hahn, who is an astrophysicist by training, has applied to the study of dark energy and galaxies. Instead of finding one “average treatment effect” of the FEMA program across all participating communities, they quantified the program’s impact while subdividing the set of participating communities according to their characteristics.

“We are able to calculate the causal effect of [the program], not as an average, which can hide these inequalities, but at every given level of the specific characteristic of communities we’re looking at, different levels of income, different levels of education, and more,” Cano Pecharromán says.
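The researchers’ actual estimator isn’t described in this article, so the snippet below is only a toy contrast between a single average effect and effects conditional on a community characteristic; the data and column names (in_program, income_level, avg_claim) are invented for illustration.

```python
import pandas as pd

# Toy contrast between an average effect and conditional effects; not the
# study's actual causal estimator. All data and column names are invented.
df = pd.DataFrame({
    "in_program":   [1, 1, 0, 0, 1, 1, 0, 0],
    "income_level": ["low", "low", "low", "low", "high", "high", "high", "high"],
    "avg_claim":    [9000, 9500, 10000, 10500, 4000, 4500, 9000, 9500],  # dollars per policy
})

# A single number for all communities hides the heterogeneity...
overall = (df[df.in_program == 0].avg_claim.mean()
           - df[df.in_program == 1].avg_claim.mean())
print(f"Average reduction in claims: ${overall:.0f}")

# ...while conditioning on a community characteristic shows who actually benefits.
by_income = df.groupby("income_level").apply(
    lambda g: g[g.in_program == 0].avg_claim.mean() - g[g.in_program == 1].avg_claim.mean()
)
print(by_income)  # in this toy data, high-income communities see the larger reduction
```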

Government officials have seen Cano Pecharromán present the preliminary findings at meetings, and expressed interest in the results. Currently, she is also working on a follow-up study, which aims to pinpoint which types of local flood-mitigation programs provide the biggest benefits for local communities.

Support for the research was provided, in part, by the La Caixa Foundation, the MIT Martin Family Society of Fellows for Sustainability, and the AI Accelerator program of the Schmidt Futures Foundation.
