New algorithm unlocks high-resolution insights for computer vision

Imagine yourself glancing at a busy street for a few moments, then trying to sketch the scene you saw from memory. Most people could draw the rough positions of the major objects like cars, people, and crosswalks, but almost no one can draw every detail with pixel-perfect accuracy. The same is true for most modern computer vision algorithms: They are fantastic at capturing high-level details of a scene, but they lose fine-grained details as they process information.

Now, MIT researchers have created a system called “FeatUp” that lets algorithms capture all of the high- and low-level details of a scene at the same time — almost like Lasik eye surgery for computer vision.

When computers learn to “see” by looking at images and videos, they build up “ideas” of what’s in a scene through something called “features.” To create these features, deep networks and visual foundation models break images down into a grid of tiny squares and process these squares as a group to determine what’s going on in a photo. Each tiny square usually spans 16 to 32 pixels on a side, so the resolution of these algorithms is dramatically coarser than that of the images they work with. In trying to summarize and understand photos, algorithms lose a ton of pixel clarity.
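
To make that resolution gap concrete, here is a minimal PyTorch sketch (illustrative only, not FeatUp’s code) of a ViT-style patch embedding: with 16-pixel patches, a 224x224 image collapses into a 14x14 grid of feature vectors.

```python
import torch
import torch.nn as nn

# Illustrative patch embedding: one feature vector per 16x16 patch, so the
# spatial resolution drops by 16x in each dimension (channel count is arbitrary).
patch_embed = nn.Conv2d(in_channels=3, out_channels=384, kernel_size=16, stride=16)

image = torch.randn(1, 3, 224, 224)   # full-resolution input
features = patch_embed(image)         # shape: (1, 384, 14, 14)
print(features.shape)                 # torch.Size([1, 384, 14, 14])
```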

The FeatUp algorithm can stop this loss of information and boost the resolution of any deep network without compromising on speed or quality. This allows researchers to quickly and easily improve the resolution of any new or existing algorithm. For example, imagine trying to interpret the predictions of a lung cancer detection algorithm with the goal of localizing the tumor. Applying FeatUp before interpreting the algorithm using a method like class activation maps (CAM) can yield a dramatically more detailed (16-32x) view of where the tumor might be located according to the model. 
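
As a hedged illustration of that workflow, the sketch below computes a class activation map as a weighted sum of feature channels. Because the CAM inherits the feature map’s resolution, upsampling the features first (stubbed here with plain bilinear interpolation standing in for FeatUp) produces a far more detailed map; the tensor shapes and names are made up for the example.

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, class_weights):
    # features: (C, H, W) feature map; class_weights: (C,) weights from the
    # model's final linear layer for the class of interest (e.g., "tumor").
    return torch.einsum("c,chw->hw", class_weights, features)

low_res_feats = torch.randn(512, 14, 14)   # what the backbone actually produces
weights = torch.randn(512)
cam_low = class_activation_map(low_res_feats, weights)   # coarse 14x14 map

# Upsample the features before computing the CAM. Bilinear interpolation is a
# stand-in here; FeatUp would supply sharper, image-aligned high-res features.
hi_res_feats = F.interpolate(low_res_feats[None], size=(224, 224),
                             mode="bilinear", align_corners=False)[0]
cam_high = class_activation_map(hi_res_feats, weights)   # detailed 224x224 map
print(cam_low.shape, cam_high.shape)
```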

FeatUp not only helps practitioners understand their models, but also can improve a panoply of different tasks like object detection, semantic segmentation (assigning an object label to each pixel in an image), and depth estimation. It achieves this by providing more accurate, high-resolution features, which are crucial for building vision applications ranging from autonomous driving to medical imaging.

“The essence of all computer vision lies in these deep, intelligent features that emerge from the depths of deep learning architectures. The big challenge of modern algorithms is that they reduce large images to very small grids of ‘smart’ features, gaining intelligent insights but losing the finer details,” says Mark Hamilton, an MIT PhD student in electrical engineering and computer science, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) affiliate, and a co-lead author on a paper about the project. “FeatUp helps enable the best of both worlds: highly intelligent representations with the original image’s resolution. These high-resolution features significantly boost performance across a spectrum of computer vision tasks, from enhancing object detection and improving depth prediction to providing a deeper understanding of your network’s decision-making process through high-resolution analysis.”

Resolution renaissance 

As these large AI models become more and more prevalent, there’s an increasing need to explain what they’re doing, what they’re looking at, and what they’re thinking. 

But how exactly can FeatUp discover these fine-grained details? Curiously, the secret lies in wiggling and jiggling images. 

In particular, FeatUp applies minor adjustments (like moving the image a few pixels to the left or right) and watches how an algorithm responds to these slight movements of the image. This results in hundreds of deep-feature maps that are all slightly different, which can be combined into a single crisp, high-resolution set of deep features. “We imagine that some high-resolution features exist, and that when we wiggle them and blur them, they will match all of the original, lower-resolution features from the wiggled images. Our goal is to learn how to refine the low-resolution features into high-resolution features using this ‘game’ that lets us know how well we are doing,” says Hamilton. This methodology is analogous to how algorithms can create a 3D model from multiple 2D images by ensuring that the predicted 3D object matches all of the 2D photos used to create it. In FeatUp’s case, the method predicts a high-resolution feature map that’s consistent with all of the low-resolution feature maps formed by jittering the original image.
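
Below is a minimal sketch of that multi-view consistency idea, assuming a frozen backbone whose low-resolution features we want to refine: the candidate high-resolution features, jittered and blurred down to the backbone’s resolution, should agree with the backbone’s features for each jittered image. The jitter, downsampler, and names are simplified stand-ins for FeatUp’s actual components.

```python
import torch
import torch.nn.functional as F

def jitter(x, dx, dy):
    # Shift a batch of images or feature maps by a few pixels (FeatUp uses
    # small random transforms; rolling is a simplification).
    return torch.roll(x, shifts=(dy, dx), dims=(-2, -1))

def downsample(feats, size):
    # Stand-in for FeatUp's learned blur-and-downsample step.
    return F.adaptive_avg_pool2d(feats, size)

def consistency_loss(hi_res_feats, backbone, image, shifts, low_res_size):
    # hi_res_feats: (B, C, H, W) high-resolution features being learned.
    # backbone: frozen model returning (B, C, h, w) low-resolution features.
    loss = 0.0
    for dx, dy in shifts:
        target = backbone(jitter(image, dx, dy))                       # low-res view
        pred = downsample(jitter(hi_res_feats, dx, dy), low_res_size)  # should match it
        loss = loss + F.mse_loss(pred, target)
    return loss / len(shifts)
```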

The team notes that standard tools available in PyTorch were insufficient for their needs, and introduced a new type of deep network layer in their quest for a speedy and efficient solution. Their custom layer, a special joint bilateral upsampling operation, was over 100 times more efficient than a naive implementation in PyTorch. The team also showed this new layer could improve a wide variety of different algorithms including semantic segmentation and depth prediction. This layer improved the network’s ability to process and understand high-resolution details, giving any algorithm that used it a substantial performance boost. 
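
For intuition only, here is a naive joint bilateral upsampler in plain PyTorch: it bilinearly upsamples the low-resolution features and then re-weights each pixel by spatial distance and by color similarity in the full-resolution guidance image. This is a rough sketch of the operation’s idea, not FeatUp’s custom layer, which the team reports is over 100 times more efficient than naive PyTorch code like this.

```python
import math
import torch
import torch.nn.functional as F

def joint_bilateral_upsample(lo_feats, guidance, radius=2,
                             sigma_spatial=1.0, sigma_range=0.1):
    # lo_feats: (C, h, w) low-resolution features; guidance: (3, H, W) image.
    _, H, W = guidance.shape
    # Plain bilinear upsample first, then smooth with a joint bilateral filter
    # so that edges in the guidance image are respected.
    up = F.interpolate(lo_feats[None], size=(H, W), mode="bilinear",
                       align_corners=False)[0]
    out = torch.zeros_like(up)
    norm = torch.zeros(1, H, W)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted_feats = torch.roll(up, shifts=(dy, dx), dims=(-2, -1))
            shifted_guide = torch.roll(guidance, shifts=(dy, dx), dims=(-2, -1))
            # Spatial kernel: nearby neighbors count more.
            spatial_w = math.exp(-(dx ** 2 + dy ** 2) / (2 * sigma_spatial ** 2))
            # Range kernel: neighbors with similar guidance colors count more.
            range_w = torch.exp(-((guidance - shifted_guide) ** 2).sum(0, keepdim=True)
                                / (2 * sigma_range ** 2))
            weight = spatial_w * range_w   # (1, H, W)
            out += weight * shifted_feats
            norm += weight
    return out / norm
```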

“Another application is something called small object retrieval, where our algorithm allows for precise localization of objects. For example, even in cluttered road scenes algorithms enriched with FeatUp can see tiny objects like traffic cones, reflectors, lights, and potholes where their low-resolution cousins fail. This demonstrates its capability to enhance coarse features into finely detailed signals,” says Stephanie Fu ’22, MNG ’23, a PhD student at the University of California at Berkeley and another co-lead author on the new FeatUp paper. “This is especially critical for time-sensitive tasks, like pinpointing a traffic sign on a cluttered expressway in a driverless car. This can not only improve the accuracy of such tasks by turning broad guesses into exact localizations, but might also make these systems more reliable, interpretable, and trustworthy.”

What next?

Regarding future aspirations, the team emphasizes FeatUp’s potential widespread adoption within the research community and beyond, akin to data augmentation practices. “The goal is to make this method a fundamental tool in deep learning, enriching models to perceive the world in greater detail without the computational inefficiency of traditional high-resolution processing,” says Fu.

“FeatUp represents a wonderful advance towards making visual representations really useful, by producing them at full image resolutions,” says Cornell University computer science professor Noah Snavely, who was not involved in the research. “Learned visual representations have become really good in the last few years, but they are almost always produced at very low resolution — you might put in a nice full-resolution photo, and get back a tiny, postage stamp-sized grid of features. That’s a problem if you want to use those features in applications that produce full-resolution outputs. FeatUp solves this problem in a creative way by combining classic ideas in super-resolution with modern learning approaches, leading to beautiful, high-resolution feature maps.”

“We hope this simple idea can have broad application. It provides high-resolution versions of image analytics that we’d thought before could only be low-resolution,” says senior author William T. Freeman, an MIT professor of electrical engineering and computer science and a CSAIL member.

Lead authors Fu and Hamilton are accompanied by MIT PhD students Laura Brandt SM ’21 and Axel Feldmann SM ’21, as well as Zhoutong Zhang SM ’21, PhD ’22, all current or former affiliates of MIT CSAIL. Their research is supported, in part, by a National Science Foundation Graduate Research Fellowship, by the National Science Foundation and Office of the Director of National Intelligence, by the U.S. Air Force Research Laboratory, and by the U.S. Air Force Artificial Intelligence Accelerator. The group will present their work in May at the International Conference on Learning Representations.

Five MIT faculty members take on Cancer Grand Challenges

Cancer Grand Challenges recently announced its five winning teams for 2024, which together include five researchers from MIT: Michael Birnbaum, Regina Barzilay, Brandon DeKosky, Seychelle Vos, and Ömer Yilmaz. Each team is made up of interdisciplinary cancer researchers from across the globe and will be awarded $25 million over five years.

Birnbaum, an associate professor in the Department of Biological Engineering, leads Team MATCHMAKERS and is joined by co-investigators Barzilay, the School of Engineering Distinguished Professor for AI and Health in the Department of Electrical Engineering and Computer Science and the AI faculty lead at the MIT Abdul Latif Jameel Clinic for Machine Learning in Health; and DeKosky, Phillip and Susan Ragon Career Development Professor of Chemical Engineering. All three are also affiliates of the Koch Institute for Integrative Cancer Research at MIT.

Team MATCHMAKERS will take advantage of recent advances in artificial intelligence to develop tools for personalized immunotherapies for cancer patients. Cancer immunotherapies, which recruit the patient’s own immune system against the disease, have transformed treatment for some cancers, but not for all types and not for all patients. 

T cells are one target for immunotherapies because of their central role in the immune response. These immune cells use receptors on their surface to recognize protein fragments called antigens on cancer cells. Once T cells attach to cancer antigens, they mark them for destruction by the immune system. However, T cell receptors are exceptionally diverse within one person’s immune system and from person to person, making it difficult to predict how any one cancer patient will respond to an immunotherapy.  

Team MATCHMAKERS will collect data on T cell receptors and the different antigens they target and build computer models to predict antigen recognition by different T cell receptors. The team’s overarching goal is to develop tools for predicting T cell recognition with simple clinical lab tests and designing antigen-specific immunotherapies. “If successful, what we learn on our team could help transform prediction of T cell receptor recognition from something that is only possible in a few sophisticated laboratories in the world, for a few people at a time, into a routine process,” says Birnbaum. 

“The MATCHMAKERS project draws on MIT’s long tradition of developing cutting-edge artificial intelligence tools for the benefit of society,” comments Ryan Schoenfeld, CEO of The Mark Foundation for Cancer Research. “Their approach to optimizing immunotherapy for cancer and many other diseases is exemplary of the type of interdisciplinary research The Mark Foundation prioritizes supporting.” In addition to The Mark Foundation, the MATCHMAKERS team is funded by Cancer Research UK and the U.S. National Cancer Institute.

Vos, the Robert A. Swanson (1969) Career Development Professor of Life Sciences and HHMI Freeman Hrabowski Scholar in the Department of Biology, will be a co-investigator on Team KOODAC. The KOODAC team will develop new treatments for solid tumors in children, using protein degradation strategies to target previously “undruggable” drivers of cancers. KOODAC is funded by Cancer Research UK, France’s Institut National Du Cancer, and KiKa (Children Cancer Free Foundation) through Cancer Grand Challenges.

As a co-investigator on team PROSPECT, Yilmaz, who is also a Koch Institute affiliate, will help address early-onset colorectal cancers, an emerging global problem among individuals younger than 50 years. The team seeks to elucidate pathways, risk factors, and molecules involved in the disease’s development. Team PROSPECT is supported by Cancer Research UK, the U.S. National Cancer Institute, the Bowelbabe Fund for Cancer Research UK, and France’s Institut National Du Cancer through Cancer Grand Challenges.  

Unlocking the quantum future

Quantum computing is the next frontier for faster and more powerful computing technologies. It has the potential to better optimize routes for shipping and delivery, speed up battery development for electric vehicles, and more accurately predict trends in financial markets. But to unlock the quantum future, scientists and engineers need to solve outstanding technical challenges while continuing to explore new applications.

One place where they’re working towards this future is the MIT Interdisciplinary Quantum Hackathon, or iQuHACK for short (pronounced “i-quack,” like a duck). Each year, a community of quhackers (quantum hackers) gathers at iQuHACK to work on quantum computing projects using real quantum computers and simulators. This year, the hackathon was held both in-person at MIT and online over three days in February.

Quhackers worked in teams to advance the capability of quantum computers and to investigate promising applications. Collectively, they tackled a wide range of projects, such as running a quantum-powered dating service, building an organ donor matching app, and breaking into quantum vaults. While working, quhackers could consult with scientists and engineers in attendance from sponsoring companies. Many sponsors also received feedback and ideas from quhackers to help improve their quantum platforms.

But organizing iQuHACK 2024 was no easy feat. Co-chairs Alessandro Buzzi and Daniela Zaidenberg led a committee of nine members to hold the largest iQuHACK yet. “It wouldn’t have been possible without them,” Buzzi said. The hackathon hosted 260 in-person quhackers and 1,000 remote quhackers, representing 77 countries in total. More than 20 scientists and engineers from sponsoring companies also attended in person as mentors for quhackers.

Each team of quhackers tackled one of 10 challenges posed by the hackathon’s eight major sponsoring companies. Some challenges asked quhackers to improve computing performance, such as by making quantum algorithms faster and more accurate. Other challenges asked quhackers to explore applying quantum computing to other fields, such as finance and machine learning. The sponsors worked with the iQuHACK committee to craft creative challenges with industry relevance and societal impact. “We wanted people to be able to address an interesting challenge [that has] applications in the real world,” says Zaidenberg.

One team of quhackers looked for potential quantum applications and found one close to home: dating. A team member, Liam Kronman, had previously built dating apps but disliked that matching algorithms for normal classical computers “require [an overly] strict setup.” With these classical algorithms, people must be split into two groups — for example, men and women — and matches can only be made between these groups. But with quantum computers, matching algorithms are more flexible and can consider all possible combinations, enabling the inclusion of multiple genders and gender preferences. 
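
One hedged way to see the flexibility Kronman describes: general (non-bipartite) matching can be written as a QUBO, the input format that quantum annealers and QAOA-style solvers accept, with one binary variable per candidate pair, a reward for compatibility, and a penalty when two selected pairs share a person. The sketch below builds such a QUBO and brute-forces it classically; the names, scores, and penalty are invented for illustration and are not MITqute’s actual model.

```python
import itertools

people = ["a", "b", "c", "d"]
compat = {("a", "b"): 0.9, ("a", "c"): 0.4, ("b", "d"): 0.7,
          ("c", "d"): 0.8, ("a", "d"): 0.2, ("b", "c"): 0.5}
edges = list(compat)
PENALTY = 2.0  # must exceed the largest reward so overlapping pairs never win

def qubo_energy(x):
    # x: tuple of 0/1, one entry per candidate pair; lower energy = better matching.
    e = -sum(compat[edges[i]] * x[i] for i in range(len(edges)))
    for i, j in itertools.combinations(range(len(edges)), 2):
        if x[i] and x[j] and set(edges[i]) & set(edges[j]):
            e += PENALTY  # two selected pairs share a person
    return e

# Brute-force stand-in for a quantum solver over the same objective.
best = min(itertools.product([0, 1], repeat=len(edges)), key=qubo_energy)
print([edges[i] for i in range(len(edges)) if best[i]])  # [('a', 'b'), ('c', 'd')]
```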

Kronman and his team members leveraged these quantum algorithms to build a quantum-powered dating platform called MITqute (pronounced “meet cute”). To date, the platform has matched at least 240 people from the iQuHACK and MIT undergrad communities. In a follow-up survey, 13 out of 41 respondents reported having talked with their match, with at least two pairs setting up dates. “I really lucked out with this one,” one respondent wrote. 

Another team of quhackers also based their project on quantum matching algorithms but instead leveraged the algorithms’ power for medical care. The team built a mobile app that matches organ donors to patients, earning them the hackathon’s top social impact award. 

But they almost didn’t go through with their project. “At one point, we were considering scrapping the whole thing because we thought we couldn’t implement the algorithm,” says Alma Alex, one of the developers. After talking with their hackathon mentor for advice, though, the team learned that another group was working on a similar type of project — incidentally, the MITqute team. Knowing that others were tackling the same problem inspired them to persevere.

A sense of community also helped to motivate other quhackers. For one of the challenges, quhackers were tasked with hacking into 13 virtual quantum vaults. Teams could see each other’s progress on each vault in real time on a leaderboard, and this knowledge informed their strategies. When the first vault was successfully hacked by a team, progress from many other teams spiked on that vault and slowed down on others, says Daiwei Zhu, a quantum applications scientist at IonQ and one of the challenge’s two architects.

The vault challenge may appear to be just a fun series of puzzles, but the solutions can be used in quantum computers to improve their efficiency and accuracy. To hack into a vault, quhackers had to first figure out its secret key — an unknown quantum state — using a maximum of 20 probing tests. Then, they had to change the key’s state to a target state. These types of characterizations and modifications of quantum states are “fundamental” for quantum computers to work, says Jason Iaconis, a quantum applications engineer at IonQ and the challenge’s other architect. 
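
To make that concrete, the sketch below classically simulates the simplest version of those two steps for a single qubit: estimate the unknown “key” state from a small measurement budget (tomography in the X, Y, and Z bases), after which a rotation mapping the estimated state to the target would “open” the vault. This is an illustration of the underlying idea, not the IonQ challenge code; all names and numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_estimate(state, shots_per_axis=300):
    # Tomography: estimate <X>, <Y>, <Z> from simulated measurement counts.
    est = []
    for P in (X, Y, Z):
        exact = np.real(np.conj(state) @ P @ state)
        p_up = (1 + exact) / 2                      # probability of the +1 outcome
        ups = rng.binomial(shots_per_axis, p_up)    # simulated shot noise
        est.append(2 * ups / shots_per_axis - 1)
    return np.array(est)

secret = np.array([np.cos(0.3), np.exp(1j * 0.8) * np.sin(0.3)])  # unknown key state
r = bloch_estimate(secret)
print("estimated Bloch vector:", np.round(r, 2))
# A rotation taking this estimated Bloch vector to the target state's vector
# (e.g., (0, 0, 1) for |0>) would then complete the "unlock" step.
```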

But the best way to characterize and modify states is not yet clear. “Some of the [vaults] we [didn’t] even know how to solve ourselves,” Zhu says. By the end of the hackathon, at least one team had mostly hacked into each of six vaults. (In the quantum world, where gray areas exist, it’s possible to partly hack into a vault.)

The community of scientists and engineers formed at iQuHACK persists beyond the weekend, and many members continue to grow the community outside the hackathon. Inspired quhackers have gone on to start their own quantum computing clubs at their universities. A few years ago, a group of undergraduate quhackers from different universities formed a Quantum Coalition that now hosts their own quantum hackathons. “It’s crazy to see how the hackathon itself spreads and how many people start their own initiatives,” co-chair Zaidenberg says. 

The three-day hackathon opened with a keynote from MIT Professor Will Oliver, which included an overview of basic quantum computing concepts, current challenges, and computing technologies. Following that were industry talks and a panel of six industry and academic quantum experts, including MIT Professor Peter Shor, who is known for developing one of the most famous quantum algorithms. The panelists discussed current challenges, future applications, the importance of collaboration, and the need for ample testing.

Later, sponsors held technical workshops where quhackers could learn the nitty-gritty details of programming on specific quantum platforms. Day one closed out with a talk by research scientist Xinghui Yin on the role of quantum technology at LIGO, the Laser Interferometer Gravitational-Wave Observatory that first detected gravitational waves. The next day, the hackathon’s challenges were announced at 10 a.m., and hacking kicked off at the MIT InnovationHQ. In the afternoon, attendees could also tour MIT quantum computing labs.

Hacking continued overnight at the MIT Museum and ended back at MIT iHQ at 10 a.m. on the final day. Quhackers then presented their projects to panels of judges. Afterward, industry speakers gave lightning talks about each of their company’s latest quantum technologies and future directions. The hackathon ended with a closing ceremony, where sponsors announced the awards for each of the 10 challenges. 

The hackathon was captured in a three-part video by Albert Figurt, a resident artist at MIT. Figurt shot and edited the footage in parallel with the hackathon. Each part represented one day of the hackathon and was released on the subsequent day.

Throughout the weekend, quhackers and sponsors consistently praised the hackathon’s execution and atmosphere. “That was amazing … never felt so much better, one of the best hackathons I did from over 30 hackathons I attended,” Abdullah Kazi, a quhacker, wrote on the iQuHACK Slack.

Ultimately, “[we wanted to] help people to meet each other,” co-chair Buzzi says. “The impact [of iQuHACK] is scientific in some way, but it’s very human at the most important level.”

Elon Musk’s xAI open-sources Grok

Elon Musk’s startup xAI has made its large language model Grok available as open source software. The 314 billion parameter model can now be freely accessed, modified, and distributed by anyone under an Apache 2.0 license. The release fulfills Musk’s promise to open source Grok in…