Wipro and IBM collaborate to propel enterprise AI

In a bid to accelerate the adoption of AI in the enterprise sector, Wipro has unveiled its latest offering that leverages the capabilities of IBM’s watsonx AI and data platform. The extended partnership between Wipro and IBM combines the former’s extensive industry expertise with IBM’s leading…

Microsoft is quadrupling its AI and cloud investment in Spain

Microsoft has announced plans to significantly boost its investment in AI and cloud infrastructure in Spain, with a commitment to quadruple its spending during 2024-2025 to reach $2.1 billion. This substantial increase marks the largest investment by Microsoft in Spain since its establishment in the country…

Safer skies with self-flying helicopters

In late 2019, after years of studying aviation and aerospace engineering, Hector (Haofeng) Xu decided to learn to fly helicopters. At the time, he was pursuing his PhD in MIT’s Department of Aeronautics and Astronautics, so he was familiar with the risks associated with flying small aircraft. But something about being in the cockpit gave Xu a greater appreciation of those risks. After a couple of nerve-wracking experiences, he was inspired to make helicopter flight safer.

In 2021, he founded the autonomous helicopter company Rotor Technologies, Inc.

It turns out Xu’s near-misses weren’t all that unique. Although large, commercial passenger planes are extremely safe, people die every year in small, private aircraft in the U.S. Many of those fatalities occur during helicopter flights for activities like crop dusting, fighting fires, and medical evacuations.

Rotor is retrofitting existing helicopters with a suite of sensors and software to remove the pilot from some of the most dangerous flights and expand use cases for aviation more broadly.

“People don’t realize pilots are risking their lives every day in the U.S.,” Xu explains. “Pilots fly into wires, get disoriented in inclement weather, or otherwise lose control, and almost all of these accidents can be prevented with automation. We’re starting by targeting the most dangerous missions.”

Rotor’s autonomous machines can fly faster and longer and carry heavier payloads than battery-powered drones, and by working with a reliable helicopter model that has been around for decades, the company has been able to commercialize quickly. Rotor’s autonomous aircraft are already taking to the skies around its Nashua, New Hampshire, headquarters for demo flights, and customers will be able to purchase them later this year.

“A lot of other companies are trying to build new vehicles with lots of new technologies around things like materials and power trains,” says Ben Frank ’14, Rotor’s chief commercial officer. “They’re trying to do everything. We’re really focused on autonomy. That’s what we specialize in and what we think will bring the biggest step-change to make vertical flight much safer and more accessible.”

Building a team at MIT

As an undergraduate at Cambridge University, Xu participated in the Cambridge-MIT Exchange Program (CME). His year at MIT apparently went well — after graduating from Cambridge, he spent the next eight years at the Institute, first as a PhD student, then a postdoc, and finally as a research affiliate in MIT’s Department of Aeronautics and Astronautics (AeroAstro), a position he still holds today. During the CME program and his postdoc, Xu was advised by Professor Steven Barrett, who is now the head of AeroAstro. Xu says Barrett has played an important role in guiding him throughout his career.

“Rotor’s technology didn’t spin out of MIT’s labs, but MIT really shaped my vision for technology and the future of aviation,” Xu says.

Xu’s first hire was Rotor Chief Technology Officer Yiou He SM ’14, PhD ’20, whom Xu worked with during his PhD. The decision was a sign of things to come: The number of MIT affiliates at the 50-person company is now in the double digits.

“The core tech team early on was a bunch of MIT PhDs, and they’re some of the best engineers I’ve ever worked with,” Xu says. “They’re just really smart and during grad school they had built some really fantastic things at MIT. That’s probably the most critical factor to our success.”

To help get Rotor off the ground, Xu worked with the MIT Venture Mentoring Service (VMS), MIT’s Industrial Liaison Program (ILP), and the National Science Foundation’s New England Innovation Corps (I-Corps) program on campus.

A key early decision was to work with a well-known aircraft from the Robinson Helicopter Company rather than building an aircraft from scratch. Robinson already requires its helicopters to be overhauled after about 2,000 hours of flight time, and that’s when Rotor jumps in.

The core of Rotor’s solution is what’s known as a “fly by wire” system — a set of computers and motors that interact with the helicopter’s flight control features. Rotor also equips the helicopters with a suite of advanced communication tools and sensors, many of which were adapted from the autonomous vehicle industry.

“We believe in a long-term future where there are no longer pilots in the cockpit, so we’re building for this remote pilot paradigm,” Xu says. “It means we have to build robust autonomous systems on board, but it also means that we need to build communication systems between the aircraft and the ground.”

Rotor is able to leverage Robinson’s existing supply chain, and potential customers are comfortable with an aircraft they’ve worked with before — even if no one is sitting in the pilot seat. Once Rotor’s helicopters are in the air, the startup offers 24/7 monitoring of flights with a cloud-based human supervision system the company calls Cloudpilot. The company is starting with flights in remote areas to avoid risk of human injury.

“We have a very careful approach to automation, but we also retain a highly skilled human expert in the loop,” Xu says. “We get the best of the autonomous systems, which are very reliable, and the best of humans, who are really great at decision-making and dealing with unexpected scenarios.”

Autonomous helicopters take off

Using small aircraft to do things like fight fires and deliver cargo to offshore sites is not only dangerous but also inefficient. There are restrictions on how long pilots can fly, and they can’t fly during adverse weather or at night.

Most autonomous options today are limited by small batteries and limited payload capacities. Rotor’s aircraft, named the R550X, can carry loads up to 1,212 pounds, travel more than 120 miles per hour, and be equipped with auxiliary fuel tanks to stay in the air for hours at a time.

Some potential customers are interested in using the aircraft to extend flying times and increase safety, but others want to use the machines for entirely new kinds of applications.

“It is a new aircraft that can do things that other aircraft couldn’t — or maybe even if technically they could, they wouldn’t do with a pilot,” Xu says. “You could also think of new scientific missions enabled by this. I hope to leave it to people’s imagination to figure out what they can do with this new tool.”

Rotor plans to sell a small handful of aircraft this year and then scale production to 50 to 100 aircraft a year.

Meanwhile, in the much longer term, Xu hopes Rotor will play a role in getting him back into helicopters and, eventually, transporting humans.

“Today, our impact has a lot to do with safety, and we’re fixing some of the challenges that have stumped helicopter operators for decades,” Xu says. “But I think our biggest future impact will be changing our daily lives. I’m excited to be flying in safer, more autonomous, and more affordable vertical take-off-and-landing aircraft, and I hope Rotor will be an important part of enabling that.”

A new way to let AI chatbots converse all day without crashing

When a human-AI conversation involves many rounds of continuous dialogue, the powerful large language models that drive chatbots like ChatGPT sometimes start to collapse, causing the bots’ performance to rapidly deteriorate.

A team of researchers from MIT and elsewhere has pinpointed a surprising cause of this problem and developed a simple solution that enables a chatbot to maintain a nonstop conversation without crashing or slowing down.

Their method involves a tweak to the key-value cache (which is like a conversation memory) at the core of many large language models. In some methods, when this cache needs to hold more information than it has capacity for, the first pieces of data are bumped out. This can cause the model to fail.

By ensuring that these first few data points remain in memory, the researchers’ method allows a chatbot to keep chatting no matter how long the conversation goes.

The method, called StreamingLLM, enables a model to remain efficient even when a conversation stretches on for more than 4 million words. When compared to another method that avoids crashing by constantly recomputing part of the past conversations, StreamingLLM performed more than 22 times faster.

This could allow a chatbot to conduct long conversations throughout the workday without needing to be continually rebooted, enabling efficient AI assistants for tasks like copywriting, editing, or generating code.

“Now, with this method, we can persistently deploy these large language models. By making a chatbot that we can always chat with, and that can always respond to us based on our recent conversations, we could use these chatbots in some new applications,” says Guangxuan Xiao, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on StreamingLLM.

Xiao’s co-authors include his advisor, Song Han, an associate professor in EECS, a member of the MIT-IBM Watson AI Lab, and a distinguished scientist of NVIDIA; as well as Yuandong Tian, a research scientist at Meta AI; Beidi Chen, an assistant professor at Carnegie Mellon University; and senior author Mike Lewis, a research scientist at Meta AI. The work will be presented at the International Conference on Learning Representations.

A puzzling phenomenon

Large language models encode data, like words in a user query, into representations called tokens. Many models employ what is known as an attention mechanism that uses these tokens to generate new text.

Typically, an AI chatbot writes new text based on text it has just seen, so it stores recent tokens in memory, called a KV Cache, to use later. The attention mechanism builds a grid that includes all tokens in the cache, an “attention map” that maps out how strongly each token, or word, relates to each other token.

Understanding these relationships is one feature that enables large language models to generate human-like text.

But when the cache gets very large, the attention map can become even more massive, which slows down computation.

Also, if encoding content requires more tokens than the cache can hold, the model’s performance drops. For instance, one popular model can store 4,096 tokens, yet there are about 10,000 tokens in an academic paper.

To get around these problems, researchers employ a “sliding cache” that bumps out the oldest tokens to add new tokens. However, the model’s performance often plummets as soon as that first token is evicted, rapidly reducing the quality of newly generated words.
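The sliding-cache behavior is easy to sketch with a toy eviction loop. This is an illustration only, with hypothetical token IDs; a real KV cache stores key/value tensors per attention layer, not raw IDs.

```python
from collections import deque

def make_sliding_cache(capacity):
    """A sliding KV cache: once full, the oldest entry is evicted to
    make room for each new one -- the scheme whose quality plummets
    as soon as the very first token is dropped."""
    return deque(maxlen=capacity)

cache = make_sliding_cache(capacity=4)
for token_id in [101, 7, 42, 9, 55, 63]:
    cache.append(token_id)

# The first two tokens (101 and 7) have been silently evicted.
print(list(cache))  # [42, 9, 55, 63]
```

The `deque(maxlen=...)` structure captures the policy exactly: appending to a full deque discards the item at the opposite end.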

In this new paper, researchers realized that if they keep the first token in the sliding cache, the model will maintain its performance even when the cache size is exceeded.

But this didn’t make any sense. The first word in a novel likely has nothing to do with the last word, so why would the first word be so important for the model to generate the newest word?

In their new paper, the researchers also uncovered the cause of this phenomenon.

Attention sinks

Some models use a Softmax operation in their attention mechanism, which assigns a score to each token that represents how much it relates to each other token. The Softmax operation requires all attention scores to sum up to 1. Since most tokens aren’t strongly related, their attention scores are very low. The model dumps any remaining attention score into the first token.

The researchers call this first token an “attention sink.”

“We need an attention sink, and the model decides to use the first token as the attention sink because it is globally visible — every other token can see it. We found that we must always keep the attention sink in the cache to maintain the model dynamics,” Han says. 

In building StreamingLLM, the researchers discovered that having four attention sink tokens at the beginning of the sliding cache leads to optimal performance.

They also found that the positional encoding of each token must stay the same, even as new tokens are added and others are bumped out. If token 5 is bumped out, token 6 must stay encoded as 6, even though it is now the fifth token in the cache.
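Combining the two ideas, the cache policy described above can be sketched as follows. This is a toy version: it uses two sink tokens instead of the four the researchers found optimal, and plain `(position, token)` pairs stand in for the per-layer key/value tensors a real implementation would store. Note that surviving tokens keep their original positions after an eviction, as the article describes.

```python
def streaming_evict(cache, new_entry, capacity, num_sinks=2):
    """Sketch of the cache rule: always retain the first `num_sinks`
    entries (the attention sinks) and slide the window over the rest.
    Evicting an entry never renumbers the survivors' positions."""
    cache.append(new_entry)
    if len(cache) > capacity:
        # Evict the oldest non-sink entry; sinks are never evicted.
        del cache[num_sinks]
    return cache

# Feed eight tokens through a cache of capacity 6 with 2 sinks.
cache = []
for pos in range(8):
    streaming_evict(cache, (pos, f"tok{pos}"), capacity=6, num_sinks=2)

# Sinks 0 and 1 survive; the window slid over the middle tokens.
print([pos for pos, _ in cache])  # [0, 1, 4, 5, 6, 7]
```

Positions 2 and 3 were evicted, but tokens 4 through 7 still carry their original position labels, matching the requirement that token 6 stays encoded as 6 even after earlier tokens are bumped out.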

By combining these two ideas, they enabled StreamingLLM to maintain a continuous conversation while outperforming a popular method that uses recomputation.

For instance, when the cache has 256 tokens, the recomputation method takes 63 milliseconds to decode a new token, while StreamingLLM takes 31 milliseconds. However, if the cache size grows to 4,096 tokens, recomputation requires 1,411 milliseconds for a new token, while StreamingLLM needs just 65 milliseconds.

“The innovative approach of StreamingLLM, centered around the attention sink mechanism, ensures stable memory usage and performance, even when processing texts up to 4 million tokens in length,” says Yang You, a presidential young professor of computer science at the National University of Singapore, who was not involved with this work. “This capability is not just impressive; it’s transformative, enabling StreamingLLM to be applied across a wide array of AI applications. The performance and versatility of StreamingLLM mark it as a highly promising technology, poised to revolutionize how we approach AI-driven generation applications.”

Tianqi Chen, an assistant professor in the machine learning and computer science departments at Carnegie Mellon University who also was not involved with this research, agreed, saying “Streaming LLM enables the smooth extension of the conversation length of large language models. We have been using it to enable the deployment of Mistral models on iPhones with great success.”

The researchers also explored the use of attention sinks during model training by prepending several placeholder tokens in all training samples.

They found that training with attention sinks allowed a model to maintain performance with only one attention sink in its cache, rather than the four that are usually required to stabilize a pretrained model’s performance. 

But while StreamingLLM enables a model to conduct a continuous conversation, the model cannot remember words that aren’t stored in the cache. In the future, the researchers plan to target this limitation by investigating methods to retrieve tokens that have been evicted or enable the model to memorize previous conversations.

StreamingLLM has been incorporated into NVIDIA’s large language model optimization library, TensorRT-LLM.

This work is funded, in part, by the MIT-IBM Watson AI Lab, the MIT Science Hub, and the U.S. National Science Foundation.

This tiny, tamper-proof ID tag can authenticate almost anything

A few years ago, MIT researchers invented a cryptographic ID tag that is several times smaller and significantly cheaper than the traditional radio frequency tags (RFIDs) that are often affixed to products to verify their authenticity.

This tiny tag, which offers improved security over RFIDs, utilizes terahertz waves, which have much shorter wavelengths and higher frequencies than radio waves. But this terahertz tag shared a major security vulnerability with traditional RFIDs: A counterfeiter could peel the tag off a genuine item and reattach it to a fake, and the authentication system would be none the wiser.

The researchers have now surmounted this security vulnerability by leveraging terahertz waves to develop an antitampering ID tag that still offers the benefits of being tiny, cheap, and secure.

They mix microscopic metal particles into the glue that sticks the tag to an object, and then use terahertz waves to detect the unique pattern those particles form on the item’s surface. Akin to a fingerprint, this random glue pattern is used to authenticate the item, explains Eunseok Lee, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on the antitampering tag.

“These metal particles are essentially like mirrors for terahertz waves. If I spread a bunch of mirror pieces onto a surface and then shine light on that, depending on the orientation, size, and location of those mirrors, I would get a different reflected pattern. But if you peel the chip off and reattach it, you destroy that pattern,” adds Ruonan Han, an associate professor in EECS, who leads the Terahertz Integrated Electronics Group in the Research Laboratory of Electronics.

The researchers produced a light-powered antitampering tag that is about 4 square millimeters in size. They also demonstrated a machine-learning model that helps detect tampering by identifying similar glue pattern fingerprints with more than 99 percent accuracy.

Because the terahertz tag is so cheap to produce, it could be implemented throughout a massive supply chain. And its tiny size enables the tag to attach to items too small for traditional RFIDs, such as certain medical devices.

The paper, which will be presented at the IEEE Solid State Circuits Conference, is a collaboration between Han’s group and the Energy-Efficient Circuits and Systems Group of Anantha P. Chandrakasan, MIT’s chief innovation and strategy officer, dean of the MIT School of Engineering, and the Vannevar Bush Professor of EECS. Co-authors include EECS graduate students Xibi Chen, Maitryi Ashok, and Jaeyeon Won.

Preventing tampering

This research project was partly inspired by Han’s favorite car wash. The business stuck an RFID tag onto his windshield to authenticate his car wash membership. For added security, the tag was made from fragile paper so it would be destroyed if a less-than-honest customer tried to peel it off and stick it on a different windshield.

But that is not a terribly reliable way to prevent tampering. For instance, someone could use a solution to dissolve the glue and safely remove the fragile tag.

Rather than authenticating the tag, a better security solution is to authenticate the item itself, Han says. To achieve this, the researchers targeted the glue at the interface between the tag and the item’s surface.

Their antitampering tag contains a series of minuscule slots that enable terahertz waves to pass through the tag and strike microscopic metal particles that have been mixed into the glue.

Terahertz waves are small enough to detect the particles, whereas larger radio waves would not have enough sensitivity to see them. Also, using terahertz waves with a 1-millimeter wavelength allowed the researchers to make a chip that does not need a larger, off-chip antenna.

After passing through the tag and striking the object’s surface, terahertz waves are reflected, or backscattered, to a receiver for authentication. How those waves are backscattered depends on the distribution of metal particles that reflect them.

The researchers put multiple slots onto the chip so waves can strike different points on the object’s surface, capturing more information on the random distribution of particles.

“These responses are impossible to duplicate, as long as the glue interface is destroyed by a counterfeiter,” Han says.

A vendor would take an initial reading of the antitampering tag once it was stuck onto an item, and then store those data in the cloud, using them later for verification.

AI for authentication

But when it came time to test the antitampering tag, Lee ran into a problem: It was very difficult and time-consuming to take precise enough measurements to determine whether two glue patterns are a match.

He reached out to a friend in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and together they tackled the problem using AI. They trained a machine-learning model that could compare glue patterns and calculate their similarity with more than 99 percent accuracy.
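The researchers used a trained neural network for this matching. As a much simpler illustration of the underlying task, one could treat each backscatter reading as a vector and compare readings by cosine similarity; all of the values and the decision threshold below are hypothetical, not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Compare two backscatter 'fingerprint' vectors; 1.0 means identical."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical terahertz backscatter readings (normalized intensities).
original = [0.91, 0.12, 0.55, 0.33, 0.78]
re_read  = [0.90, 0.13, 0.54, 0.34, 0.77]  # same tag, slight sensor noise
tampered = [0.15, 0.88, 0.21, 0.74, 0.09]  # glue pattern destroyed

THRESHOLD = 0.99  # hypothetical decision threshold
print(cosine_similarity(original, re_read) > THRESHOLD)   # True: authentic
print(cosine_similarity(original, tampered) > THRESHOLD)  # False: rejected
```

A learned model replaces this fixed metric with one tuned to tolerate real measurement noise while still rejecting mismatched patterns, which is what let the team reach the reported 99 percent accuracy.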

“One drawback is that we had a limited data sample for this demonstration, but we could improve the neural network in the future if a large number of these tags were deployed in a supply chain, giving us a lot more data samples,” Lee says.

The authentication system is also limited by the fact that terahertz waves suffer from high levels of loss during transmission, so the sensor can only be about 4 centimeters from the tag to get an accurate reading. This distance wouldn’t be an issue for an application like barcode scanning, but it would be too short for some potential uses, such as in an automated highway toll booth. Also, the angle between the sensor and tag needs to be less than 10 degrees or the terahertz signal will degrade too much.

They plan to address these limitations in future work, and hope to inspire other researchers to be more optimistic about what can be accomplished with terahertz waves, despite the many technical challenges, says Han.

“One thing we really want to show here is that the application of the terahertz spectrum can go well beyond broadband wireless. In this case, you can use terahertz for ID, security, and authentication. There are a lot of possibilities out there,” he adds.

This work is supported, in part, by the U.S. National Science Foundation and the Korea Foundation for Advanced Studies.

Google pledges to fix Gemini’s inaccurate and biased image generation

Google’s Gemini model has come under fire for its production of historically inaccurate and racially skewed images, reigniting concerns about bias in AI systems. The controversy arose as users on social media platforms flooded feeds with examples of Gemini generating pictures depicting racially diverse Nazis, black medieval English kings,…