Natural language boosts LLM performance in coding, planning, and robotics

Large language models (LLMs) are becoming increasingly useful for programming and robotics tasks, but for more complicated reasoning problems, the gap between these systems and humans looms large. Without the ability to learn new concepts like humans do, these systems fail to form good abstractions — essentially, high-level representations of complex concepts that skip less-important details — and thus sputter when asked to do more sophisticated tasks.

Luckily, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers have found a treasure trove of abstractions within natural language. In three papers to be presented at the International Conference on Learning Representations this month, the group shows how our everyday words are a rich source of context for language models, helping them build better overarching representations for code synthesis, AI planning, and robotic navigation and manipulation.

The three separate frameworks build libraries of abstractions for their given task: LILO (library induction from language observations) can synthesize, compress, and document code; Ada (action domain acquisition) explores sequential decision-making for artificial intelligence agents; and LGA (language-guided abstraction) helps robots better understand their environments to develop more feasible plans. Each system is a neurosymbolic method, a type of AI that blends human-like neural networks and program-like logical components.

LILO: A neurosymbolic framework that codes

Large language models can be used to quickly write solutions to small-scale coding tasks, but cannot yet architect entire software libraries like the ones written by human software engineers. To take their software development capabilities further, AI models need to refactor (cut down and combine) code into libraries of succinct, readable, and reusable programs.

Refactoring tools like the previously developed MIT-led Stitch algorithm can automatically identify abstractions, so, in a nod to the Disney movie “Lilo & Stitch,” CSAIL researchers combined these algorithmic refactoring approaches with LLMs. Their neurosymbolic method LILO uses a standard LLM to write code, then pairs it with Stitch to find abstractions that are comprehensively documented in a library.

LILO’s unique emphasis on natural language allows the system to do tasks that require human-like commonsense knowledge, such as identifying and removing all vowels from a string of code and drawing a snowflake. In both cases, the CSAIL system outperformed standalone LLMs, as well as a previous library learning algorithm from MIT called DreamCoder, indicating its ability to build a deeper understanding of the words within prompts. These encouraging results point to how LILO could assist with things like writing programs to manipulate documents like Excel spreadsheets, helping AI answer questions about visuals, and drawing 2D graphics.

“Language models prefer to work with functions that are named in natural language,” says Gabe Grand SM ’23, an MIT PhD student in electrical engineering and computer science, CSAIL affiliate, and lead author on the research. “Our work creates more straightforward abstractions for language models and assigns natural language names and documentation to each one, leading to more interpretable code for programmers and improved system performance.”

When prompted on a programming task, LILO first uses an LLM to quickly propose solutions based on data it was trained on, and then the system slowly searches more exhaustively for outside solutions. Next, Stitch efficiently identifies common structures within the code and pulls out useful abstractions. These are then automatically named and documented by LILO, resulting in simplified programs that can be used by the system to solve more complex tasks.
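In outline, that propose-compress-document loop can be sketched as a toy program. Everything below is an illustrative stand-in, not LILO's actual implementation: the LLM proposal step is stubbed with canned programs, and Stitch's corpus-guided compression is reduced to counting repeated token subsequences.

```python
from collections import Counter

def propose_solutions(task):
    # Stand-in for LILO's LLM proposal step: in the real system, a language
    # model samples candidate programs for each task (stubbed here).
    examples = {
        "draw square":   ["penDown", "setColor", "repeat4", "forward", "turn90"],
        "draw triangle": ["penDown", "setColor", "repeat3", "forward", "turn120"],
    }
    return examples[task]

def find_abstractions(programs, n=2, min_count=2):
    # Stand-in for Stitch-style compression: identify subsequences of
    # length n that recur across the proposed programs.
    counts = Counter()
    for prog in programs:
        for i in range(len(prog) - n + 1):
            counts[tuple(prog[i:i + n])] += 1
    return [seq for seq, c in counts.items() if c >= min_count]

def document_abstraction(seq):
    # Stand-in for LILO's auto-documentation step, where an LLM assigns a
    # natural-language name and docstring to each extracted abstraction.
    return {
        "name": "_".join(seq),
        "body": list(seq),
        "doc": "Reusable snippet: " + " then ".join(seq),
    }

def build_library(tasks):
    programs = [propose_solutions(t) for t in tasks]
    return [document_abstraction(s) for s in find_abstractions(programs)]

library = build_library(["draw square", "draw triangle"])
```

Here the two toy programs share only their setup sequence, so the sketch extracts a single named, documented abstraction from it.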

The MIT framework writes programs in domain-specific programming languages, like Logo, a language developed at MIT in the 1970s to teach children about programming. Scaling up automated refactoring algorithms to handle more general programming languages like Python will be a focus for future research. Still, their work represents a step forward for how language models can facilitate increasingly elaborate coding activities.

Ada: Natural language guides AI task planning

Just like in programming, AI models that automate multi-step tasks in households and command-based video games lack abstractions. Imagine you’re cooking breakfast and ask your roommate to bring a hot egg to the table — they’ll intuitively abstract their background knowledge about cooking in your kitchen into a sequence of actions. In contrast, an LLM trained on similar information will still struggle to reason about what it needs to build a flexible plan.

Named after the famed mathematician Ada Lovelace, who many consider the world’s first programmer, the CSAIL-led “Ada” framework makes headway on this issue by developing libraries of useful plans for virtual kitchen chores and gaming. The method trains on potential tasks and their natural language descriptions; a language model then proposes action abstractions from this dataset. A human operator scores and filters the best plans into a library, so that the best possible actions can be implemented into hierarchical plans for different tasks.
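A minimal sketch of that propose-score-filter-plan pipeline might look like the following. All operator names, steps, and scores are hypothetical stand-ins; in Ada proper, the proposals come from a language model and the plans are executed in simulated environments.

```python
def propose_operators(task_descriptions):
    # Stand-in for Ada's LLM step: propose candidate action abstractions
    # from tasks and their natural language descriptions (stubbed here).
    return [
        {"name": "heat(x)",  "steps": ["open microwave", "insert x", "start microwave"]},
        {"name": "fetch(x)", "steps": ["locate x", "grasp x", "carry x to table"]},
        {"name": "teleport(x)", "steps": ["wish very hard"]},  # a low-quality proposal
    ]

def filter_library(proposals, scores, threshold=0.5):
    # The human-operator step: score each proposed abstraction and keep
    # only the high-scoring ones in the library.
    return {p["name"]: p["steps"] for p, s in zip(proposals, scores) if s >= threshold}

def hierarchical_plan(high_level_ops, library):
    # Expand a high-level plan into its low-level action steps.
    return [step for op in high_level_ops for step in library[op]]

proposals = propose_operators(["bring a hot egg to the table"])
library = filter_library(proposals, scores=[0.9, 0.8, 0.1])
plan = hierarchical_plan(["fetch(x)", "heat(x)"], library)
```

The low-scoring "teleport" proposal is filtered out, while the surviving abstractions let the planner compose multi-step behavior from short, reusable operators.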

“Traditionally, large language models have struggled with more complex tasks because of problems like reasoning about abstractions,” says Ada lead researcher Lio Wong, an MIT graduate student in brain and cognitive sciences, CSAIL affiliate, and LILO coauthor. “But we can combine the tools that software engineers and roboticists use with LLMs to solve hard problems, such as decision-making in virtual environments.”

When the researchers incorporated the widely used large language model GPT-4 into Ada, the system completed more tasks in a kitchen simulator and Mini Minecraft than the AI decision-making baseline “Code as Policies.” Ada used the background information hidden within natural language to understand how to place chilled wine in a cabinet and craft a bed. The results showed staggering task-accuracy improvements of 59 and 89 percent, respectively.

Building on this success, the researchers hope to generalize their work to real-world homes, where Ada could assist with other household tasks and aid multiple robots in a kitchen. For now, its key limitation is that it uses a generic LLM, so the CSAIL team wants to apply a more powerful, fine-tuned language model that could assist with more extensive planning. Wong and her colleagues are also considering combining Ada with a robotic manipulation framework fresh out of CSAIL: LGA (language-guided abstraction).

Language-guided abstraction: Representations for robotic tasks

Andi Peng SM ’23, an MIT graduate student in electrical engineering and computer science and CSAIL affiliate, and her coauthors designed a method to help machines interpret their surroundings more like humans, cutting out unnecessary details in a complex environment like a factory or kitchen. Just like LILO and Ada, LGA has a novel focus on how natural language leads us to those better abstractions.

In these more unstructured environments, a robot will need some common sense about what it’s tasked with, even with basic training beforehand. Ask a robot to hand you a bowl, for instance, and the machine will need a general understanding of which features are important within its surroundings. From there, it can reason about how to give you the item you want. 

In LGA’s case, humans first provide a pre-trained language model with a general task description using natural language, like “bring me my hat.” Then, the model translates this information into abstractions about the essential elements needed to perform this task. Finally, an imitation policy trained on a few demonstrations can implement these abstractions to guide a robot to grab the desired item.
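That three-stage flow — task description, language-driven abstraction, then a demonstration-trained policy acting on the abstract state — can be sketched in a few lines. The keyword matching and distance-based policy below are trivial stand-ins for LGA's pre-trained language model and imitation-learned policy.

```python
def relevant_features(task, scene):
    # Stand-in for the pre-trained language model that turns a natural
    # language task description into an abstraction keeping only
    # task-relevant objects (here, a trivial keyword match).
    return {name: props for name, props in scene.items() if name in task}

def imitation_policy(abstract_state):
    # Stand-in for a policy trained on a few demonstrations: approach and
    # grasp the task-relevant object that remains after abstraction.
    target = min(abstract_state, key=lambda name: abstract_state[name]["distance"])
    return ["move_to " + target, "grasp " + target]

scene = {
    "hat":  {"distance": 2.0},
    "mug":  {"distance": 0.5},   # a distractor the abstraction removes
    "lamp": {"distance": 1.0},
}
actions = imitation_policy(relevant_features("bring me my hat", scene))
```

Because the abstraction strips out the nearer but irrelevant mug and lamp, the policy heads for the hat rather than simply grabbing the closest object.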

Previous work required a person to take extensive notes on different manipulation tasks to pre-train a robot, which can be expensive. Remarkably, LGA guides language models to produce abstractions similar to those of a human annotator, but in less time. To illustrate this, LGA developed robotic policies to help Boston Dynamics’ Spot quadruped pick up fruits and throw drinks in a recycling bin. These experiments show how the MIT-developed method can scan the world and develop effective plans in unstructured environments, potentially guiding autonomous vehicles on the road and robots working in factories and kitchens.

“In robotics, a truth we often disregard is how much we need to refine our data to make a robot useful in the real world,” says Peng. “Beyond simply memorizing what’s in an image for training robots to perform tasks, we wanted to leverage computer vision and captioning models in conjunction with language. By producing text captions from what a robot sees, we show that language models can essentially build important world knowledge for a robot.”

The challenge for LGA is that some behaviors can’t be explained in language, making certain tasks underspecified. To expand how they represent features in an environment, Peng and her colleagues are considering incorporating multimodal visualization interfaces into their work. In the meantime, LGA provides a way for robots to gain a better feel for their surroundings when giving humans a helping hand. 

An “exciting frontier” in AI

“Library learning represents one of the most exciting frontiers in artificial intelligence, offering a path towards discovering and reasoning over compositional abstractions,” says Robert Hawkins, an assistant professor at the University of Wisconsin-Madison who was not involved with the papers. Hawkins notes that previous techniques exploring this subject have been “too computationally expensive to use at scale” and have an issue with the lambdas, or keywords used to describe new functions in many languages, that they generate. “They tend to produce opaque ‘lambda salads,’ big piles of hard-to-interpret functions. These recent papers demonstrate a compelling way forward by placing large language models in an interactive loop with symbolic search, compression, and planning algorithms. This work enables the rapid acquisition of more interpretable and adaptive libraries for the task at hand.”

By building libraries of high-quality code abstractions using natural language, the three neurosymbolic methods make it easier for language models to tackle more elaborate problems and environments in the future. This deeper understanding of the precise keywords within a prompt presents a path forward in developing more human-like AI models.

MIT CSAIL members are senior authors for each paper: Joshua Tenenbaum, a professor of brain and cognitive sciences, for both LILO and Ada; Julie Shah, head of the Department of Aeronautics and Astronautics, for LGA; and Jacob Andreas, associate professor of electrical engineering and computer science, for all three. The additional MIT authors are all PhD students: Maddy Bowers and Theo X. Olausson for LILO, Jiayuan Mao and Pratyusha Sharma for Ada, and Belinda Z. Li for LGA. Muxin Liu of Harvey Mudd College was a coauthor on LILO; Zachary Siegel of Princeton University, Jiahai Feng of the University of California at Berkeley, and Noa Korneev of Microsoft were coauthors on Ada; and Ilia Sucholutsky, Theodore R. Sumers, and Thomas L. Griffiths of Princeton were coauthors on LGA.

LILO and Ada were supported, in part, by MIT Quest for Intelligence, the MIT-IBM Watson AI Lab, Intel, U.S. Air Force Office of Scientific Research, the U.S. Defense Advanced Research Projects Agency, and the U.S. Office of Naval Research, with the latter project also receiving funding from the Center for Brains, Minds and Machines. LGA received funding from the U.S. National Science Foundation, Open Philanthropy, the Natural Sciences and Engineering Research Council of Canada, and the U.S. Department of Defense.

AI, CVEs and Swiss cheese – CyberTalk

By Grant Asplund, Cyber Security Evangelist, Check Point. For more than 25 years, Grant Asplund has been sharing his insights into how businesses can best protect themselves from sophisticated cyber attacks in an increasingly complex world.

Grant was Check Point’s first worldwide evangelist from 1998 to 2002 and returned to Check Point with the acquisition of Dome9. Grant’s wide range of cyber security experience informs his talks, as he has served in diverse roles ranging from sales and marketing to business development and senior management for Dome9, Blue Coat Systems, Neustar, and Altor Networks. As CEO of MetaInfo, he led its acquisition by Neustar. Grant is the host of the CISO Secrets podcast (cp.buzzsprout.com) and the Talking Cloud Podcast (talkingcloud.podbean.com) on cloud security.

EXECUTIVE SUMMARY:

AI, AI, OH!

If you’ve attended a cyber security conference in the past several months, you know the topic of artificial intelligence is in just about every vendor presentation. And I suspect we’re going to hear a lot more about it in the coming months and years.

Our lives are certainly going to change due to AI. I’m not sure if any of us really appreciates what it will be like to have an assistant that knows everything that the internet knows.

Unfortunately, not everyone will be utilizing these AI assistants for good. Additionally, the profound impact of employing AI will be just as significant for the nefarious as for the well-intended.

Consider what’s right around the corner…

Hackers often begin their social engineering schemes by directing their AI assistants (and custom bots) to conduct reconnaissance on their target.

The first phase is to gather intelligence and information about the target. Using any and every means available, attackers will determine which general technology products and which security products are in use, along with their current versions. This phase might last weeks or months.

Once that information is gathered, the hacker will utilize AI to correlate the products and versions in use with the known CVEs issued for those same versions, clearly identifying the exploitable path(s).

200,000 known CVEs

And odds are on the hackers’ side. According to the National Vulnerability Database, there are currently over 200,000 known CVEs. Fifty percent of vulnerability exploits occur within 2-4 weeks of a patch being released, while the average time for an enterprise to respond to a critical vulnerability is 120 days.

All of this leads me to ask: When selecting a security vendor and security products, why don’t more companies ask the vendor how many CVEs have been released concerning the products being purchased?

After all, these ‘security’ products are being purchased to secure valuable business assets! Some vendors’ products have more holes than Swiss cheese!

Comprehensive, consolidated and collaborative

Of course, I’m not suggesting that an organization abandon its rigorous assessment, evaluation, and selection process when choosing security vendors and products, or base the decision solely on the number of CVEs, especially considering that today’s computing environments and overall digital footprints are vastly more complex than ever before and continue to expand.

What I am suggesting is that now, more than ever, organizations need to step back and re-assess their overall security platform. Due to the increased complexity and ever-increasing number of point solutions, companies must consider deploying a comprehensive, consolidated, and highly collaborative security platform.

Reducing CVEs and Swiss cheese

Once your organization has identified the possible vendors who can help consolidate your security stack, be sure to check how many HIGH or CRITICAL CVEs have been released in the last few years against the products you’re considering, and check how long it took to fix them.
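As a sketch of that kind of check: the records, product names, and helper functions below are hypothetical; a real assessment would pull live data from the National Vulnerability Database for each product under consideration.

```python
from datetime import date

# Hypothetical records in the spirit of NVD entries (illustrative only).
cve_records = [
    {"product": "VendorA Gateway", "severity": "CRITICAL",
     "published": date(2023, 3, 1), "days_to_patch": 14},
    {"product": "VendorA Gateway", "severity": "LOW",
     "published": date(2022, 7, 9), "days_to_patch": 30},
    {"product": "VendorB Proxy", "severity": "HIGH",
     "published": date(2021, 1, 5), "days_to_patch": 200},
]

def serious_cves(records, product, since):
    # HIGH or CRITICAL CVEs published against a product after a cutoff date.
    return [r for r in records
            if r["product"] == product
            and r["severity"] in ("HIGH", "CRITICAL")
            and r["published"] >= since]

def worst_days_to_patch(records):
    # Longest time the vendor took to fix any of the given CVEs.
    return max((r["days_to_patch"] for r in records), default=0)

recent = serious_cves(cve_records, "VendorA Gateway", since=date(2022, 1, 1))
```

Filtering by both severity and recency matters here: a vendor with many old, low-severity CVEs may be a better bet than one with a few recent critical ones that took months to patch.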

By consolidating your stack, you will reduce complexity. By eliminating the ‘Swiss cheese’ products in your security stack, you will eliminate the gaps most likely to be exploited in the future by artificial intelligence.

For information about cyber security products powered by AI, click here. To receive compelling cyber insights, groundbreaking research and emerging threat analyses each week, subscribe to the CyberTalk.org newsletter.

NBA 2K24 Removes Collector Level Reward Kobe Bryant At Last Second Sparking Fan Outcry

The late legend Kobe Bryant has served as the cover star for NBA 2K24 and, for many, a strong incentive to reach the top Collector Level in the game’s MyTeam mode. As the NBA playoffs’ first round wraps up, many fans are more disappointed than excited with the most recent NBA 2K title, as developer Visual Concepts has pulled back on a promise made prior to launch. 

In the lead-up to the game’s release date, the NBA 2K24 developer Visual Concepts released a blog post running down the features of its card-collection mode, MyTeam. In that blog post, the developer laid out several features and rewards for the then-upcoming title. It took special care to devote a section to an upcoming Collector Level reward, Kobe Bryant. 

In that post, the developer highlights just how crucial Collector Level rewards are to the overall MyTeam experience. “Collector level rewards have always been important in MyTEAM, and last year the rewards came as surprises with a hidden end goal,” the blog post from prior to launch stated. “So let’s look forward a few months and reveal that Kobe Bryant will be the top reward in the Collector Level, and this reward will be available in April, during Season 6.”

First reported by The Washington Post’s Herb Scribner, with further reporting done by Forbes’ Paul Tassi, NBA 2K24 has changed its top Collector Level reward so that it no longer includes Kobe Bryant. Instead, players can now choose two cards from a pool of previously released 100-overall cards. However, according to multiple community members, the most recent 100-overall card, Yao Ming, is not an option for players to choose from.

Centering a Collector Level reward around such an iconic and beloved player likely encouraged many to grind (or spend) to achieve the top Collector Level and obtain the reward. Many players have taken to social media and the NBA 2K Community Discord server to voice their displeasure. As of this writing, neither the official NBA 2K nor NBA 2K MyTeam account has posted anything regarding the situation, and the “developer-supported and community-run” NBA 2K Subreddit contains zero posts about the missing reward. However, the MyTeam Subreddit has multiple player-posted threads regarding the problem.

The only community-facing comment from the NBA 2K team that I have found was posted on the official NBA 2K Community Discord. That comment matches the comment provided to me by a 2K spokesperson when I requested comment from the publisher. You can read the entire statement provided by a 2K spokesperson below.

2K strives to deliver the very best NBA 2K24 MyTEAM experience to the community. Please note that a change to a reward has occurred. Players who achieve a top Collector Level will now receive an Option Pack for two picks out of ten previously released 100 OVR Cards. We appreciate that players have dedicated time and effort throughout the year to achieve this reward and 2K is committed to ensuring players continue to earn valuable content as their reward.

I followed up requesting additional information on why the change was made, but the spokesperson declined to comment further. Some online speculation has posited that licensing issues are to blame, but there is no confirmation or evidence to support that theory outside of 2K’s unwillingness to comment further at this time.

NBA 2K24 arrived on PlayStation 5, Xbox Series X/S, PlayStation 4, Xbox One, Switch, and PC on September 8, 2023. While the gameplay is one of the stronger elements of the title, our reviewer’s chief complaints included the increasingly intrusive microtransactions that permeate multiple long-term modes within the game. This controversy surrounding one of the most monetized modes does little to refute that criticism. You can read our full review here.

Batman: Arkham Shadow Announced For Meta Quest 3 This Year

The Batman: Arkham series redefined the superhero genre and changed the course of action games in the decade following Arkham Asylum’s release. While four mainline entries – Asylum, City, Origins, and Knight – delivered similar gameplay, a smaller spin-off game, 2016’s Batman: Arkham VR, let players step into the shoes of the Caped Crusader using their PlayStation VR, Oculus Rift, or HTC Vive headset. Batman: Arkham VR felt more like a tech demo than a fully fleshed-out game, but its relatively high sales showed that the appetite was there. Today, Oculus Studios announced another VR title set in the Batman: Arkham universe titled Batman: Arkham Shadow.

Though Rocksteady Studios, the developer of Asylum, City, and Knight, was behind Batman: Arkham VR, the developer that most recently released Suicide Squad: Kill the Justice League does not appear to be involved with Batman: Arkham Shadow. Instead, Camouflaj, the studio behind République and Iron Man VR, is in charge of Batman: Arkham Shadow. 

Details are scarce, but Camouflaj founder and studio head Ryan Payton penned a letter on behalf of the team, which is posted on the studio’s website. “From the start, Batman: Arkham Shadow is being crafted to be the ultimate VR game and take full advantage of the Meta Quest 3,” the letter said. “Leaning into our eight years of dedicated VR game development history has enabled us to not only create a distinctly Arkham-feeling game but done in a way that leverages the immersive magic only VR can provide.”

“Batman: Arkham Shadow is the largest Camouflaj development project to date and marks our second release as a first party member of Oculus Studios, following 2022’s critically-acclaimed release of Marvel’s Iron Man VR for Quest 2,” the letter later said.

When combined with the key art, the teaser trailer seems to hint at The Ratcatcher being the main villain in this title. Check out the very brief teaser video below.

We can expect a full reveal at Summer Game Fest 2024’s livestream, set for June 7 at 2 p.m. PT. Batman: Arkham Shadow is coming exclusively to Meta Quest 3 later this year.

Nuno Loureiro named director of MIT’s Plasma Science and Fusion Center

Nuno Loureiro, professor of nuclear science and engineering and of physics, has been appointed the new director of the MIT Plasma Science and Fusion Center, effective May 1.

Loureiro is taking the helm of one of MIT’s largest labs: more than 250 full-time researchers, staff members, and students work and study in seven buildings with 250,000 square feet of lab space. A theoretical physicist and fusion scientist, Loureiro joined MIT as a faculty member in 2016, and was appointed deputy director of the Plasma Science and Fusion Center (PSFC) in 2022. Loureiro succeeds Dennis Whyte, who stepped down at the end of 2023 to return to teaching and research.

Stepping into his new role as director, Loureiro says, “The PSFC has an impressive tradition of discovery and leadership in plasma and fusion science and engineering. Becoming director of the PSFC is an incredible opportunity to shape the future of these fields. We have a world-class team, and it’s an honor to be chosen as its leader.”

Loureiro’s own research ranges widely. He is recognized for advancing the understanding of multiple aspects of plasma behavior, particularly turbulence and the physics underpinning solar flares and other astronomical phenomena. In the fusion domain, his work enables the design of fusion devices that can more efficiently control and harness the energy of fusing plasmas, bringing the dream of clean, near-limitless fusion power that much closer. 

Plasma physics is foundational to advancing fusion science, a fact Loureiro has embraced and one that is relevant as he considers the direction of the PSFC’s multidisciplinary research. “But plasma physics is only one aspect of our focus. Building a scientific agenda that continues and expands on the PSFC’s history of innovation in all aspects of fusion science and engineering is vital, and a key facet of that work is facilitating our researchers’ efforts to produce the breakthroughs that are necessary for the realization of fusion energy.”

As the climate crisis accelerates, fusion power continues to grow in appeal: It produces no carbon emissions, its fuel is plentiful, and dangerous “meltdowns” are impossible. The sooner that fusion power is commercially available, the greater impact it can have on reducing greenhouse gas emissions and meeting global climate goals. While technical challenges remain, “the PSFC is well poised to meet them, and continue to show leadership. We are a mission-driven lab, and our students and staff are incredibly motivated,” Loureiro comments.

“As MIT continues to lead the way toward the delivery of clean fusion power onto the grid, I have no doubt that Nuno is the right person to step into this key position at this critical time,” says Maria T. Zuber, MIT’s presidential advisor for science and technology policy. “I look forward to the steady advance of plasma physics and fusion science at MIT under Nuno’s leadership.”

Over the last decade, there have been massive leaps forward in the field of fusion energy, driven in part by innovations like high-temperature superconducting magnets developed at the PSFC. Further progress is guaranteed: Loureiro believes that “The next few years are certain to be an exciting time for us, and for fusion as a whole. It’s the dawn of a new era with burning plasma experiments” — a reference to the collaboration between the PSFC and Commonwealth Fusion Systems, a startup company spun out of the PSFC, to build SPARC, a fusion device that is slated to turn on in 2026 and produce a burning plasma that yields more energy than it consumes. “It’s going to be a watershed moment,” says Loureiro.

He continues, “In addition, we have strong connections to inertial confinement fusion experiments, including those at Lawrence Livermore National Lab, and we’re looking forward to expanding our research into stellarators, which are another kind of magnetic fusion device.” Over recent years, the PSFC has significantly increased its collaboration with industrial partners such as Eni, IBM, and others. Loureiro sees great value in this: “These collaborations are mutually beneficial: they allow us to grow our research portfolio while advancing companies’ R&D efforts. It’s very dynamic and exciting.”

Loureiro’s directorship begins as the PSFC is launching key tech development projects like LIBRA, a “blanket” of molten salt that can be wrapped around fusion vessels and perform double duty as a neutron energy absorber and a breeder for tritium (the fuel for fusion). Researchers at the PSFC have also developed a way to rapidly test the durability of materials being considered for use in a fusion power plant environment, and are now creating an experiment that will utilize a powerful particle accelerator called a gyrotron to irradiate candidate materials.

Interest in fusion is at an all-time high; the demand for researchers and engineers, particularly in the nascent commercial fusion industry, is reflected by the record number of graduate students that are studying at the PSFC — more than 90 across seven affiliated MIT departments. The PSFC’s classrooms are full, and Loureiro notes a palpable sense of excitement. “Students are our greatest strength,” says Loureiro. “They come here to do world-class research but also to grow as individuals, and I want to give them a great place to do that. Supporting those experiences, making sure they can be as successful as possible is one of my top priorities.” Loureiro plans to continue teaching and advising students after his appointment begins.

MIT President Sally Kornbluth’s recently announced Climate Project is a clarion call for Loureiro: “It’s not hyperbole to say MIT is where you go to find solutions to humanity’s biggest problems,” he says. “Fusion is a hard problem, but it can be solved with resolve and ingenuity — characteristics that define MIT. Fusion energy will change the course of human history. It’s both humbling and exciting to be leading a research center that will play a key role in enabling that change.” 
