Researchers create “The Consensus Game” to elevate AI’s text comprehension and generation skills

Imagine you and a friend are playing a game where your goal is to communicate secret messages to each other using only cryptic sentences. Your friend’s job is to guess the secret message behind your sentences. Sometimes, you give clues directly, and other times, your friend has to guess the message by asking yes-or-no questions about the clues you’ve given. The challenge is, both of you want to make sure you’re understanding each other correctly and agreeing on the secret message.

MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers have created a similar “game” to help improve how AI understands and generates text. The “Consensus Game” involves two parts of an AI system — one part tries to generate sentences (like giving clues), and the other part tries to understand and evaluate those sentences (like guessing the secret message).

The researchers discovered that by treating this interaction as a game, where both parts of the AI work together under specific rules to agree on the right message, they could significantly improve the AI’s ability to give correct and coherent answers to questions. They tested this new game-like approach on a variety of tasks, such as reading comprehension, solving math problems, and carrying on conversations, and found that it helped the AI perform better across the board.

Traditionally, language models (LMs) answer questions in one of two ways: generating an answer directly from the model (generative querying) or using the model to score a set of predefined candidate answers (discriminative querying). The two procedures can produce differing, and sometimes incompatible, results. With the generative approach, “Who is the President of the United States?” might yield a straightforward answer like “Joe Biden.” A discriminative query over the same model, however, might incorrectly dispute that answer, assigning a higher score to a wrong candidate such as “Barack Obama.”
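
To make the distinction concrete, here is a minimal sketch of the two querying modes using the Hugging Face transformers library. The model name (“gpt2”), the question string, and the scoring helper are illustrative placeholders, not the researchers’ implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

question = "Q: Who is the President of the United States?\nA:"

# Generative querying: produce an answer directly from the model.
inputs = tokenizer(question, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0, inputs["input_ids"].shape[1]:]))

# Discriminative querying: score predefined candidates by the total
# log-probability the model assigns to each answer's tokens.
def answer_logprob(prompt: str, answer: str) -> float:
    ids = tokenizer(prompt + " " + answer, return_tensors="pt")["input_ids"]
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    with torch.no_grad():
        logits = model(ids).logits
    # Position i of log_probs holds the prediction for token i + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    answer_ids = ids[0, prompt_len:]
    positions = torch.arange(prompt_len - 1, ids.shape[1] - 1)
    return log_probs[positions, answer_ids].sum().item()

for cand in ["Joe Biden", "Barack Obama"]:
    print(cand, answer_logprob(question, cand))
```

Nothing forces these two scores to agree, which is exactly the inconsistency the Consensus Game targets.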

So, how do we reconcile mutually incompatible scoring procedures to achieve coherent, efficient predictions? 

“Imagine a new way to help language models understand and generate text, like a game. We’ve developed a training-free, game-theoretic method that treats the whole process as a complex game of clues and signals, where a generator tries to send the right message to a discriminator using natural language. Instead of chess pieces, they’re using words and sentences,” says MIT CSAIL PhD student Athul Jacob. “Our way to navigate this game is finding the ‘approximate equilibria,’ leading to a new decoding algorithm called ‘Equilibrium Ranking.’ It’s a pretty exciting demonstration of how bringing game-theoretic strategies into the mix can tackle some big challenges in making language models more reliable and consistent.”
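
As we understand the setup from the ICLR paper, the game can be sketched roughly as follows (notation ours, details simplified): a hidden correctness parameter is drawn at random, the generator proposes an answer conditioned on it, the discriminator tries to recover it, and each player is penalized for drifting from the base model’s original distributions.

```latex
% Rough sketch of the consensus game; notation ours, simplified.
% v ~ Uniform({correct, incorrect}); the generator proposes an answer
% y ~ pi_G(. | x, v) to question x; the discriminator guesses
% v' ~ pi_D(. | x, y). Both are rewarded for agreement, and each pays
% a KL penalty for straying from its initial policy pi^{(1)}:
\[
u_G = \mathbb{E}\left[\mathbf{1}\{v' = v\}\right]
      - \lambda_G \,\mathrm{KL}\!\left(\pi_G \,\middle\|\, \pi_G^{(1)}\right),
\qquad
u_D = \mathbb{E}\left[\mathbf{1}\{v' = v\}\right]
      - \lambda_D \,\mathrm{KL}\!\left(\pi_D \,\middle\|\, \pi_D^{(1)}\right).
\]
```

The KL anchors are what keep the equilibrium from collapsing onto an answer that agrees internally but contradicts everything the base model originally believed.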

When tested across many tasks, like reading comprehension, commonsense reasoning, math problem-solving, and dialogue, the team’s algorithm consistently improved how well these models performed. Using the ER algorithm with the LLaMA-7B model even outshone the results from much larger models. “Given that they are already competitive, that people have been working on it for a while, but the level of improvements we saw being able to outperform a model that’s 10 times the size was a pleasant surprise,” says Jacob. 

Game on

Diplomacy, a strategic board game set in pre-World War I Europe, where players negotiate alliances, betray friends, and conquer territories without the use of dice — relying purely on skill, strategy, and interpersonal manipulation — recently had a second coming. In November 2022, computer scientists, including Jacob, developed “Cicero,” an AI agent that achieves human-level capabilities in the mixed-motive, seven-player game, which requires the same skills, but exercised through natural language. The math behind this partially inspired the Consensus Game.

While the history of AI agents long predates when OpenAI’s software entered the chat (and never looked back) in November 2022, it’s well documented that they can still cosplay as your well-meaning, yet pathological friend. 

The Consensus Game system reaches equilibrium as an agreement, ensuring accuracy and fidelity to the model’s original insights. To achieve this, the method iteratively adjusts the interactions between the generative and discriminative components until they reach a consensus on an answer that accurately reflects reality and aligns with their initial beliefs. This approach effectively bridges the gap between the two querying methods. 
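
A minimal numerical sketch of this kind of iterative scheme appears below. It is a schematic stand-in, not the paper’s piKL no-regret algorithm: over a fixed candidate answer set, each player repeatedly moves toward the other’s current distribution while a KL-style anchor (weight `lam`, an illustrative parameter) keeps it near its initial beliefs.

```python
import numpy as np

def equilibrium_ranking(p_gen, p_disc, lam=0.1, n_iters=100):
    """Schematic consensus iteration over a fixed candidate answer set.

    p_gen / p_disc: the base model's initial generative and discriminative
    distributions over the same candidates; lam: strength of the anchor to
    each player's initial policy. A simplified stand-in for the paper's
    no-regret (piKL) dynamics, not the authors' released code.
    """
    g, d = p_gen.copy(), p_disc.copy()
    for _ in range(n_iters):
        # Each player moves toward the other's current policy while being
        # pulled back toward its own initial distribution.
        g_new = np.exp((np.log(d) + lam * np.log(p_gen)) / (1.0 + lam))
        d_new = np.exp((np.log(g) + lam * np.log(p_disc)) / (1.0 + lam))
        g, d = g_new / g_new.sum(), d_new / d_new.sum()
    # Rank candidates by how strongly both equilibrium policies agree.
    return g * d

# Toy example: three candidates with conflicting initial scores.
p_gen = np.array([0.50, 0.40, 0.10])
p_disc = np.array([0.25, 0.60, 0.15])
print(equilibrium_ranking(p_gen, p_disc).argmax())  # consensus pick
```

The anchoring term is the key design choice: without it, the two players could agree on any answer at all; with it, the consensus must stay plausible under the model’s original generative and discriminative beliefs.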

In practice, implementing the Consensus Game approach to language model querying, especially for question-answering tasks, does involve significant computational challenges. For example, when using datasets like MMLU, which have thousands of questions and multiple-choice answers, the model must apply the mechanism to each query. Then, it must reach a consensus between the generative and discriminative components for every question and its possible answers. 
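
Concretely, the per-benchmark cost looks like the loop below, which continues the sketches above (the dataset field names `question` and `choices` are assumptions): each of the N questions requires both querying modes over all K candidates, plus one consensus computation.

```python
import numpy as np

def normalize(logps):
    # Turn a list of log-probabilities into a probability distribution.
    x = np.array(logps, dtype=float)
    x = np.exp(x - x.max())
    return x / x.sum()

def rank_dataset(dataset, gen_logprob, disc_logprob):
    predictions = []
    for example in dataset:                        # N questions
        cands = example["choices"]                 # K candidates each
        p_gen = normalize([gen_logprob(example["question"], c) for c in cands])
        p_disc = normalize([disc_logprob(example["question"], c) for c in cands])
        # One consensus iteration per question (see the sketch above).
        predictions.append(int(np.argmax(equilibrium_ranking(p_gen, p_disc))))
    return predictions
```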

The system did struggle with a grade school rite of passage: math word problems. It couldn’t generate wrong answers, which is a critical component of understanding the process of arriving at the right one.

“The last few years have seen really impressive progress in both strategic decision-making and language generation from AI systems, but we’re just starting to figure out how to put the two together,” says co-author Jacob Andreas. “Equilibrium ranking is a first step in this direction, but I think there’s a lot we’ll be able to do to scale this up to more complex problems.”

An avenue of future work involves enhancing the base model by integrating the outputs of the current method. This is particularly promising since it can yield more factual and consistent answers across various tasks, including factuality and open-ended generation. The potential for such a method to significantly improve the base model’s performance is high, which could result in more reliable and factual outputs from ChatGPT and similar language models that people use daily. 

“Even though modern language models, such as ChatGPT and Gemini, have led to solving various tasks through chat interfaces, the statistical decoding process that generates a response from such models has remained unchanged for decades,” says Google research scientist Ahmad Beirami. “The proposal by the MIT researchers is an innovative game-theoretic framework for decoding from language models through solving the equilibrium of a consensus game. The significant performance gains reported in the research paper are promising, opening the door to a potential paradigm shift in language model decoding that may fuel a flurry of new applications.”

Jacob wrote the paper with MIT-IBM Watson Lab researcher Yikang Shen and MIT Department of Electrical Engineering and Computer Science assistant professors Gabriele Farina and Jacob Andreas, who is also a CSAIL member. They will present their work at the International Conference on Learning Representations (ICLR) this May. The research received a “best paper award” at the NeurIPS R0-FoMo Workshop in December and it will also be highlighted as a “spotlight paper” at ICLR.

AI and cyber security, a year in review – CyberTalk

Pål (Paul) has more than 30 years of experience in the IT industry and has worked with both domestic and international clients on a local and global scale. Pål has a very broad competence base that covers everything from general security, to datacenter security, to cloud security services and development. For the past 10 years, he has worked primarily within the private sector, with a focus on both large and medium-sized companies within most verticals.

In this expert interview, Check Point security expert Pål Aaserudseter describes where we are with ChatGPT and artificial intelligence. He delves into policy, processes, and more. Don’t miss this.

In the past year, what has caught your attention regarding AI and cyber security?

Hi and thanks for having me on CyberTalk! Looking back at 2023, I think the best word to describe it is wow!

As 2023 progressed, AI experienced huge developments, with breakthroughs in chatbots, large language models and in sectors like transportation, healthcare, content creation and too many others to mention!

We might say that ChatGPT was the on-ramp into AI for most people in 2023. Obviously, it evolved, got a lot of media attention for various reasons, and now its makers are trying to profit from it in different ways. Competition is also on the rise, with companies like Anthropic. We’ll see a lot more happening on the AI front in 2024.

When it comes to cyber security, we have seen a massive implementation of AI on both sides of the fence. It is now easier to become a cyber criminal than ever before, as AI-enabled tools are automated, easy to use and easy to rent (as-a-service).

One example is DarkGemini. It’s a powerful GenAI chatbot, being sold on the dark web for a monthly subscription. It can create malware, build a reverse shell, and do other bad things, solely based on a text prompt, and it will surely be further developed to introduce more features that attackers can leverage.

When wielded maliciously, AI becomes a catalyst for chaos. From the creation of deepfakes, to intricate social engineering schemes like far more convincing phishing attempts, to polymorphic malware that continuously mutates its threat code, these capabilities pose a formidable challenge to current security tools.

Consequently, the balance of power may tip in favor of attackers, as traditional defense mechanisms struggle to adapt and counter these evolving threats.

Cyber attackers leveraging AI have the capacity to automate and quickly identify vulnerabilities for exploitation. Unlike current generic attacks, AI enables attackers to tailor their assaults to specific targets and scenarios, potentially leading to a surge in personalized and precisely targeted attacks. As the scale and precision of such attacks increase, it’s likely that we’ll witness a shift in attacker behaviors and strategies.

Implementing AI-based security that learns, adapts, and improves is critical to future-proofing against unknown attacks.

What new challenges and opportunities are you seeing? What has your experience working with clients been like?

New challenges in AI and cyber security include addressing the ethical implications of AI-driven security systems, ensuring the reliability and transparency of AI algorithms, and staying ahead of evolving cyber threats.

Regulation is important, and with the EU AI Act and AI Alliance, we are taking steps forward, but as of now, the laws are still miles behind AI development.

There are also opportunities to leverage AI for proactive threat hunting, automated incident response, and predictive analytics to better protect against cyber attacks.

Working with clients has involved assisting them in understanding the capabilities and limitations of AI in cyber security (and other areas) and helping them integrate AI-powered solutions effectively into their security strategies.

Have there been any new developments around ethical guidelines/standards for the ethical use of AI within cyber security?

Yes! Efforts to establish guidelines and standards for the ethical use of AI within cyber security are ongoing and gaining traction. Organizations such as IEEE and NIST are developing frameworks to promote responsible AI practices in cyber security, focusing on transparency, fairness, accountability, and privacy.

As mentioned, the AI Alliance is comprised of technology creators, developers and adopters working together to advance safe and responsible AI.

Also, to regulate the safe use of AI, the first parts of the very important AI Act have been passed in the European Union.

As a cyber security expert, what are your perspectives around the ethical use of AI within cyber security? How can organizations ensure transparency? How can they ensure that the AI isn’t manipulated by threat actors?

My perspectives on the ethical use of AI within cyber security (and all other fields for that matter) are rooted in the principles of transparency, fairness, accountability, and privacy.

While AI holds immense potential to bolster cyber security defenses and mitigate threats, it’s crucial to ensure that its deployment aligns with ethical considerations.

Transparency is key. Organizations must be transparent about how AI algorithms are developed, trained, and utilized in cyber security operations. This transparency fosters trust among stakeholders and enables scrutiny of AI systems.

Fairness is essential to prevent discrimination or bias in AI-driven decision-making processes. It’s imperative to address algorithmic biases that may perpetuate inequalities or disadvantage certain groups. Thoughtful design, rigorous testing, and ongoing monitoring are necessary to ensure fairness in AI applications.

Note: You can compare training an AI model to raising a child into a responsible adult. It needs guidance and fostering, and it needs to learn from its mistakes along the way in order to become responsible and make the right decisions in the end.

Accountability is crucial for holding individuals and organizations responsible for the actions and decisions made by AI systems. Clear lines of accountability should be established to identify who is accountable for AI-related outcomes, including any errors or failures.

Accountability encourages responsible behavior and incentivizes adherence to ethical standards.

Privacy must be protected when using AI in cyber security. Organizations should prioritize the confidentiality and integrity of sensitive data, implementing robust security measures to prevent unauthorized access or misuse. AI algorithms should be designed with privacy-enhancing techniques to minimize the risk of data breaches or privacy violations. Their design should also take things like GDPR and PII into account.

Overall, ethical considerations should guide the development, deployment, and governance of AI in cyber security (and other fields leveraging AI).

What are the implications of the new Check Point partnership with NVIDIA in relation to securing AI (cloud) infrastructure at scale?

This shows the importance of securing such platforms, as cyber criminals will obviously try to exploit any new technology. With the immense speed of AI development, there are going to be errors, mistakes, and code and prompts that can be compromised. At Check Point, we have the solutions to secure your AI!
