Inflection-2.5: The Powerhouse LLM Rivaling GPT-4 and Gemini

Inflection AI has been making waves in the field of large language models (LLMs) with their recent unveiling of Inflection-2.5, a model that competes with the world’s leading LLMs, including OpenAI’s GPT-4 and Google’s Gemini.

Inflection AI’s rapid rise has been further fueled by a massive $1.3 billion funding round led by industry giants Microsoft and NVIDIA alongside renowned investors including Reid Hoffman, Bill Gates, and Eric Schmidt. This investment brings the company’s total funding to $1.525 billion.

In collaboration with partners CoreWeave and NVIDIA, Inflection AI is building the largest AI cluster in the world, comprising an unprecedented 22,000 NVIDIA H100 Tensor Core GPUs. This colossal computing power will support the training and deployment of a new generation of large-scale AI models, enabling Inflection AI to push the boundaries of what is possible in the field of personal AI.

The company’s groundbreaking work has already yielded remarkable results, with the Inflection AI cluster, currently comprising over 3,500 NVIDIA H100 Tensor Core GPUs, delivering state-of-the-art performance on the open-source benchmark MLPerf. In a joint submission with CoreWeave and NVIDIA, the cluster completed the reference training task for large language models in just 11 minutes, solidifying its position as the fastest cluster on this benchmark.

This achievement follows the unveiling of Inflection-1, Inflection AI’s in-house large language model (LLM), which has been hailed as the best model in its compute class. Inflection-1 outperforms well-known models such as GPT-3.5, LLaMA, Chinchilla, and PaLM-540B on a wide range of benchmarks commonly used for comparing LLMs. It also enables users to interact with Pi, Inflection AI’s personal AI, in a simple and natural way and to receive fast, relevant, and helpful information and advice.

Inflection AI’s commitment to transparency and reproducibility is evident in the release of a technical memo detailing the evaluation and performance of Inflection-1 on various benchmarks. The memo reveals that Inflection-1 outperforms models in the same compute class, defined as models trained using at most the FLOPs (floating-point operations) of PaLM-540B.
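
The compute-class definition above is a simple threshold on training FLOPs, which can be sketched in a few lines. This is a rough illustration, not the memo's method: the 6 * N * D estimate (parameters times tokens) is a common rule of thumb assumed here, the PaLM-540B parameter and token counts are the figures from the PaLM paper, and the candidate model is hypothetical.

```python
def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training cost via the 6 * N * D rule of thumb (an assumption here)."""
    return 6.0 * n_params * n_tokens


def in_compute_class(model_flops: float, reference_flops: float) -> bool:
    """True if a model used at most as many training FLOPs as the reference."""
    return model_flops <= reference_flops


# PaLM-540B: 540B parameters trained on 780B tokens.
palm_flops = estimate_training_flops(540e9, 780e9)

# Hypothetical candidate model, invented purely for illustration.
candidate_flops = estimate_training_flops(70e9, 1.4e12)

print(f"PaLM-540B estimate: {palm_flops:.3e} FLOPs")
print("candidate in compute class:", in_compute_class(candidate_flops, palm_flops))
```

Under this sketch, any model whose estimated training cost falls at or below the PaLM-540B figure counts as being in the same compute class.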

The success of Inflection-1 and the rapid scaling of the company’s computing infrastructure, fueled by the substantial funding round, highlight Inflection AI’s unwavering dedication to delivering on its mission of creating a personal AI for everyone. With the integration of Inflection-1 into Pi, users can now experience the power of a personal AI, benefiting from its empathetic personality, usefulness, and safety standards.

Inflection-2.5

Inflection-2.5 is now available to all users of Pi, Inflection AI’s personal AI assistant, across multiple platforms, including the web (pi.ai), iOS, Android, and a new desktop app. This integration marks a significant milestone in Inflection AI’s mission to create a personal AI for everyone, combining raw capability with their signature empathetic personality and safety standards.

A Leap in Performance

Inflection AI’s previous model, Inflection-1, utilized approximately 4% of the training FLOPs (floating-point operations) of GPT-4 and exhibited an average performance of around 72% compared to GPT-4 across various IQ-oriented tasks. With Inflection-2.5, Inflection AI has achieved a substantial boost in Pi’s intellectual capabilities, with a focus on coding and mathematics.

The model’s performance on key industry benchmarks demonstrates its prowess, showcasing over 94% of GPT-4’s average performance across various tasks, with a particular emphasis on excelling in STEM areas. This remarkable achievement is a testament to Inflection AI’s commitment to pushing the technological frontier while maintaining an unwavering focus on user experience and safety.
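
A figure such as "94% of GPT-4's average performance" is a ratio between two benchmark averages. The sketch below shows one way such a number could be computed. It assumes a ratio of mean scores over shared tasks (the source does not say whether Inflection AI averages scores first or averages per-task ratios), and the task names and scores are placeholders, not real results.

```python
from statistics import mean


def relative_performance(model_scores: dict, baseline_scores: dict) -> float:
    """Ratio of the model's mean score to the baseline's mean score over shared tasks."""
    tasks = sorted(set(model_scores) & set(baseline_scores))
    if not tasks:
        raise ValueError("no shared tasks to compare")
    model_avg = mean(model_scores[t] for t in tasks)
    baseline_avg = mean(baseline_scores[t] for t in tasks)
    return model_avg / baseline_avg


# Placeholder scores, invented for illustration only.
model = {"task_a": 80.0, "task_b": 60.0}
baseline = {"task_a": 90.0, "task_b": 70.0}

ratio = relative_performance(model, baseline)
print(f"{ratio:.1%} of the baseline's average performance")
```

Averaging per-task ratios instead would weight low-scoring tasks more heavily, so the two conventions can give noticeably different headline numbers.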

Coding and Mathematics Prowess

Inflection-2.5 shines in coding and mathematics, demonstrating an improvement of over 10% on Inflection-1 on BIG-Bench-Hard, a subset of BIG-Bench problems that are hard for large language models. Two coding benchmarks, MBPP+ and HumanEval+, reveal massive improvements over Inflection-1, solidifying Inflection-2.5’s position as a force to be reckoned with in the coding domain.

On the MBPP+ benchmark, Inflection-2.5 outperforms its predecessor by a significant margin, exhibiting a performance level comparable to that of GPT-4, as reported by DeepSeek Coder. Similarly, on the HumanEval+ benchmark, Inflection-2.5 demonstrates remarkable progress, surpassing the performance of Inflection-1 and approaching the level of GPT-4, as reported on the EvalPlus leaderboard.

Industry Benchmark Dominance

Inflection-2.5 stands out in industry benchmarks, showcasing substantial improvements over Inflection-1 on the MMLU benchmark and the GPQA Diamond benchmark, renowned for its expert-level difficulty. The model’s performance on these benchmarks underscores its ability to handle a wide range of tasks, from high school-level problems to professional-level challenges.

Excelling in STEM Examinations

The model’s prowess extends to STEM examinations, with standout performance on the Hungarian Math exam and Physics GRE. On the Hungarian Math exam, Inflection-2.5 was evaluated with the provided few-shot prompt and formatting, which makes the result easy to reproduce.

In the Physics GRE, a graduate entrance exam in physics, Inflection-2.5 reaches the 85th percentile of human test-takers in maj@8 (majority vote at 8), solidifying its position as a formidable contender in the realm of physics problem-solving. Furthermore, the model approaches the top score in maj@32, exhibiting its ability to tackle complex physics problems with remarkable accuracy.
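
The maj@k metric mentioned above samples k answers per question and scores the most common one. A minimal sketch follows, assuming multiple-choice answers and first-seen tie-breaking. The source does not describe Inflection AI's sampling or tie-breaking setup, and the sampled answers here are invented.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def maj_at_k_accuracy(sampled, gold, k):
    """Fraction of questions where the majority of the first k samples is correct."""
    hits = sum(majority_vote(s[:k]) == g for s, g in zip(sampled, gold))
    return hits / len(gold)


# Invented samples for two multiple-choice questions, eight samples each.
sampled = [
    ["B", "A", "B", "B", "C", "B", "A", "B"],
    ["D", "C", "C", "D", "D", "C", "C", "C"],
]
gold = ["B", "D"]

print("maj@8 accuracy:", maj_at_k_accuracy(sampled, gold, k=8))
```

Majority voting rewards answers the model reaches consistently, which is why maj@32 can differ from maj@8 on the same questions.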

Enhancing User Experience

Inflection-2.5 not only upholds Pi’s signature personality and safety standards but elevates its status as a versatile and invaluable personal AI across diverse topics. From discussing current events to seeking local recommendations, studying for exams, coding, and even casual conversations, Pi powered by Inflection-2.5 promises an enriched user experience.

With Inflection-2.5’s powerful capabilities, users are engaging with Pi on a broader range of topics than ever before. The model’s ability to handle complex tasks, combined with its empathetic personality and real-time web search capabilities, ensures that users receive high-quality, up-to-date information and guidance.

User Adoption and Engagement

The impact of Inflection-2.5’s integration into Pi is already evident in the user sentiment, engagement, and retention metrics. Inflection AI has witnessed a significant acceleration in organic user growth, with one million daily and six million monthly active users exchanging more than four billion messages with Pi.

On average, a conversation with Pi lasts 33 minutes, and each day one in ten conversations lasts over an hour. Furthermore, approximately 60% of people who interact with Pi in a given week return the following week, indicating higher monthly stickiness than leading competitors in the field.
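
The engagement figures above can be expressed as two simple ratios. The sketch below is illustrative only: the user IDs are invented, and treating "stickiness" as daily active users divided by monthly active users is a common convention assumed here, not a definition given by Inflection AI.

```python
def weekly_return_rate(active_this_week: set, active_next_week: set) -> float:
    """Share of this week's users who are also active the following week."""
    if not active_this_week:
        raise ValueError("no active users in the first week")
    return len(active_this_week & active_next_week) / len(active_this_week)


def dau_mau_ratio(daily_active: int, monthly_active: int) -> float:
    """Daily active users as a fraction of monthly active users."""
    return daily_active / monthly_active


# Invented user IDs for two consecutive weeks.
week_1 = {"u1", "u2", "u3", "u4", "u5"}
week_2 = {"u2", "u3", "u5", "u9"}

rate = weekly_return_rate(week_1, week_2)
stickiness = dau_mau_ratio(1_000_000, 6_000_000)  # user counts quoted in the article

print(f"weekly return rate: {rate:.0%}")
print(f"DAU/MAU: {stickiness:.1%}")
```

Note that new users in the second week (such as the invented "u9") do not affect the return rate, which only tracks the first week's cohort.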

Technical Details and Benchmark Transparency

In line with Inflection AI’s commitment to transparency and reproducibility, the company has provided comprehensive technical results and details on the performance of Inflection-2.5 across various industry benchmarks.

For example, on the corrected version of the MT-Bench dataset, which addresses issues with incorrect reference solutions and flawed premises in the original dataset, Inflection-2.5 demonstrates performance in line with expectations based on other benchmarks.

Inflection AI has also evaluated Inflection-2.5 on HellaSwag and ARC-C, common sense and science benchmarks reported by a wide range of models, and the results showcase strong performance on these saturating benchmarks.

It is important to note that while the evaluations provided represent the model powering Pi, the user experience may vary slightly due to factors such as the impact of web retrieval (not used in the benchmarks), the structure of few-shot prompting, and other production-side differences.

Conclusion

Inflection-2.5 represents a significant leap forward in the field of large language models, rivaling the capabilities of industry leaders like GPT-4 and Gemini while utilizing only a fraction of the computing resources. With its impressive performance across a wide range of benchmarks, particularly in STEM areas, coding, and mathematics, Inflection-2.5 has positioned itself as a formidable contender in the AI landscape.

The integration of Inflection-2.5 into Pi, Inflection AI’s personal AI assistant, promises an enriched user experience, combining raw capability with empathetic personality and safety standards. As Inflection AI continues to push the boundaries of what is possible with LLMs, the AI community eagerly anticipates the next wave of innovations and breakthroughs from this trailblazing company.

Inflection AI’s visionary approach extends beyond mere model development, as the company recognizes the importance of pre-training and fine-tuning in creating high-quality, safe, and useful AI experiences. As a vertically integrated AI studio, Inflection AI handles the entire process in-house, from data ingestion and model design to high-performance infrastructure.