Don’t Overlook China’s Open Source LLMs

A version of a Chinese LLM tops the open LLM leaderboard.

Created Using DALL-E

Next Week in The Sequence:

  • Edge 369: Our series about LLM reasoning continues with the recently published Chain-of-Code (CoC) method. We review the original CoC paper by Google DeepMind and the super popular Embedchain framework.

  • Edge 370: We dive into the new AlphaGeometry model created by Google DeepMind, which is able to solve geometry problems at the level of a math olympiad gold medalist.

You can subscribe below!

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: Don’t Overlook China’s Open Source LLMs

If you visit the open LLM leaderboard today, you might encounter an unfamiliar model at the top of the charts: Smaug-72B. Open-sourced by Abacus AI, this model is a fine-tuned version of another model, Qwen-72B, which Alibaba released a few months ago. The Qwen family of open-source LLMs has scored incredibly high across some of the top open-source benchmarks, showcasing the latest examples of Chinese innovation in the open-source generative AI space. While open-source LLMs are typically associated with Western models like LLaMA or Mistral, the pace of high-quality releases from China is nothing short of remarkable. Here are a few examples:

Smaug was technically developed by an American company, but as a fine-tuned version of a Chinese model. From what I can tell, most open-source Chinese LLMs share strong architectural commonalities with models like Llama or Mistral; there hasn’t been any major innovation from an architectural standpoint. Nonetheless, the quality is undeniable. While many skeptics of open-source generative AI have regularly cited China as a major concern, they fail to recognize the contributions that Chinese research labs and startups are making to the space. It will be interesting to see how regulation shapes the evolution of open-source LLMs in China and in Western countries. For now, don’t overlook the Chinese open-source LLMs. They are very impressive.


🎥 Watch Now: Building Plaid’s ML Fraud Detection Application

Want to learn about Plaid’s ML platform journey? In this on-demand recording, Plaid Software Engineer Renault Young shared the technical challenges they faced, how they set up the data foundations they needed to start building an ML platform, what they used to look for patterns in transaction data in real time, and more. Today, Signal is Plaid’s biggest ML application and analyzes 1000+ risk factors per ACH transaction. 

The on-demand recording is now available for you to watch and share with your colleagues! 


🔎 ML Research

Specialized SLMs

Apple Research published a paper evaluating small language model architectures based on inference, specialization, and training budgets. The paper evaluates different architectures, such as hyper-networks and mixtures of experts, to achieve different levels of specialization under budget constraints —> Read more.

Chain-of-Abstraction

Meta AI Research published a paper detailing Chain-of-Abstraction (CoA), a method that combines reasoning and tool learning in LLMs. CoA creates abstract placeholders in reasoning chains and then fills them with specific knowledge using tools —> Read more.
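The placeholder-then-fill idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `[CALC: …]` placeholder format and the `calculator` tool are assumptions made for the example; in CoA the abstract chain is generated by the LLM and the tools are external systems.

```python
import re

# Hypothetical tool: a calculator that evaluates arithmetic expressions.
def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))

# Stage 1 (normally produced by the LLM): an abstract reasoning chain
# with placeholders instead of concrete values.
abstract_chain = "The total cost is [CALC: 3 * 12] dollars, leaving [CALC: 50 - 3 * 12] dollars."

# Stage 2: fill each placeholder by calling the tool.
def fill_placeholders(chain: str) -> str:
    return re.sub(r"\[CALC: ([^\]]+)\]", lambda m: calculator(m.group(1)), chain)

print(fill_placeholders(abstract_chain))
# -> The total cost is 36 dollars, leaving 14 dollars.
```

Decoupling the reasoning skeleton from the concrete facts is what lets the same chain be reused with different tool results.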

Mastering Chess Without Search

Researchers from Google DeepMind published a paper proposing a 270 million parameter transformer model that is able to play chess at grandmaster level. The model challenges traditional chess engines, which rely on explicit search and complex handcrafted heuristics —> Read more.

Self-Discover

Google DeepMind published a paper introducing Self-Discover, a framework to tackle complex reasoning problems with LLMs. The framework includes reasoning modules such as critical and step-by-step thinking as well as the building blocks to compose those modules into sophisticated reasoning chains —> Read more.

AI Controller Interface

Microsoft Research released a prototype of AI Controller Interface (AICI), a framework for implementing controllers that constrain the outputs of LLMs. AICI’s architecture allows custom logic to run during the token decoding process while maintaining the state of the LLM —> Read more.
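The controller idea can be illustrated with a toy constrained-decoding loop. This is a sketch of the general pattern only, not the AICI API: the four-token vocabulary, `fake_logits` stand-in model, and `AllowListController` class are all assumptions made for the example.

```python
# Toy vocabulary standing in for a real tokenizer.
vocab = ["yes", "no", "maybe", "<eos>"]

def fake_logits(step: int) -> list[float]:
    # Stand-in for a real model's next-token scores.
    return [1.0, 2.0, 3.0, 0.5]

class AllowListController:
    """Vetoes tokens at each decoding step while keeping its own state."""
    def __init__(self, allowed):
        self.allowed = set(allowed)
        self.history = []  # controller state carried across steps

    def mask(self, logits):
        # Force disallowed tokens to -inf so they can never be sampled.
        return [l if vocab[i] in self.allowed else float("-inf")
                for i, l in enumerate(logits)]

    def observe(self, token):
        self.history.append(token)

controller = AllowListController(allowed={"yes", "no", "<eos>"})
output = []
for step in range(3):
    masked = controller.mask(fake_logits(step))
    token = vocab[max(range(len(masked)), key=masked.__getitem__)]
    controller.observe(token)
    output.append(token)
```

The key point is that the controller sits inside the decoding loop, seeing and shaping every step, rather than post-filtering the finished text.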

🤖 Cool AI Tech Releases

Smaug-72B

Abacus AI released Smaug-72B which sits at the top of the open LLM leaderboard —> Read more.

Gemini Advanced

Google rebranded Bard as Gemini and introduced Gemini Advanced with native integration for Google Docs and Gmail —> Read more.

TensorFlow GNN

Google released TensorFlow GNN, a new framework for graph neural networks in TensorFlow —> Read more.

Imagen 2

Google released Imagen 2, its powerful text-to-image model, across several of its AI products —> Read more.

SVD 1.1

Stability AI announced the release of SVD 1.1, a new version of its video generation model optimized for consistency —> Read more.

📡 AI Radar
