My Five Favorite AI Papers of 2023

LLM interpretability, small language models, autonomous agents, API fine-tuning, discovering new algorithms

Next Week in The Sequence:

  • Edge 357: Our series about LLM reasoning explores chain-of-thought (CoT) prompting, including its original paper. We also explore the ThinkGPT framework.

  • Edge 358: Dives into the AGENTS paper which proposes a framework for building semi-autonomous agents.

You can subscribe below!

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: My Five Favorite AI Papers of 2023

Today marks the final issue of 2023, and I want to start by expressing my gratitude for your support. The Sequence has grown organically to over 165,000 subscribers this year. Thank you all for your continued support.

Today’s edition will be shorter, as there isn’t much content to cover this week. I’d like to highlight five papers that significantly impacted me in 2023. These might not be the papers you’ll find receiving awards at top conferences, and I’m sure there are many equally important papers that other experts could mention. My focus is on papers that shifted my perspective on different areas of AI. A quick side note: in 2023, I incubated and raised substantial seed rounds for two different companies in the generative AI space—one in autonomous agents and one in open-source generative AI infrastructure. Both are currently in stealth mode, but I hope to share more details soon. I mention this because the concepts revealed in these papers have influenced some components of these platforms. I’ve kept the list short to be selective.

So here we go:

  1. “Decomposing Language Models Into Understandable Components” by Anthropic, because it showed me that interpretability in LLMs might be a solvable engineering problem that scales similarly to LLMs themselves. The possibility of interpreting how LLMs form concepts and arrive at answers is truly fascinating.

  2. “Textbooks Are All You Need” by Microsoft Research, which helped me understand that small LLMs trained on high-quality data can outperform much larger models. This paper introduced the phi-1 model and laid the groundwork for the subsequent Phi 1.5 and Phi 2 releases.

  3. “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,” also by Microsoft Research, which introduced new ideas about multi-agent communication patterns. This could be one of the most significant recent papers on autonomous agents.

  4. “Gorilla: Large Language Model Connected with Massive APIs” from UC Berkeley, which challenged traditional RAG concepts and demonstrated that fine-tuning models on API datasets can yield strong results. Other papers in this area, such as ToolLLM, were also quite influential, but I wanted to pick one.

  5. “FunSearch: Making new discoveries in mathematical sciences using Large Language Models” by Google DeepMind, as I am fascinated by applying AI to science, and discovering new computer science algorithms is exceptionally complex.

There are many other papers I could cite, as 2023 was an incredible year for AI research, but the above five were particularly influential in shaping my thinking about AI problems.

The Sequence will start strong next year, continuing our series on LLM reasoning. I hope you have had wonderful holidays, and I wish you a blessed new year.

Thank you.

🔎 ML Research

The Gemini Paper

Google DeepMind finally published the paper behind their Gemini models. The paper includes details about the architecture and training processes for Gemini Ultra, Pro, and Nano, including the optimizations for different use cases —> Read more.

Mini-GPTs

AI researchers from MIT published a paper detailing a pruning technique to create Mini-GPTs. The technique starts from architectures such as Microsoft Phi and prunes some components while preserving the key functionality —> Read more.

Multimodal Models and In-Context Learning

Researchers from the Beijing Academy of Artificial Intelligence published a paper introducing Emu2, a 37-billion-parameter model capable of complex reasoning via in-context learning. The model seems to match state-of-the-art performance in several multimodal, few-shot reasoning tasks —> Read more.

Vision LLMs and Reinforcement Learning

Google DeepMind published a paper introducing a very interesting technique that uses vision-language models (VLMs) as a source of rewards for reinforcement learning (RL) agents. The method shows how VLMs can produce rewards for RL agents in visual tasks faster and at a much larger scale than traditional methods —> Read more.

🤖 Cool AI Tech Releases

Pika

Text-to-video platform Pika released its first version —> Read more.

SOLAR-10.7B

Korean AI company Upstage open-sourced SOLAR-10.7B, a 10.7-billion-parameter LLM with impressive performance —> Read more.

📡 AI Radar
