More Super Models is All We Need

Five major foundation models released in a single week!


Next Week in The Sequence:

  • Edge 371: Our series about reasoning in LLMs continues with an exploration of the Skeleton-of-Thought (SoT) method. We review the original SoT paper by Microsoft Research and the Dify framework for developing LLM applications.

  • Edge 372: We review the research behind CALM, Google DeepMind’s technique to augment LLMs with, well, other LLMs!


TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: More Super Models is All We Need

The release of new foundation models is nothing new in this ever-evolving generative AI market. Yet, last week felt quite overwhelming. I sat down to write this editorial on Friday morning, afraid that I might have missed some new announcements at the end of the week. This fear stemmed from experiencing one of the most impressive weeks in the history of generative AI technology. In just a few days, we witnessed the announcements of five mega generative AI models by some of the major players in the space. What is even more impressive is that each of these releases is pushing a specific line of innovation within generative AI, rather than merely copying others.

Let’s do a quick recap to put things in context.

  1. OpenAI’s Sora: OpenAI unveiled Sora, a text-to-video generative model that can create astonishingly realistic videos. The key highlight here is, obviously, OpenAI’s push for innovation in the video space.

  2. Google’s Gemini 1.5: Just a week after releasing Gemini 1.0 Ultra, Google unveiled Gemini 1.5, its new-generation multimodal model, which boasts an impressive context window of one million tokens. Gemini 1.5 includes several innovations, such as a new mixture-of-experts architecture and, obviously, the large-scale context window.

  3. Cohere’s Aya: As a competitor to OpenAI, Cohere has dived into open source with the release of Aya, a new multilingual LLM supporting 101 languages. The key innovation here is Cohere’s push into open source and the wide support for different languages.

  4. Meta’s V-JEPA: Meta AI released the code for V-JEPA, a non-generative model capable of predicting missing parts of videos using an abstract representation space. V-JEPA represents another step in Meta’s vision of enabling self-supervised learning as the core foundation of AGI.

  5. Stability AI’s Stable Cascade: Stability AI has open-sourced Stable Cascade, a new text-to-image model. What’s new here? Well, Stable Cascade is based on the new Würstchen architecture, which enables efficiency and speed in large-scale image generation models.

How’s that for a single week? These releases are not only likely to play a significant role in the next generation of generative AI applications, but they are also championing new and unique innovations in the space.

Keep the supermodels coming!


📌 Mastering AI and ML at Production Scale at the apply() Virtual Conference

Join the next apply() virtual conference on Wednesday, April 3, for a free event that brings together the engineering community to master AI and ML in production. Since 2021, apply() has hosted more than 24,000 people with a single purpose: helping people advance their skills and expertise in AI/ML. 

Experienced engineers and visionaries in the industry will share best practices and actionable guidance for transitioning from experimental models to highly scalable applications. In the past, Databricks CEO Ali Ghodsi and Min Cai from Uber shared invaluable insights, covering everything from LLMs to best practices for building scalable machine learning platforms – and there’s even more planned for April! 


🔎 ML Research

V-JEPA

Meta AI published a paper and source code detailing Video Joint Embedding Predictive Architecture (V-JEPA), another step toward their self-supervised learning vision. V-JEPA learns by predicting missing parts of videos in an abstract representation space —> Read more.

More Agents is All You Need

Tencent AI Research published an interesting paper proposing a method to enhance the performance of LLMs using sampling and voting. The technique seems to scale with the number of agents instantiated, and the size of the improvement appears to correlate with the difficulty of the task —> Read more.
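To make the sampling-and-voting idea concrete, here is a minimal sketch rather than the paper's implementation. It assumes answers can be compared by exact string match and picks the winner by simple majority; `sample_and_vote` and the `query_model` callable are hypothetical names introduced for illustration, with `query_model` standing in for any LLM call.

```python
from collections import Counter
from typing import Callable


def sample_and_vote(query_model: Callable[[str], str], prompt: str, num_agents: int) -> str:
    """Ask the same question num_agents times and return the most frequent answer.

    query_model is a hypothetical stand-in for an LLM call that returns one
    sampled answer per invocation. Exact-match majority voting is an
    assumption made for this sketch.
    """
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")

    # Sampling phase: each "agent" is one independent call to the model.
    answers = [query_model(prompt) for _ in range(num_agents)]

    # Voting phase: the answer with the highest count wins.
    # Ties are resolved in favor of the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # A scripted fake model so the example runs without any API access.
    scripted_replies = iter(["4", "5", "4", "3", "4"])
    winner = sample_and_vote(lambda _prompt: next(scripted_replies), "What is 2 + 2?", 5)
    print(winner)  # prints 4
```

In practice, `query_model` would wrap a real model call with a non-zero sampling temperature so that repeated calls can disagree; open-ended outputs would need a similarity measure instead of exact matching.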

MGIE

Researchers from Apple and UC Santa Barbara published a paper detailing MLLM-Guided Image Editing (MGIE), an instruction-based image editing model. MGIE takes expressive instructions as input and derives explicit guidance —> Read more.

MOEs and Scaling Laws

Researchers from Google DeepMind and several universities published a paper that highlights some insights about scaling laws in mixture-of-experts (MoE) architectures. The paper's core contribution is showing that MoE architectures yield models that scale better with parameter count —> Read more.

GraphRAG

Microsoft Research published details about GraphRAG, a technique that uses LLMs to build knowledge graphs over private datasets. GraphRAG improves on traditional RAG techniques when operating on complex private datasets —> Read more.

🤖 Cool AI Tech Releases

Sora

OpenAI unveiled a preview of Sora, an astonishing video generation model —> Read more.

Aya

Cohere open sourced Aya, an instruction fine-tuned, multilingual LLM with support for over 100 languages —> Read more.

Gemini 1.5

Google unveiled the next version of Gemini just a week after its prior release —> Read more.

Chat with RTX

NVIDIA launched Chat with RTX, a demo that runs an LLM agent on a local computer, personalized with data stored on a Windows PC —> Read more.

Stable Cascade

Stability AI open sourced Stable Cascade, a new text-to-image model that is easier to fine-tune and optimized for efficiency —> Read more.

ChatGPT Memory

OpenAI announced new memory capabilities for ChatGPT —> Read more.

LangSmith

LangChain announced the general availability of LangSmith, its tool for LLM testing and monitoring —> Read more.

🛠 Real World ML

FlyteInteractive

LinkedIn discusses details about FlyteInteractive, a tool for debugging and interacting with AI models deployed in Kubernetes pods —> Read more.

📡AI Radar
