More Super Models is All We Need

Five major foundation models released in a single week!

Next Week in The Sequence:

Edge 371: Our series about reasoning in LLMs continues with an exploration of the Skeleton-of-Thought (SoT) method. We review the original SoT paper by Microsoft Research and the Dify framework for developing LLM applications.
Edge 372: We review the research behind CALM, Google DeepMind’s technique to augment LLMs with, well, other LLMs!

You can subscribe below!

📝 Editorial: More Super Models is All We Need

The release of new foundation models is nothing new in this ever-evolving generative AI market. Yet, last week felt quite overwhelming. I sat down to write this editorial on Friday morning, afraid that I might have missed some new announcements at the end of the week. This fear stemmed from experiencing one of the most impressive weeks in the history of generative AI technology. In just a few days, we witnessed the announcements of five mega generative AI models by some of the major players in the space. What is even more impressive is that each of these releases is pushing a specific line of innovation within generative AI, rather than merely copying others.

Let’s do a quick recap to put things in context.

OpenAI’s Sora: OpenAI unveiled Sora, a text-to-video generative model that can create astonishingly realistic videos. The key highlight here is, obviously, OpenAI’s push for innovation in the video space.
Google’s Gemini 1.5: Just a week after releasing Gemini 1.0 Ultra, Google unveiled Gemini 1.5, its new-generation multimodal model, which boasts an impressive context window of one million tokens. Gemini 1.5 includes several innovations, such as a new mixture-of-experts architecture and, obviously, the large-scale context window.
Cohere’s Aya: As a competitor to OpenAI, Cohere has dived into open source with the release of Aya, a new multilingual LLM supporting 101 languages. The key innovation here is Cohere’s push into open source and the wide support for different languages.
Meta’s V-JEPA: Meta AI released the code for V-JEPA, a non-generative model capable of predicting missing parts of videos using an abstract representation space. V-JEPA represents another step in Meta’s vision of enabling self-supervised learning as the core foundation of AGI.
Stability AI’s Stable Cascade: Stability AI has open-sourced Stable Cascade, a new text-to-image model. What’s new here? Well, Stable Cascade is based on the new Würstchen architecture, which enables efficiency and speed in large-scale image generation models.

How’s that for a single week? These releases are not only likely to play a significant role in the next generation of generative AI applications, but they are also championing new and unique innovations in the space.

Keep the supermodels coming!

📌 Mastering AI and ML at Production Scale at the apply() Virtual Conference

Join the next apply() virtual conference on Wednesday, April 3, for a free event that brings together the engineering community to master AI and ML in production. Since 2021, apply() has hosted more than 24,000 people with a single purpose: helping people advance their skills and expertise in AI/ML.

Experienced engineers and visionaries in the industry will share best practices and actionable guidance for transitioning from experimental models to highly scalable applications. In the past, Databricks CEO Ali Ghodsi and Min Cai from Uber shared invaluable insights, covering everything from LLMs to best practices for building scalable machine learning platforms – and there’s even more planned for April!

🔎 ML Research

V-JEPA

Meta AI published a paper and source code detailing Video Joint Embedding Predictive Architecture (V-JEPA), another model that advances their self-supervised learning vision. V-JEPA learns by predicting missing parts of videos in an abstract representation space —> Read more.

More Agents is All You Need

Tencent AI Research published an interesting paper proposing a method to enhance the performance of LLMs using sampling and voting. The technique seems to scale with the number of agents instantiated, and the performance gain also appears to correlate with the difficulty of the task —> Read more.
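The sampling-and-voting idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `sample_and_vote` function, the `make_noisy_agent` helper, and the exact-match majority vote are assumptions made here for illustration, and a real setup would call an LLM instead of the toy agent.

```python
import random
from collections import Counter
from typing import Callable, List


def sample_and_vote(query: str, agent: Callable[[str], str], num_agents: int) -> str:
    """Ask the agent the same query num_agents times and return the most frequent answer."""
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    answers: List[str] = [agent(query) for _ in range(num_agents)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_noisy_agent(correct: str, wrong: List[str], accuracy: float, seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: returns the correct answer with probability `accuracy`."""
    rng = random.Random(seed)

    def agent(query: str) -> str:
        return correct if rng.random() < accuracy else rng.choice(wrong)

    return agent


if __name__ == "__main__":
    agent = make_noisy_agent(correct="4", wrong=["3", "5", "22"], accuracy=0.6, seed=0)
    for n in (1, 5, 25):
        print(n, sample_and_vote("What is 2 + 2?", agent, n))
```

In this toy setting, a single sample is wrong whenever the agent is wrong, while a vote over many samples tends to recover the most likely answer. Exact-match voting only suits tasks with short, comparable answers; open-ended outputs would need a similarity-based vote instead.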

MGIE

Researchers from Apple and UC Santa Barbara published a paper detailing MLLM-Guided Image Editing (MGIE), an instruction-based image editing model. MGIE takes expressive instructions as input and derives explicit guidance —> Read more.

MoEs and Scaling Laws

Researchers from Google DeepMind and several universities published a paper that highlights some insights about scaling laws in mixture-of-experts (MoE) architectures. The core contribution of the paper is showing that MoE architectures result in more parameter-scalable models —> Read more.

GraphRAG

Microsoft Research published details about GraphRAG, a technique used to build knowledge graphs in private datasets using the context knowledge of LLMs. GraphRAG improves over traditional RAG techniques when operating in complex private datasets —> Read more.

🤖 Cool AI Tech Releases

Sora

OpenAI unveiled a preview of Sora, an astonishing video generation model —> Read more.

Aya

Cohere open sourced Aya, an instruction fine-tuned, multilingual LLM with support for over 100 languages —> Read more.

Gemini 1.5

Google unveiled the next version of Gemini just a week after its prior release —> Read more.

Chat with RTX

NVIDIA launched Chat with RTX, a demo that runs an LLM agent on a local computer, personalized with data stored on a Windows PC —> Read more.

Stable Cascade

Stability AI open sourced Stable Cascade, a new text-to-image model that is optimized for efficiency and easier to fine-tune —> Read more.

ChatGPT Memory

OpenAI announced new memory capabilities for ChatGPT —> Read more.

LangSmith

LangChain announced the general availability of LangSmith, its tool for LLM testing and monitoring —> Read more.

🛠 Real World ML

FlyteInteractive

LinkedIn discusses details about FlyteInteractive, a tool for debugging and interacting with AI models deployed in Kubernetes pods —> Read more.

📡AI Radar

Sierra, the company started by former Salesforce Co-CEO and OpenAI board member Bret Taylor, came out of stealth mode with $110 million raised.
NVIDIA surpassed Amazon and Alphabet to become the fourth most valuable company in the world.
GPU platform Lambda announced a $320 million round.
LangChain raised $25 million in new funding led by Sequoia Capital.
Apple is reportedly working on a new AI tool to compete with GitHub Copilot.
Guardrails AI, a company building a platform for safeguarding foundation models, announced a $7.5 million Series A.
AIX Ventures raised $200 million for its second AI-focused fund.
Zocks, an AI platform for extracting structured data from conversations, raised $5.5 million in funding.
Zylon, the makers of PrivateGPT, raised $3.2 million in a pre-seed round to bring generative AI to SMBs.
Conversational AI platform raised $30 million in new funding.
AI VC firm Founderful is raising $120 million to invest in Swiss AI startups.
Sequence analytics startup Motif Analytics announced a $5.7 million seed round.
AI marketing platform Alembic announced a $14 million Series A.
Zendesk acquired AI-based customer service platform Klaus.
Travel startup Layla announced the acquisition of AI-powered travel planning bot Roam Around.
Vector search startup Marqo raised $12.5 million in new funding.
AI gaming platform Ultiverse raised $4 million from strategic investors.

