Google Goes Small and Open Source with Gemma

Gemma is based on the core architecture powering Gemini.

Created Using DALL-E

Next Week in The Sequence:

  • Edge 373: Our series about reasoning in LLMs continues with the ReWOO method, including a review of its original paper. We also provide an introduction to the LLMFlow framework.

  • Edge 374: We deep dive into the things we know about the architecture powering Sora, OpenAI’s fascinating text-to-video model.

You can subscribe below!

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: Google Goes Small and Open Source with Gemma

Generative AI is transforming everything, but it’s hard to lead the revolution 70B parameters at a time! LLMs feel magical until you spend weeks trying to fine-tune a massively large model. For companies to build their own generative AI models, the core architecture of these models needs to become smaller. Last year, we saw iterations of this concept with models such as Microsoft’s Phi-2, a 2.7 billion parameter model that outperformed much larger models in math and coding. Microsoft even coined a term for these types of models: small language models (SLMs).

This week, Google jumped on the SLM train by releasing Gemma, a family of open-source SLMs based on the same core architecture that powers its marquee Gemini model. The release includes pretrained and instruction-tuned versions of Gemma 2B and 7B. Additionally, the Gemma release provides native integration with Hugging Face, Kaggle, and Google Colab notebooks, as well as major frameworks such as TensorFlow, PyTorch, and JAX. Gemma was evaluated across several industry-leading benchmarks, surpassing considerably larger models.
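For readers who want to try the models right away, here is a minimal sketch of the Hugging Face integration, assuming `transformers` >= 4.38 and Hub access to the `google/gemma-2b-it` checkpoint (the instruction-tuned 2B variant):

```python
# Minimal sketch: loading instruction-tuned Gemma 2B via Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # instruction-tuned 2B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain what a small language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```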

The Gemma release represents an interesting strategic move by Google. The tech giant is not only taking a position in the SLM space but also championing open-source efforts. This contrasts with the closed nature of its Gemini release. Open source is deeply ingrained in Google’s culture, and hopefully, we can see them push more generative AI efforts in this area.

While massively large foundation models continue to achieve new milestones, the SLM revolution seems inevitable. Now, both Microsoft and Google have taken a position. The SLMs are coming!


📍 Announcing GenAI Productionize 2024 – the one and only event on productionizing enterprise GenAI!

We invite you to see how LlamaIndex, Coinbase, LinkedIn, Comcast, Procter & Gamble, Roblox, Databricks, JPMorgan Chase, Fidelity, Chegg, and others get their GenAI apps into production, including practical strategies for governance, evaluation, and monitoring.

Register for GenAI Productionize 2024 to learn:

  • How organizations have successfully adopted enterprise GenAI;

  • Practical hands-on architecture and tool insights from leading GenAI builders;

  • The emerging enterprise GenAI stack, from orchestration to evaluation, inference, retrieval and more.


🔎 ML Research

Sora

OpenAI published some technical details about the architecture of Sora, its groundbreaking text-to-video model. Sora is based on diffusion models that operate over visual “patches”, representations that play the role for video that tokens play for text in LLMs —> Read more.
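OpenAI has not released code, but the patch idea is easy to illustrate. Here is a toy sketch (our own, not OpenAI’s) of turning a video tensor into flattened “spacetime patches”; the patch sizes are arbitrary choices for illustration:

```python
# Illustrative sketch: splitting a video into spacetime patch "tokens".
import numpy as np

def patchify(video: np.ndarray, pt: int = 2, ph: int = 16, pw: int = 16) -> np.ndarray:
    """Split a (T, H, W, C) video into flattened spacetime patches."""
    T, H, W, C = video.shape
    video = video[: T - T % pt, : H - H % ph, : W - W % pw]  # crop to multiples
    t, h, w = video.shape[0] // pt, video.shape[1] // ph, video.shape[2] // pw
    patches = video.reshape(t, pt, h, ph, w, pw, C)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)  # group by patch location
    return patches.reshape(t * h * w, pt * ph * pw * C)  # one row per patch

video = np.random.rand(16, 256, 256, 3)  # 16 frames of 256x256 RGB
tokens = patchify(video)
print(tokens.shape)                      # (2048, 1536): 2048 patch "tokens"
```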

Large World Model

Researchers from UC Berkeley, including robotics legend Pieter Abbeel, published a paper detailing the Large World Model (LWM), a family of large-context, multimodal models. LWM uses a technique called RingAttention to scale the context window to about one million tokens —> Read more.
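The core trick behind RingAttention is that each host attends its local queries against key/value blocks that rotate around a ring of devices, so no host ever materializes the full attention matrix. Here is a conceptual single-process sketch of that blockwise accumulation (not the paper’s distributed code):

```python
# Conceptual sketch: attention accumulated one KV block at a time,
# as if the blocks arrived from neighbors in a ring.
import numpy as np

def ring_attention(q, kv_blocks):
    """q: (n, d); kv_blocks: iterable of (k_block, v_block) arrays."""
    d = q.shape[-1]
    acc = np.zeros_like(q)            # running weighted sum of values
    denom = np.zeros(q.shape[0])      # running softmax normalizer
    m = np.full(q.shape[0], -np.inf)  # running max for numerical stability
    for k, v in kv_blocks:            # one step per ring rotation
        s = q @ k.T / np.sqrt(d)      # scores against this block only
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)     # rescale old statistics
        p = np.exp(s - m_new[:, None])
        acc = acc * scale[:, None] + p @ v
        denom = denom * scale + p.sum(axis=-1)
        m = m_new
    return acc / denom[:, None]

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
blocks = [(rng.normal(size=(16, 8)), rng.normal(size=(16, 8))) for _ in range(4)]
print(ring_attention(q, blocks).shape)  # matches full attention over 64 keys
```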

User-LLM

Google Research published a paper detailing User-LLM, a framework that leverages embeddings to contextualize LLMs. The core idea is that these embeddings capture user preferences over time and can personalize interactions with LLMs —> Read more.
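One way to picture this (an illustrative sketch in the spirit of the paper, not Google’s implementation) is to compress a user’s event history into a single embedding and prepend it to the LLM’s input embeddings as a soft prompt; all dimensions below are made up:

```python
# Illustrative sketch: a user-history embedding as a soft prompt.
import torch
import torch.nn as nn

class UserEncoder(nn.Module):
    """Compresses a sequence of user-event features into one user vector."""
    def __init__(self, event_dim: int, llm_dim: int):
        super().__init__()
        self.rnn = nn.GRU(event_dim, llm_dim, batch_first=True)

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(events)   # events: (batch, seq, event_dim)
        return h[-1].unsqueeze(1) # (batch, 1, llm_dim) user embedding

event_history = torch.randn(1, 50, 32)  # 50 past events, 32-dim features
token_embeds = torch.randn(1, 10, 768)  # embeddings of the current prompt
user_embed = UserEncoder(32, 768)(event_history)
llm_inputs = torch.cat([user_embed, token_embeds], dim=1)
print(llm_inputs.shape)                 # torch.Size([1, 11, 768])
```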

LongRoPE

Microsoft Research published a paper introducing LongRoPE, a method for extending the context window of LLMs beyond 2 million tokens. The method combines several innovations, such as multidimensional interpolation and evolutionary search, to drastically scale the context window in LLMs —> Read more.
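To get an intuition for the interpolation part, here is a simplified sketch of rescaling rotary position embeddings so that far-away positions map back into the range the model saw during pretraining. LongRoPE searches for non-uniform, per-dimension rescale factors; the uniform `scale` below is a deliberate simplification:

```python
# Simplified sketch: uniform positional interpolation for RoPE.
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotary angles per position; scale < 1 squeezes long positions
    back into the position range used during pretraining."""
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    return np.outer(np.asarray(positions) * scale, inv_freq)

trained_ctx, target_ctx = 4096, 2_097_152  # extend the window ~512x
scale = trained_ctx / target_ctx
angles = rope_angles(range(0, target_ctx, 1024), dim=128, scale=scale)
print(angles.shape)                        # (2048, 64)
```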

Pre-Instruction Tuning

Researchers from Carnegie Mellon University, Meta AI, and the University of Washington introduced pre-instruction tuning (PIT), a method for improving continual learning in LLMs. PIT instruction-tunes LLMs on QA pairs before continued pretraining, which improves the model’s ability to generalize from new knowledge —> Read more.
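The key point is the ordering of the two stages. Here is a schematic sketch (the function names and `train_step` callable are illustrative, not the paper’s code):

```python
# Schematic sketch of pre-instruction tuning: QA pairs first,
# then continued pretraining on new documents.
def pre_instruction_tune(model, qa_pairs, documents, train_step):
    # Stage 1: instruction tuning on (question, answer) pairs, so the
    # model learns how knowledge is queried before absorbing it.
    for question, answer in qa_pairs:
        train_step(model, inputs=question, targets=answer)
    # Stage 2: continued pretraining (next-token prediction) on raw documents.
    for doc in documents:
        train_step(model, inputs=doc, targets=doc)
    return model
```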

VideoPrism

Google Research published a paper detailing VideoPrism, a foundation model for video understanding. VideoPrism is optimized for a wide range of tasks, including classification, captioning, retrieval, and several others —> Read more.

🤖 Cool AI Tech Releases

Gemma

Google released Gemma, a family of small models built with the same technology behind Gemini —> Read more.

Stable Diffusion 3

Stability AI unveiled an early preview of Stable Diffusion 3 —> Read more.

LoRA-Land

Predibase released LoRA-Land, a series of 25 fine-tuned Mistral models that outperformed GPT-4 on specific tasks —> Read more.
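The appeal of this approach is that many task-specific LoRA adapters can share a single base model. Here is a minimal sketch using the `peft` library; the adapter id below is hypothetical, since the exact adapter names are on Predibase’s Hugging Face page:

```python
# Minimal sketch: swapping lightweight LoRA adapters on one shared base
# model instead of hosting 25 full fine-tuned models.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Hypothetical adapter id, for illustration only.
model = PeftModel.from_pretrained(base, "predibase/example-task-adapter")
```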

🛠 Real World ML

Compound AI Systems

Berkeley AI Research (BAIR) published a detailed blog post discussing the idea of Compound AI Systems as the future of AI architectures —> Read more.
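The idea is that state-of-the-art results increasingly come from composing several specialized components rather than from a single model call. Here is a toy sketch of that pattern (all components are stubs we made up for illustration):

```python
# Toy sketch of a compound AI system: retriever + generator + verifier
# composed behind one interface, instead of one monolithic model call.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CompoundQA:
    retrieve: Callable[[str], list[str]]  # e.g., a vector-store lookup
    generate: Callable[[str], str]        # e.g., an LLM call with context
    verify: Callable[[str], bool]         # e.g., a fact-checking model

    def answer(self, question: str) -> str:
        context = "\n".join(self.retrieve(question))
        draft = self.generate(f"Context:\n{context}\n\nQ: {question}\nA:")
        return draft if self.verify(draft) else "Not confident enough to answer."

system = CompoundQA(
    retrieve=lambda q: ["Gemma is a family of open models from Google."],
    generate=lambda prompt: "Gemma is Google's family of open small models.",
    verify=lambda draft: "Gemma" in draft,
)
print(system.answer("What is Gemma?"))
```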

📡 AI Radar
