Meta AI’s Big Announcements

New AR glasses, Llama 3.2 and more.

Created Using Ideogram

Next Week in The Sequence:

  • Edge 435: Our series about SSMs continues with Hungry Hungry Hippos (H3), which has become one of the most important layers in SSM models. We review the original H3 paper and discuss Character.ai’s PromptPoet framework.

  • Edge 436: We review Salesforce’s recent work on models specialized in agentic tasks.

You can subscribe to The Sequence below:

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: Meta AI’s Big Announcements

Meta held its big conference, *Connect 2024*, last week, and AI was front and center. Two headlines dominated the event. The first was the launch of the fully holographic Orion glasses, one of the most important products in Meta’s ambitious and highly controversial AR strategy. Beyond the impressive first-generation hardware, Meta announced that it is already developing a brain-computer interface for the next version of the glasses.

The second headline was the release of Llama 3.2, which includes small 1B and 3B language models as well as larger 11B and 90B vision models. This is Meta’s first major attempt to open source image models, and it signals a strong commitment to open-source generative AI. Meta AI also announced the Llama Stack, which provides standard APIs for inference, memory, evaluation, post-training, and the other building blocks Llama applications require. With this release, Meta is transitioning Llama from a set of isolated models into a complete stack for building generative AI apps.
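To make the stack idea concrete, here is a minimal sketch of what an inference call against a Llama Stack server could look like. It assumes a locally running distribution; the endpoint path, payload shape, and model name are illustrative assumptions rather than the documented API surface.

```python
import requests

# Assumed local Llama Stack server; the port, route, and payload
# fields below are illustrative, not the official API reference.
BASE_URL = "http://localhost:5000"

payload = {
    "model": "Llama3.2-3B-Instruct",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "Summarize the Llama 3.2 release in one sentence."}
    ],
}

resp = requests.post(f"{BASE_URL}/inference/chat_completion", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```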

There were plenty of other AI announcements at *Connect 2024*:

  • Meta introduced voice capabilities to its Meta AI chatbot, allowing users to have realistic conversations with the chatbot. This feature puts Meta AI on par with its competitors, like OpenAI and Google, which have already introduced voice modes to their products.

  • Meta announced an AI-powered, real-time language translation feature for its Ray-Ban smart glasses, which will let users translate Spanish, French, and Italian by the end of the year.

  • Meta is developing an AI feature for Instagram and Facebook Reels that will automatically dub and lip-sync videos into different languages. This feature is currently in testing in the US and Latin America.

  • Meta is adding AI image generation features to Facebook and Instagram. The new feature will be similar to existing AI image generators, such as Apple’s Image Playground, and will allow users to share AI-generated images with friends or create posts.

It was an impressive week for Meta AI, to say the least.

🔎 ML Research

AlphaProteo

Google DeepMind published a paper introducing AlphaProteo, a new family of models for protein design. The models are optimized to generate novel, high-strength protein binders that can improve our understanding of biological processes —> Read more.

Molmo and PixMo

Researchers from the Allen Institute for AI published a paper detailing Molmo and PixMo, an open-weight and open-data vision-language model (VLM). Molmo showcases how to train VLMs from scratch, while PixMo is the core set of datasets used during training —> Read more.

Instruction Following Without Instruction Tuning

Researchers from Stanford University published a paper detailing a technique called implicit instruction tuning, which surfaces instruction-following behavior without explicitly fine-tuning the model. The paper also shows that some simple, hand-written changes to a model’s distribution can yield this implicit instruction-following behavior —> Read more.
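To give a flavor of what “simple changes to a model’s distribution” can mean, the sketch below wraps a base model with two hand-written decoding rules: penalize tokens that were already generated, and gradually boost the end-of-sequence token so responses terminate. The rules and weights are illustrative assumptions in the spirit of the paper, not its exact recipe; gpt2 stands in for any base model.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class RuleAdapter(LogitsProcessor):
    """Rule-based tweak to a base model's next-token distribution:
    a repetition penalty plus a length-dependent EOS boost. The
    weights here are illustrative assumptions."""
    def __init__(self, eos_token_id, eos_boost=0.05, repeat_penalty=1.3):
        self.eos_token_id = eos_token_id
        self.eos_boost = eos_boost
        self.repeat_penalty = repeat_penalty

    def __call__(self, input_ids, scores):
        # Penalize tokens already present in the sequence.
        prev = torch.gather(scores, 1, input_ids)
        prev = torch.where(prev > 0, prev / self.repeat_penalty,
                           prev * self.repeat_penalty)
        scores = scores.scatter(1, input_ids, prev)
        # Nudge EOS upward as the sequence grows, so generation stops.
        scores[:, self.eos_token_id] += self.eos_boost * input_ids.shape[-1]
        return scores

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("List three uses of graph databases:", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=60,
    logits_processor=LogitsProcessorList([RuleAdapter(tok.eos_token_id)]),
)
print(tok.decode(out[0], skip_special_tokens=True))
```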

Robust Reward Model

Google DeepMind published a paper discussing a key challenge for traditional reward models (RMs): separating genuine preferences from prompt-independent artifacts such as response length and format. The paper introduces the notion of a robust reward model (RRM) that addresses this challenge and shows strong improvements with models like Gemma —> Read more.
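One intuition behind the robustness idea is augmentation: a response written for an unrelated prompt should lose to any on-topic response, regardless of surface artifacts such as length or formatting. The sketch below shows that augmentation in its simplest form; the field names are assumptions, and the paper’s actual scheme is richer.

```python
import random

def augment_preferences(dataset):
    """Add 'non-contextual negative' pairs: the chosen response for a
    prompt should beat a response written for a different prompt,
    teaching the RM to reward relevance rather than artifacts."""
    augmented = list(dataset)
    for ex in dataset:
        other = random.choice(dataset)
        if other["prompt"] == ex["prompt"]:
            continue  # need a genuinely unrelated response
        augmented.append({
            "prompt": ex["prompt"],
            "chosen": ex["chosen"],
            "rejected": other["chosen"],  # off-topic, however polished
        })
    return augmented

dataset = [
    {"prompt": "Summarize the article.", "chosen": "A tight summary...",
     "rejected": "Rambling, off-topic text..."},
    {"prompt": "Write a haiku about rain.", "chosen": "Soft rain on tin roofs...",
     "rejected": "Rain is water that falls..."},
]
print(len(augment_preferences(dataset)))  # original pairs plus augmented ones
```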

Real-Time Notetaking

Researchers from Carnegie Mellon University published a paper outlining NoTeeline, a real-time note-generation method for video streams. NoTeeline generates micronotes that capture a video’s key points while maintaining a consistent writing style —> Read more.

AI Watermarking

Researchers from Carnegie Mellon University published a paper evaluating different design choices in LLM watermarking. The paper also studies attacks that can bypass or remove these watermarks —> Read more.
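As a reference point for the design choices being evaluated, here is a minimal detection-side sketch of the classic green/red-list watermark in the style of Kirchenbauer et al.: a pseudo-random “green” subset of the vocabulary, seeded by the previous token, is favored during sampling, and detection measures how often tokens land in it. Vocabulary size, green fraction, and the hashing scheme are illustrative assumptions.

```python
import hashlib
import random

VOCAB_SIZE = 50_000   # assumed vocabulary size
GREEN_FRACTION = 0.5  # assumed green-list fraction

def green_list(prev_token: int) -> set:
    # Pseudo-randomly partition the vocabulary, seeded by the previous
    # token, into a 'green' half that sampling would favor.
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    return set(ids[: int(GREEN_FRACTION * VOCAB_SIZE)])

def green_rate(token_ids: list) -> float:
    # Detection statistic: fraction of tokens in their context's green
    # list. Watermarked text scores well above the 0.5 base rate;
    # paraphrasing attacks push it back toward chance.
    hits = sum(tok in green_list(prev)
               for prev, tok in zip(token_ids, token_ids[1:]))
    return hits / max(len(token_ids) - 1, 1)

print(green_rate([101, 7, 4242, 88, 19, 5000]))
```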

🤖 AI Tech Releases

Llama 3.2

Meta open-sourced Llama 3.2, a family of small and medium-sized models —> Read more.
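For a quick feel of the smaller models, here is a minimal sketch using Hugging Face transformers. It assumes access to the gated meta-llama repo on the Hub has been granted; the model choice and generation settings are illustrative.

```python
from transformers import pipeline

# Llama 3.2 1B Instruct from the Hugging Face Hub (a gated repo:
# requires accepting Meta's license and an access token).
pipe = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")

out = pipe(
    "Explain in two sentences why small on-device language models matter.",
    max_new_tokens=80,
    do_sample=False,
)
print(out[0]["generated_text"])
```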

Llama Stack

As part of the Llama 3.2 release, Meta open-sourced the Llama Stack, a series of standardized building blocks for developing Llama-powered applications —> Read more.

Gemini 1.5

Google released two updated Gemini models and new pricing and performance tiers —> Read more.

Cohere APIs

Cohere launched a new set of APIs that improve its developer experience —> Read more.

🛠 Real World AI

Data Apps at Airbnb

Airbnb discusses Sandcastle, an internal framework that allows data scientists to rapidly prototype data-driven apps —> Read more.

Feature Caching at Pinterest

The Pinterest engineering team discusses its internal architecture for feature caching in AI recommender systems —> Read more.

📡 AI Radar
