Text-to-Video Games and 1-Bit Models: Two Monumental Generative AI Research Milestones in One Week

Two papers that open new possibilities for generative AI.

Next Week in The Sequence:

  • Edge 375: Our series about reasoning in LLMs continues by exploring Meta’s recent work on System 2 Attention. We also review the Chainlit framework for building LLM applications.

  • Edge 376: We dive into the amazing SGLang framework created by UC Berkeley, which provides significant performance gains in LLM inference.

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: Text-to-Video Games and 1-Bit Models: Two Monumental Generative AI Research Milestones in One Week

Every week, there is an avalanche of research papers pioneering new techniques in generative AI, but only a tiny percentage of those papers contain contributions that are truly going to push the boundaries of the space. Last week was exceptional in terms of published papers, with two that could have a remarkable impact on the next few years of generative AI.

  1. Text to Games with Genie

Google DeepMind continues to challenge our imagination when it comes to generative AI. Last week, the research lab unveiled Genie, a generative model that can create a playable 2D video game from a text description, a sketch, or a photo. What makes Genie remarkable is its ability to learn fine-grained controls while being trained solely on videos. That is a hard problem because videos typically don’t include labels for the actions being performed in them. Genie not only learns the actions from video sequences but also variations of these actions that are applicable to the same environment. Amazing!

Genie is in the super early stages, but its impact can be profound. From simulations and gaming to robotics, the ability to generate interactive environments can become one of the next frontiers for generative AI.

  2. 1-Bit LLMs

Computational and memory costs are some of the biggest roadblocks to the adoption of LLMs. Techniques such as quantization can speed up inference but often sacrifice accuracy. Recently, a team of researchers from Microsoft and the University of Chinese Academy of Sciences proposed an architecture called BitNet that uses an extreme form of quantization, a 1-bit model, to improve cost efficiency without sacrificing performance. Last week, the team doubled down and proposed a variant of the original BitNet called BitNet b1.58, which provides additional gains in cost-effectiveness, memory, latency, and throughput. BitNet b1.58 accomplishes this by restricting every weight to one of three values, {-1, 0, 1}, which takes about 1.58 bits per weight instead of the typical 16-bit representation of most LLMs.
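
To make the 1.58 figure concrete: a weight that can take one of three values carries log2(3) ≈ 1.58 bits of information. The short Python sketch below works through that arithmetic and compares weight storage against 16-bit weights. The 7-billion-parameter model size and the `weight_memory_gb` helper are assumptions chosen for illustration, not figures from the paper, and the result is a theoretical lower bound that ignores activations and any practical packing overhead.

```python
import math

# Information content of one weight that can take three values: {-1, 0, 1}.
bits_per_ternary_weight = math.log2(3)


def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    """Memory needed to store the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical 7-billion-parameter model, used only for illustration.
num_params = 7_000_000_000
fp16_gb = weight_memory_gb(num_params, 16)
ternary_gb = weight_memory_gb(num_params, bits_per_ternary_weight)

print(f"bits per ternary weight: {bits_per_ternary_weight:.3f}")  # 1.585
print(f"16-bit weights:  {fp16_gb:.1f} GB")                       # 14.0 GB
print(f"ternary weights: {ternary_gb:.2f} GB")                    # 1.39 GB
print(f"reduction: {fp16_gb / ternary_gb:.1f}x")                  # 10.1x
```

In practice, ternary weights are often packed less tightly than this bound (for example, two bits per weight), so real savings would likely be somewhat smaller than the roughly 10x shown here.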

The implications of BitNet b1.58 in generative AI can be quite significant. The new architecture can open the door to scaling the training and inference of LLMs using commodity hardware, and, if nothing else, the performance increases in current architectures should be notable.

Both Genie and the 1-Bit LLM represent major research milestones in areas that were deemed impossible a few months ago. The pace of research in generative AI is breathtaking. Amazing times.



🔎 ML Research

Genie

Google DeepMind published a paper introducing generative interactive environments (Genie), a model that can generate interactive, playable environments from a single image prompt. Genie was trained on a dataset of 2D game and robotics videos, and the approach seems quite generalizable to other domains.

1-Bit LLMs

Microsoft Research published a paper proposing BitNet b1.58, a 1-bit LLM variant that uses 1.58 bits per parameter, which leads to massive savings in computational and memory requirements without sacrificing performance. Unlike traditional 16-bit models, BitNet b1.58 uses a ternary {-1, 0, 1} encoding for every weight while matching the performance of a full-precision 16-bit model.
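
As a rough illustration of what a ternary encoding looks like, the Python sketch below maps real-valued weights to {-1, 0, 1} plus a single shared scale. It assumes an absolute-mean scheme (divide by the mean absolute weight, round, then clip), which is one common way to describe this kind of quantization. The function names `quantize_ternary` and `dequantize` are hypothetical, and this should not be read as the paper's implementation.

```python
def quantize_ternary(weights, eps=1e-8):
    """Map real-valued weights to {-1, 0, 1} plus one shared scale.

    Assumed scheme: divide each weight by the mean absolute value,
    round to the nearest integer, then clip the result to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate real-valued weights from the ternary codes."""
    return [q * scale for q in quantized]


# Toy weights, made up for illustration.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
q, scale = quantize_ternary(weights)

print(q)                     # [1, 0, 1, -1, 0, -1]
print(dequantize(q, scale))  # each entry is -scale, 0, or +scale
```

Because every code is -1, 0, or 1, multiplying an input by such a weight reduces to adding it, subtracting it, or skipping it, which is where much of the computational saving would come from.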

EMO

Alibaba Research published a paper detailing EMO, a framework for generating expressive videos from input audio and images. EMO combines a ReferenceNet network to extract features with a diffusion model to generate the final video frames.

Finetuning and Scaling

Google DeepMind published a paper analyzing the effectiveness of fine-tuning methods relative to the scale of LLMs. The analysis covers the effect of both data size and model size on fine-tuning algorithms.

Generating Better Images with Hierarchical Prompts

Microsoft Research published a paper detailing a technique to enhance images created by visual language models using hierarchical prompts. The method creates detailed graphs of image descriptions, which are used to generate more detailed images.

🤖 Cool AI Tech Releases

Mistral Large

Mistral announced its biggest model so far, Mistral Large, which matches the performance of GPT-4 across several benchmarks.

Le Chat

Mistral also unveiled Le Chat, a ChatGPT competitor built on its foundation models.

Samba-1

NVIDIA competitor SambaNova released Samba-1, a one-trillion-parameter model optimized for enterprise scenarios.

StarCoder2

BigCode released StarCoder2, an open-source code generation LLM.

🛠 Real World ML

AI-Assisted Development at Pinterest

Pinterest discusses lessons learned and best practices for enabling AI-assisted development processes.

AI Code Generation at GitHub

GitHub shares some insights and best practices about AI code generation.

📡 AI Radar
