Reading Beyond the Hype: Some Observations About OpenAI and Google’s Announcements

Google vs. OpenAI is shaping up to be one of the biggest rivalries of the generative AI era.

Created Using Ideogram

Next Week in The Sequence:

  • Edge 397: Provides an overview of multi-plan selection in autonomous agents. Discusses Allen AI’s ADaPT paper for planning in LLMs and introduces the SuperAGI framework for building autonomous agents.

  • Edge 398: We dive into Microsoft’s amazing Phi-3 model, which outperforms much larger models in math and computer science and kick-started the small language model movement.

You can subscribe to The Sequence below:

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: Reading Beyond the Hype: Some Observations About OpenAI and Google’s Announcements

What a week for generative AI! By now, every AI newsletter you subscribe to and every blog and tech publication you follow must have bombarded you with announcements from the OpenAI events and the Google I/O conference. Instead of doing more of the same, I thought we could share some thoughts about how this week’s events shape the competitive landscape between Google and OpenAI and the generative AI market in general. After all, Google has been the only tech incumbent that has chosen to compete directly with OpenAI and Anthropic instead of integrating those models into its distribution machine. This week’s announcements show that the race between Google and OpenAI might be shaping up to be one of the iconic tech rivalries of the next generation.

Here are some observations about OpenAI and Google’s recent announcements:

  1. Feature Parity: One astonishing development this week was seeing how Google and OpenAI are both working on similar roadmaps. From image and video generation to generalist agents, it seems that both companies are matching each other’s capabilities. It is impressive to see how fast Google has caught up.

  2. From Models to Agents: The announcements of GPT-4o and Project Astra revealed that both OpenAI and Google are expanding into the generalist agent space. In the case of Google, this is not that surprising.

  3. Engineering Matters: Speed was at the center of the GPT-4o and Gemini Flash announcements. The evolution of generative AI today is as much about engineering as it is about research.

  4. Context Window Edge: The Gemini 2M token context window is astonishing. Google recently hinted at its work in this area with the Infini-attention research. This massive context could make quite a competitive difference in several scenarios.

  5. Video is the Next Differentiator: It is very clear that video is one of the next frontiers for generative AI. A few weeks ago, OpenAI seemed to be ahead of all other tech incumbents with Sora, but Google’s Veo looks incredibly impressive. Interestingly, no other major AI platform seems to be developing video generation capabilities at scale, which makes startups like Runway and Pika very attractive acquisition targets. Otherwise, this will become a two-horse race.

  6. Google AI Strategy is Broad and Fragmented: Google’s announcements this week were overwhelming. There are the flagship Gemini models and the lightweight Gemini Flash and Gemma models, generative video models, Astra multimodal agents, Gemini-powered search, AI in Workspace apps like Gmail and Docs, AI for Firebase developers, and maybe even new AI-powered glasses? You get the idea. Executing well across all these areas while pushing the boundaries of research is extremely difficult. And yet, it might be working.

  7. Google’s Bold AI Strategy Might Pay Off: Unlike Microsoft, Apple, Amazon, and Oracle, Google didn’t rush to partner with OpenAI or Anthropic and instead decided to rely on its DeepMind and Google Research AI talent. They have had plenty of public missteps and challenges, but they have proven that they can take a punch and remain quite competitive. The result is that now Google has achieved near feature parity with OpenAI and is in control of its own destiny.

I could keep going, but this is supposed to be a short editorial. This week was another reminder that generative AI is moving at a speed we have never seen in any previous tech trend. The rivalry between Google and OpenAI promises to push the boundaries of the space.

🔎 ML Research

VASIM

Microsoft Research published a paper introducing the Vertical Autoscaling Simulator Toolkit (VASIM), a tool designed to address the challenges of autoscaling algorithms such as the ones used in cloud computing infrastructure. VASIM can simulate different autoscaling policies and optimization parameters —> Read more.

RLHF Workflow

Salesforce Research published a paper proposing an online iterative Reinforcement Learning from Human Feedback (RLHF) technique that improves over offline alternatives. The method also aims to address the resource constraints that online RLHF poses for open-source projects —> Read more.

SambaNova SN40L

SambaNova researchers published a paper proposing SambaNova SN40L, a system that combines composition of experts (CoEs), streaming dataflow, and a three-tier memory to scale past the AI memory wall. The research looks to address some of the limitations hyperscalers face when deploying large monolithic LLMs —> Read more.

Med-Gemini

Google published two research papers exploring the capabilities of Gemini in healthcare scenarios. The first paper explores Gemini for electronic medical records processing, while the second one focuses on use cases such as radiology, pathology, dermatology, ophthalmology, and genomics —> Read more.

BEHAVIOR Vision Suite

Researchers from Stanford, Harvard, Meta, the University of Southern California, and other AI labs published a paper introducing BEHAVIOR Vision Suite (BVS), a set of tools for generating fully customized synthetic data for computer vision models. BVS could have a profound impact on scenarios such as embodied AI or self-driving cars —> Read more.

Online vs. Offline RLHF

Researchers from Google DeepMind published a paper outlining a series of experiments comparing online and offline RLHF alignment methods. The experiments show a clear superiority of online methods, and the paper dives into potential explanations —> Read more.

🤖 Cool AI Tech Releases

GPT-4o

OpenAI unveiled GPT-4o, a new model that can work with text, audio and vision in real time —> Read more.

Gemini 1.5

Google introduced Gemini 1.5 Flash and new improvements to its Pro model —> Read more.

Transformer Agents 2.0

Hugging Face open sourced a new version of its Transformers Agents framework —> Read more.

Model Explorer

Google Research released Model Explorer, a tool for visualizing ML models as graphs —> Read more.

Gemma 2

Google unveiled Gemma 2, a 27B parameter LLM that can run on a single TPU —> Read more.

Imagen 3

Google released Imagen 3, the latest version of its flagship text-to-image model —> Read more.

Firebase Genkit

Google released Firebase Genkit, a framework for building mobile and web AI apps —> Read more.

🛠 Real World AI

Inside Einstein

Salesforce’s VP of Engineering shares details about the implementation of the Einstein platform —> Read more.

📡 AI Radar
