Edge 452: The AI Magic Behind Google’s NotebookLM Audio Features

How does NotebookLM generate such cool podcasts?

Google’s NotebookLM has rapidly become one of the most popular AI tools since the release of ChatGPT, and podcast generation is by far its most popular feature. These days I constantly find social media threads that use audio clips generated by NotebookLM, to the point that I am starting to become familiar with the voices in the podcast. The audio generation in NotebookLM handles aspects such as humor, regular questions, and interruptions, which are incredibly hard to master. How did Google achieve this? Well, NotebookLM’s audio generation capabilities are the result of combining several techniques developed by Google DeepMind over the last few years. Specifically, NotebookLM’s audio magic is powered by innovations in two key models, SoundStorm and AudioLM, which underpin Google DeepMind’s approach to audio generation.

Audio generation is a burgeoning area of research within Artificial Intelligence (AI). The field centers on building systems that can generate realistic and coherent sounds, including speech and music. Google DeepMind has made notable strides here, pioneering novel techniques that are significantly impacting audio generation.