Apple GPT is Coming!

A new research breakthrough outlines a path to running LLMs on iPhones and iPads.


Next Week in The Sequence:

  • Edge 355: Our new series about LLM reasoning techniques presents a taxonomy of reasoning methods. We review Microsoft’s MathPrompter, which exhibits reasoning abilities in complex math tasks, and the Chain of Thought Hub framework.

  • Edge 356: We dive into Microsoft’s Orca 2, which exhibits remarkable reasoning capabilities.

You can subscribe below!

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: Apple GPT is Coming

When we think about tech incumbents that could be severely disrupted by generative AI, Apple often tops the list. While Microsoft, Amazon, NVIDIA, Google, and even Meta have unveiled clear playbooks for their generative AI strategies, the Cupertino giant seems to have dangerously fallen behind in this space.

That might soon change…

In a somewhat surprising paper titled ‘LLM in a Flash: Efficient Large Language Model Inference with Limited Memory,’ Apple unveiled a new technique to run LLMs on devices with limited DRAM capacity. The cornerstone of this technique is the use of flash storage in mobile devices to hold model parameters, loading them on demand into DRAM. Apple’s method is hyper-optimized to minimize the volume of data transferred from flash storage, reading the data in contiguous chunks. The result allows for running models up to twice the size of the available DRAM, with a 4.5x increase in inference speed on CPUs and a 20-25x increase on GPUs. Quite impressive!
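To make the core idea concrete, here is a minimal sketch (not Apple's implementation — the class, chunk sizes, and eviction policy are all illustrative assumptions) of keeping weights on flash and paging them on demand into a DRAM-sized cache, so the resident model footprint stays below the DRAM budget:

```python
# Sketch: weights live on disk ("flash"); a bounded LRU cache plays the
# role of DRAM, loading parameter chunks only when a layer needs them.
import os
import tempfile
from collections import OrderedDict

import numpy as np


class FlashWeightStore:
    """Keeps per-layer weight chunks on disk and pages them on demand
    into an in-memory cache holding at most `dram_budget_chunks` chunks."""

    def __init__(self, dram_budget_chunks: int):
        self.dram_budget = dram_budget_chunks
        self.cache: OrderedDict[int, np.ndarray] = OrderedDict()  # "DRAM"
        self.files: dict[int, str] = {}  # chunk_id -> path on "flash"
        self.loads_from_flash = 0

    def write_chunk(self, chunk_id: int, array: np.ndarray) -> None:
        # Persist one weight chunk to flash storage.
        path = os.path.join(tempfile.gettempdir(), f"w_{chunk_id}.npy")
        np.save(path, array)
        self.files[chunk_id] = path

    def get_chunk(self, chunk_id: int) -> np.ndarray:
        if chunk_id in self.cache:
            self.cache.move_to_end(chunk_id)  # DRAM hit: no flash read
            return self.cache[chunk_id]
        array = np.load(self.files[chunk_id])  # on-demand flash read
        self.loads_from_flash += 1
        self.cache[chunk_id] = array
        if len(self.cache) > self.dram_budget:
            self.cache.popitem(last=False)  # evict least recently used
        return array


# A toy "model" of 4 weight chunks, with DRAM room for only 2 of them.
store = FlashWeightStore(dram_budget_chunks=2)
for i in range(4):
    store.write_chunk(i, np.full((8, 8), float(i)))

x = np.ones(8)
for i in [0, 1, 0, 1]:  # second pass over hot chunks hits DRAM
    x = store.get_chunk(i) @ x

print(store.loads_from_flash)  # 2: chunks 0 and 1 were each read once
```

The cache-to-model size ratio here mirrors the paper's headline result of running models roughly twice the size of available DRAM; the real system layers further optimizations (such as exploiting activation sparsity to skip chunks entirely) on top of this basic paging scheme.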

‘LLM in a Flash’ outlines a clear path for running sophisticated LLMs on iPhones and iPads, which seem like the natural vehicles for Apple to enter the generative AI space. Maybe we are about to see Apple GPT in the next iOS release after all.

🔎 ML Research

LLM in a Flash

Apple Research published a paper outlining a technique for LLM inference with limited memory. The method involves storing the parameters in flash memory and bringing them on demand into DRAM —> Read more.

VideoPoet

Google Research published a paper detailing VideoPoet, a zero-shot video generation LLM. The model supports a number of video generation tasks such as text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio —> Read more.

InsightPilot

Microsoft Research published a paper discussing InsightPilot, an LLM-based system for data exploration. The framework takes a dataset as input and triggers a series of LLM-based analytical actions —> Read more.

Multi-Step Reasoning Agent

Google DeepMind published a paper outlining a ReAct-style LLM agent capable of multi-step reasoning. The agent uses reinforcement learning with AI feedback for continual self-improvement and self-distillation —> Read more.

🤖 Cool AI Tech Releases

Midjourney v6

A new version of Midjourney is available with a lot of exciting capabilities —> Read more.

Stable Video Diffusion

Stability AI made Stable Video Diffusion available via its developer platform API —> Read more.

Titan Models

Amazon announced the availability of two Titan models in its Bedrock platform —> Read more.

🛠 Real World ML

AutoML at LinkedIn

LinkedIn shares some details about their AutoML architecture used for content abuse detection —> Read more.

📡 AI Radar
