A new research breakthrough outlines the path to run LLMs on iPhones and iPads.
Next Week in The Sequence:
- Edge 355: Our new series about LLM reasoning techniques presents a taxonomy of reasoning methods. We review Microsoft’s MathPrompter, which exhibits reasoning abilities in complex math tasks, and the Chain of Thought Hub framework.
- Edge 356: We dive into Microsoft’s Orca 2, which exhibits impressive reasoning capabilities.
You can subscribe below!
📝 Editorial: Apple GPT is Coming
When we think about tech incumbents that could be severely disrupted by generative AI, Apple often tops the list. While Microsoft, Amazon, NVIDIA, Google, and even Meta have unveiled clear playbooks for their generative AI strategies, the Cupertino giant seems to have dangerously fallen behind in this space.
That might soon change…
In a somewhat surprising paper titled ‘LLM in a Flash: Efficient Large Language Model Inference with Limited Memory,’ Apple unveiled a new technique to run LLMs on devices with limited DRAM capacity. The cornerstone of this technique is the use of flash storage on mobile devices to hold model parameters, loading them on demand into DRAM. Apple’s method is hyper-optimized to minimize the volume of data transferred from flash storage, while reading that data in contiguous chunks. The result allows for running models twice as large as the available DRAM, with a 4-5x increase in inference speed on CPUs and 20-25x on GPUs. Quite impressive!
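The core idea — keeping the full weight matrices on flash and copying only the parameters needed for the current step into DRAM — can be sketched in a few lines. This is a toy illustration, not Apple’s implementation: the file name, dimensions, and the idea of a sparsity predictor selecting active rows are all assumptions for the example, with `numpy.memmap` standing in for chunked flash reads.

```python
import numpy as np

# Toy sketch of the flash-to-DRAM idea: weights live in a file ("flash"),
# and only the rows needed right now are copied into memory ("DRAM").
ROWS, COLS = 1024, 256  # hypothetical layer dimensions, not the paper's

def save_layer_to_flash(path, weights):
    """Persist a layer's weight matrix to 'flash' (here, a plain file)."""
    np.asarray(weights, dtype=np.float32).tofile(path)

def load_rows_on_demand(path, row_indices, cols=COLS):
    """Memory-map the file and copy only the requested rows into DRAM.

    np.memmap reads pages lazily, so untouched rows never leave storage;
    fetching contiguous index ranges mimics the paper's chunked reads.
    """
    mm = np.memmap(path, dtype=np.float32, mode="r").reshape(-1, cols)
    return np.array(mm[row_indices])  # explicit copy into DRAM

rng = np.random.default_rng(0)
W = rng.standard_normal((ROWS, COLS)).astype(np.float32)
save_layer_to_flash("layer0.bin", W)

# Pretend a sparsity predictor said only these neurons fire this step.
active = [3, 4, 5, 900]
sub = load_rows_on_demand("layer0.bin", active)
print(sub.shape)  # prints (4, 256)
```

The payoff is that peak DRAM use scales with the active rows per step rather than the full model size, which is what lets a model larger than available memory still run.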
‘LLM in a Flash’ outlines a clear path for running sophisticated LLMs on iPhones and iPads, which seem like the natural vehicle for Apple to enter the generative AI space. Maybe we are about to see Apple GPT in the next iOS release after all.
🔎 ML Research
LLM in a Flash
Apple Research published a paper outlining a technique for LLM inference with limited memory. The method stores model parameters in flash memory and loads them into DRAM on demand —> Read more.
VideoPoet
Google Research published a paper detailing VideoPoet, a zero-shot video generation LLM. The model supports a number of video generation tasks such as text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio —> Read more.
InsightPilot
Microsoft Research published a paper discussing InsightPilot, an LLM-based system for data exploration. The framework takes a dataset as input and triggers a series of LLM-based analytical actions —> Read more.
Multi-Step Reasoning Agent
Google DeepMind published a paper outlining a ReAct-style LLM agent capable of multi-step reasoning. The agent uses reinforcement learning from AI feedback and self-distillation for continual improvement —> Read more.
🤖 Cool AI Tech Releases
Midjourney v6
A new version of Midjourney is available, with notable improvements in image quality and prompt adherence —> Read more.
Stable Video Diffusion
Stability AI made Stable Video Diffusion available via its developer platform API —> Read more.
Titan Models
Amazon announced the availability of two Titan models in its Bedrock platform —> Read more.
🛠 Real World ML
AutoML at LinkedIn
LinkedIn shared details about the AutoML architecture it uses for content abuse detection —> Read more.
📡AI Radar
- OpenAI is in early discussions to raise capital at a $100 billion valuation.
- Anthropic is rumored to be raising another $750 million.
- AI compute platform Lightmatter raised $154 million in new funding.
- AI media monitoring platform Meltwater raised $65 million in new funding.
- Microsoft Copilot enabled music generation capabilities via its integration with Suno.
- AI sales and marketing startup Ignition announced a seed round.
- AI VR startup Virtuleap announced a $2.5 million round.
- The Simulation by Fable open sourced a new platform for creating Westworld-like simulations.