A discussion of some controversial and original ideas in AI.
I wanted to devote some installments of The Sequence to outlining my reflections on several controversial ideas around AI. After all, one of the rarest things to find in today’s market, plagued with hundreds of AI newsletters, is a publication that discusses original ideas. I think this section will be a cool complement to our interview series and, if nothing else, might force you to think about these topics even if you disagree with my opinion 😉
Today, I would like to start with a simple but controversial thesis that I was discussing with some of my students recently. The cornerstone of this thesis is that the transformer architecture used in foundation models is, arguably, the best thing that ever happened to NVIDIA.
Have you ever heard the phrase that the only company turning a real profit in AI is NVIDIA? Well, transformers have a lot to do with that. The main reasons are technical as well as market-related:
- Technical: The transformer is the first architecture whose capabilities scale with pre-training and post-training data without clear limits (see the scaling-law sketch after this list).
- Market: The fact that transformers have become the dominant AI paradigm has given NVIDIA time to optimize its hardware for that architecture.
- Scale: Past a certain scale, virtually all transformer models are trained and served on NVIDIA GPUs.
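To make the scaling point concrete, here is a sketch of the empirical scaling-law form reported by Hoffmann et al. in the "Chinchilla" paper, which is one common way to formalize the claim. N is the model's parameter count, D is the number of training tokens, and E, A, B, α, β are empirically fitted constants, not values from this article:

```latex
% Empirical scaling law for transformer language models
% (Hoffmann et al., 2022, "Training Compute-Optimal Large Language Models").
% Loss L falls as a power law in both model size N and training data D,
% approaching an irreducible floor E; A, B, alpha, beta are fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The key property is that, within the regimes measured so far, adding parameters or tokens keeps pushing the loss down in a predictable way. That is the sense in which transformers scale "without clear limits", and it is exactly the kind of predictability that justifies ever-larger GPU purchases.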
Let’s dive into each of these points: