Lin Qiao, CEO & Co-Founder of Fireworks AI – Interview Series

Lin Qiao was formerly head of Meta’s PyTorch and is the Co-Founder and CEO of Fireworks AI. Fireworks AI is a production AI platform built for developers. Fireworks partners with the world’s leading generative AI researchers to serve the best models at the fastest speeds. Fireworks AI recently raised a $25M Series A.

What initially attracted you to computer science?

My dad was a very senior mechanical engineer at a shipyard, where he built cargo ships from scratch. From a young age, I learned to read the precise angles and measurements of ship blueprints, and I loved it.

I was very much into STEM from middle school onward. I devoured everything in math, physics and chemistry. One of my high school assignments was to learn BASIC programming, and I coded a game about a snake eating its tail. After that, I knew computer science was in my future.

While at Meta you led 300+ world-class engineers in AI frameworks & platforms where you built and deployed Caffe2, and later PyTorch. What were some of your key takeaways from this experience?

Big Tech companies like Meta are always five or more years ahead of the curve. When I joined Meta in 2015, we were at the beginning of our AI journey, making the shift from CPUs to GPUs. We had to design AI infrastructure from the ground up. Frameworks like Caffe2 were groundbreaking when they were created, but AI evolved so fast that they quickly grew outdated. We developed PyTorch and the entire system around it as a solution.

PyTorch is where I learned about the biggest roadblocks developers face in the race to build AI. The first challenge is finding a stable and reliable model architecture that is low latency and flexible so that models can scale. The second challenge is total cost of ownership, so companies don’t go bankrupt trying to grow their models.

My time at Meta showed me how important it is to keep models and frameworks like PyTorch open-source. It encourages innovation. We would not have grown as much as we did at PyTorch without open-source opportunities for iteration. Plus, it’s impossible to stay up to date on all the latest research without collaboration.

Can you discuss what led you to launching Fireworks AI?

I’ve been in the tech industry for more than 20 years, and I’ve seen wave after wave of industry-level shifts, from the cloud to mobile apps. But this AI shift is a complete tectonic realignment. I saw lots of companies struggling with this change. Everyone wanted to move fast and put AI first, but they lacked the infrastructure, resources and talent to make it happen. The more I talked to these companies, the more I realized I could solve this gap in the market.

I launched Fireworks AI both to solve this problem and to serve as an extension of the incredible work we achieved at PyTorch. It even inspired our name! PyTorch is the torch holding the fire, but we want that fire to spread everywhere. Hence: Fireworks.

I have always been passionate about democratizing technology, and making it affordable and simple for developers to innovate regardless of their resources. That’s why we have such a user-friendly interface and strong support systems to empower builders to bring their visions to life.

Could you discuss what developer-centric AI is and why it is so important?

It’s simple: “developer-centric” means prioritizing the needs of AI developers. For example: creating tools, communities and processes that make developers more efficient and autonomous.

Developer-centric AI platforms like Fireworks should integrate into existing workflows and tech stacks. They should make it simple for developers to experiment, make mistakes and improve their work. They should encourage feedback, because it’s developers themselves who understand what they need to be successful. Lastly, it’s about more than just being a platform. It’s about being a community – one where collaborating developers can push the boundaries of what’s possible with AI.

The GenAI Platform you’ve developed is a significant advancement for developers working with large language models (LLMs). Can you elaborate on the unique features and benefits of your platform, especially in comparison to existing solutions?

Our entire approach as an AI production platform is unique, but some of our best features are:

Efficient inference – We engineered Fireworks AI for efficiency and speed. Developers using our platform can run their LLM applications at the lowest possible latency and cost. We achieve this with the latest model and service optimization techniques including prompt caching, adaptable sharding, quantization, continuous batching, FireAttention, and more.

Affordable support for LoRA-tuned models – We offer affordable service of low-rank adaptation (LoRA) fine-tuned models via multi-tenancy on base models. This means developers can experiment with many different use cases or variations on the same model without breaking the bank.

Simple interfaces and APIs – Our interfaces and APIs are straightforward and easy for developers to integrate into their applications. Our APIs are also OpenAI compatible for ease of migration.

Off-the-shelf models and fine-tuned models – We provide more than 100 pre-trained models that developers can use out-of-the-box. We cover the best LLMs, image generation models, embedding models, etc. But developers can also choose to host and serve their own custom models. We also offer self-serve fine-tuning services to help developers tailor these custom models with their proprietary data.

Community collaboration – We believe in the open-source ethos of community collaboration. Our platform encourages (but doesn’t require) developers to share their fine-tuned models and contribute to a growing bank of AI assets and knowledge. Everyone benefits from growing our collective expertise.
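As an editorial illustration of the LoRA multi-tenancy idea mentioned above, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not Fireworks AI's implementation: the class `SharedBase`, its methods, and the adapter names are hypothetical. It only shows the general principle that one copy of a base weight matrix W can serve many tenants, each adding its own small low-rank update B(Ax).

```python
# Toy LoRA multi-tenancy sketch (hypothetical names, not a real serving stack).

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class SharedBase:
    """One base weight matrix shared by many low-rank adapters."""

    def __init__(self, weight):
        self.weight = weight   # d_out x d_in, stored once for all tenants
        self.adapters = {}     # adapter name -> (A, B)

    def register(self, name, a, b):
        # a has shape r x d_in and b has shape d_out x r, with r kept small.
        self.adapters[name] = (a, b)

    def forward(self, x, adapter=None):
        y = matvec(self.weight, x)
        if adapter is not None:
            a, b = self.adapters[adapter]
            delta = matvec(b, matvec(a, x))   # low-rank update B(Ax)
            y = [yi + di for yi, di in zip(y, delta)]
        return y


base = SharedBase([[1, 0, 0], [0, 1, 0]])
base.register("support", a=[[0, 0, 1]], b=[[1], [0]])
base.register("legal", a=[[1, 1, 0]], b=[[0], [2]])

x = [1, 2, 3]
print(base.forward(x))                     # base model only: [1, 2]
print(base.forward(x, adapter="support"))  # [4, 2]
print(base.forward(x, adapter="legal"))    # [1, 8]
```

Each adapter here stores r × (d_in + d_out) numbers instead of a full d_out × d_in matrix, which is why hosting many variants on one base model can be much cheaper than hosting many full models.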

Could you discuss the hybrid approach that is offered between model parallelism and data parallelism?

Parallelizing machine learning models improves the efficiency and speed of model training and helps developers handle larger models that a single GPU can’t process.

Model parallelism involves dividing a model into multiple parts and training each part on separate processors. On the other hand, data parallelism divides datasets into subsets and trains a model on each subset at the same time across separate processors. A hybrid approach combines these two methods. Models are divided into separate parts, which are each trained on different subsets of data, improving efficiency, scalability and flexibility.
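To make the three strategies described above concrete, here is a small editorial sketch in plain Python. It is a toy, not how Fireworks AI or any framework implements parallelism: the "workers" are simulated by loop iterations rather than real processors, the model is a single linear layer trained with squared error, and all function names are hypothetical. It assumes the number of samples and the number of weights divide evenly by the worker counts.

```python
# Toy sketch: "workers" are simulated by loop iterations, not real GPUs.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mse_grad(w, xs, ys):
    """Gradient of mean squared error for the linear model y = w . x."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = dot(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(xs)
    return grad


def data_parallel_grad(w, xs, ys, workers):
    """Every worker holds the full model and an equal shard of the data."""
    shard = len(xs) // workers
    grads = [
        mse_grad(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(workers)
    ]
    # Averaging equal-sized shard gradients recovers the full-batch gradient.
    return [sum(g[j] for g in grads) / workers for j in range(len(w))]


def model_parallel_predict(w, x, workers):
    """Every worker holds one slice of the weights and the matching features."""
    size = len(w) // workers
    return sum(
        dot(w[i * size:(i + 1) * size], x[i * size:(i + 1) * size])
        for i in range(workers)
    )


def hybrid_grad(w, xs, ys, data_workers, model_workers):
    """Data is sharded across groups; inside each group the model is sliced."""
    shard = len(xs) // data_workers
    size = len(w) // model_workers
    grads = []
    for d in range(data_workers):
        sx = xs[d * shard:(d + 1) * shard]
        sy = ys[d * shard:(d + 1) * shard]
        grad = [0.0] * len(w)
        for x, y in zip(sx, sy):
            err = model_parallel_predict(w, x, model_workers) - y
            for m in range(model_workers):
                # Model worker m fills in only its own slice of the gradient.
                for j in range(m * size, (m + 1) * size):
                    grad[j] += 2.0 * err * x[j] / len(sx)
        grads.append(grad)
    return [sum(g[j] for g in grads) / data_workers for j in range(len(w))]


w = [0.5, -1.0, 2.0, 0.0]
xs = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0],
      [2.0, 0.0, 1.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
ys = [1.0, 0.0, 2.0, 3.0]

full = mse_grad(w, xs, ys)
dp = data_parallel_grad(w, xs, ys, workers=2)
hy = hybrid_grad(w, xs, ys, data_workers=2, model_workers=2)
```

In this toy setup all three gradients agree up to floating-point rounding, which is the point: the strategies change where the work happens, not the result being computed.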

Fireworks AI is used by over 20,000 developers and is currently serving over 60 billion tokens daily. What challenges have you faced in scaling your operations to this level, and how have you overcome them?

I’ll be honest, there have been many high mountains to cross since we founded Fireworks AI in 2022.

Our customers first came to us looking for very low latency support because they are building applications for consumers, prosumers or other developers, all audiences that need speedy solutions. Then, when our customers’ applications started to scale fast, they realized they couldn’t afford the typical costs associated with that scale. They then asked us to help with lowering total cost of ownership (TCO), which we did. Then, our customers wanted to migrate from OpenAI to OSS models, and they asked us to provide on-par or even better quality than OpenAI. We made that happen too.

Each step in our product’s evolution was a challenging problem to tackle, but it meant our customers’ needs truly shaped Fireworks into what it is today: a lightning-fast inference engine with low TCO. Plus, we provide both an assortment of high-quality, out-of-the-box models to choose from and fine-tuning services for developers to create their own.

With the rapid advancements in AI and machine learning, ethical considerations are more important than ever. How does Fireworks AI address concerns related to bias, privacy, and ethical use of AI?

I have two teenage daughters who use genAI apps like ChatGPT often. As a mom, I worry about them finding misleading or inappropriate content, because the industry is just beginning to tackle the critical problem of content safety. Meta is doing a lot with the Purple Llama project, and Stability AI’s new SD3 models are great. Both companies are working hard to bring safety to their new Llama3 and SD3 models with multiple layers of filters. The input-output safeguard model, Llama Guard, does get a good amount of usage on our platform, but its adoption is not on par with other LLMs yet. The industry as a whole still has a long way to go to bring content safety and AI ethics to the forefront.

We at Fireworks care deeply about privacy and security. We are HIPAA and SOC2 compliant, and offer secure VPC and VPN connectivity. Companies trust Fireworks with their proprietary data and models to build their business moat.

What is your vision for how AI will evolve?

Just as AlphaGo demonstrated autonomy while learning to play Go by itself, I think we’ll see genAI applications get more and more autonomous. Apps will automatically route and direct requests to the right agent or API to process, and course-correct until they retrieve the right output. And instead of one function-calling model polling from others as a controller, we’ll see more self-organized, self-coordinated agents working in unison to solve problems.

Fireworks’ lightning-fast inference, function-calling models and fine-tuning service have paved the way for this reality. Now it’s up to innovative developers to make it happen.

Thank you for the great interview. Readers who wish to learn more should visit Fireworks AI.
