Jay Dawani is Co-founder & CEO of Lemurian Labs. Lemurian Labs is on a mission to deliver affordable, accessible, and efficient AI computers, driven by the belief that AI should not be a luxury but a tool accessible to everyone. The founding team at Lemurian Labs combines expertise in AI, compilers, numerical algorithms, and computer architecture, united by a single purpose: to reimagine accelerated computing.
Can you walk us through your background and what got you into AI to begin with?
Absolutely. I’d been programming since I was 12 and building my own games and such, but I actually got into AI when I was 15 because of a friend of my fathers who was into computers. He fed my curiosity and gave me books to read such as Von Neumann’s ‘The Computer and The Brain’, Minsky’s ‘Perceptrons’, Russel and Norvig’s ‘AI A Modern Approach’. These books influenced my thinking a lot and it felt almost obvious then that AI was going to be transformative and I just had to be a part of this field.
When it came time for university I really wanted to study AI but I didn’t find any universities offering that, so I decided to major in applied mathematics instead and a little while after I got to university I heard about AlexNet’s results on ImageNet, which was really exciting. At that time I had this now or never moment happen in my head and went full bore into reading every paper and book I could get my hands on related to neural networks and sought out all the leaders in the field to learn from them, because how often do you get to be there at the birth of a new industry and learn from its pioneers.
Very quickly I realized I don’t enjoy research, but I do enjoy solving problems and building AI enabled products. That led me to working on autonomous cars and robots, AI for material discovery, generative models for multi-physics simulations, AI based simulators for training professional racecar drivers and helping with car setups, space robots, algorithmic trading, and much more.
Now, having done all that, I’m trying to reign in the cost of AI training and deployments because that will be the greatest hurdle we face on our path to enabling a world where every person and company can have access to and benefit from AI in the most economical way possible.
Many companies working in accelerated computing have founders that have built careers in semiconductors and infrastructure. How do you think your past experience in AI and mathematics impacts your ability to understand the market and compete effectively?
I actually think not coming from the industry gives me the benefit of having the outsider advantage. I have found it to be the case quite often that not having knowledge of industry norms or conventional wisdoms gives one the freedom to explore more freely and go deeper than most others would because you’re unencumbered by biases.
I have the freedom to ask ‘dumber’ questions and test assumptions in a way that most others wouldn’t because a lot of things are accepted truths. In the past two years I’ve had several conversations with folks within the industry where they are very dogmatic about something but they can’t tell me the provenance of the idea, which I find very puzzling. I like to understand why certain choices were made, and what assumptions or conditions were there at that time and if they still hold.
Coming from an AI background I tend to take a software view by looking at where the workloads today, and here are all the possible ways they may change over time, and modeling the entire ML pipeline for training and inference to understand the bottlenecks, which tells me where the opportunities to deliver value are. And because I come from a mathematical background I like to model things to get as close to truth as I can, and have that guide me. For example, we have built models to calculate system performance for total cost of ownership and we can measure the benefit we can bring to customers with software and/or hardware and to better understand our constraints and the different knobs available to us, and dozens of other models for various things. We are very data driven, and we use the insights from these models to guide our efforts and tradeoffs.
It seems like progress in AI has primarily come from scaling, which requires exponentially more compute and energy. It seems like we’re in an arms race with every company trying to build the biggest model, and there appears to be no end in sight. Do you think there is a way out of this?
There are always ways. Scaling has proven extremely useful, and I don’t think we’ve seen the end yet. We will very soon see models being trained with a cost of at least a billion dollars. If you want to be a leader in generative AI and create bleeding edge foundation models you’ll need to be spending at least a few billion a year on compute. Now, there are natural limits to scaling, such as being able to construct a large enough dataset for a model of that size, getting access to people with the right know-how, and getting access to enough compute.
Continued scaling of model size is inevitable, but we also can’t turn the entire earth’s surface into a planet sized supercomputer to train and serve LLMs for obvious reasons. To get this into control we have several knobs we can play with: better datasets, new model architectures, new training methods, better compilers, algorithmic improvements and exploitations, better computer architectures, and so on. If we do all that, there’s roughly three orders of magnitude of improvement to be found. That’s the best way out.
You are a believer in first principles thinking, how does this mold your mindset for how you are running Lemurian Labs?
We definitely employ a lot of first principles thinking at Lemurian. I have always found conventional wisdom misleading because that knowledge was formed at a certain point in time when certain assumptions held, but things always change and you need to retest assumptions often, especially when living in such a fast paced world.
I often find myself asking questions like “this seems like a really good idea, but why might this not work”, or “what needs to be true in order for this to work”, or “what do we know that are absolute truths and what are the assumptions we’re making and why?”, or “why do we believe this particular approach is the best way to solve this problem”. The goal is to invalidate and kill off ideas as quickly and cheaply as possible. We want to try and maximize the number of things we’re trying out at any given point in time. It’s about being obsessed with the problem that needs to be solved, and not being overly opinionated about what technology is best. Too many folks tend to overly focus on the technology and they end up misunderstanding customers’ problems and miss the transitions happening in the industry which could invalidate their approach resulting in their inability to adapt to the new state of the world.
But first principles thinking isn’t all that useful by itself. We tend to pair it with backcasting, which basically means imagining an ideal or desired future outcome and working backwards to identify the different steps or actions needed to realize it. This ensures we converge on a meaningful solution that is not only innovative but also grounded in reality. It doesn’t make sense to spend time coming up with the perfect solution only to realize it’s not feasible to build because of a variety of real world constraints such as resources, time, regulation, or building a seemingly perfect solution but later on finding out you’ve made it too hard for customers to adopt.
Every now and then we find ourselves in a situation where we need to make a decision but have no data, and in this scenario we employ minimum testable hypotheses which give us a signal as to whether or not something makes sense to pursue with the least amount of energy expenditure.
All this combined is to give us agility, rapid iteration cycles to de-risk items quickly, and has helped us adjust strategies with high confidence, and make a lot of progress on very hard problems in a very short amount of time.
Initially, you were focused on edge AI, what caused you to refocus and pivot to cloud computing?
We started with edge AI because at that time I was very focused on trying to solve a very particular problem that I had faced in trying to usher in a world of general purpose autonomous robotics. Autonomous robotics holds the promise of being the biggest platform shift in our collective history, and it seemed like we had everything needed to build a foundation model for robotics but we were missing the ideal inference chip with the right balance of throughput, latency, energy efficiency, and programmability to run said foundation model on.
I wasn’t thinking about the datacenter at this time because there were more than enough companies focusing there and I expected they would figure it out. We designed a really powerful architecture for this application space and were getting ready to tape it out, and then it became abundantly clear that the world had changed and the problem truly was in the datacenter. The rate at which LLMs were scaling and consuming compute far outstrips the pace of progress in computing, and when you factor in adoption it starts to paint a worrying picture.
It felt like this is where we should be focusing our efforts, to bring down the energy cost of AI in datacenters as much as possible without imposing restrictions on where and how AI should evolve. And so, we got to work on solving this problem.
Can you share the genesis story of Co-Founding Lemurian Labs?
The story starts in early 2018. I was working on training a foundation model for general purpose autonomy along with a model for generative multiphysics simulation to train the agent in and fine-tune it for different applications, and some other things to help scale into multi-agent environments. But very quickly I exhausted the amount of compute I had, and I estimated needing more than 20,000 V100 GPUs. I tried to raise enough to get access to the compute but the market wasn’t ready for that kind of scale just yet. It did however get me thinking about the deployment side of things and I sat down to calculate how much performance I would need for serving this model in the target environments and I realized there was no chip in existence that could get me there.
A couple of years later, in 2020, I met up with Vassil – my eventual cofounder – to catch up and I shared the challenges I went through in building a foundation model for autonomy, and he suggested building an inference chip that could run the foundation model, and he shared that he had been thinking a lot about number formats and better representations would help in not only making neural networks retain accuracy at lower bit-widths but also in creating more powerful architectures.
It was an intriguing idea but was way out of my wheelhouse. But it wouldn’t leave me, which drove me to spending months and months learning the intricacies of computer architecture, instruction sets, runtimes, compilers, and programming models. Eventually, building a semiconductor company started to make sense and I had formed a thesis around what the problem was and how to go about it. And, then towards the end of the year we started Lemurian.
You’ve spoken previously about the need to tackle software first when building hardware, could you elaborate on your views of why the hardware problem is first and foremost a software problem?
What a lot of people don’t realize is that the software side of semiconductors is much harder than the hardware itself. Building a useful computer architecture for customers to use and get benefit from is a full stack problem, and if you don’t have that understanding and preparedness going in, you’ll end up with a beautiful looking architecture that is very performant and efficient, but totally unusable by developers, which is what is actually important.
There are other benefits to taking a software first approach as well, of course, such as faster time to market. This is crucial in today’s fast moving world where being too bullish on an architecture or feature could mean you miss the market entirely.
Not taking a software first view generally results in not having derisked the important things required for product adoption in the market, not being able to respond to changes in the market for example when workloads evolve in an unexpected way, and having underutilized hardware. All not great things. That’s a big reason why we care a lot about being software centric and why our view is that you can’t be a semiconductor company without really being a software company.
Can you discuss your immediate software stack goals?
When we were designing our architecture and thinking about the forward looking roadmap and where the opportunities were to bring more performance and energy efficiency, it started becoming very clear that we were going to see a lot more heterogeneity which was going to create a lot of issues on software. And we don’t just need to be able to productively program heterogeneous architectures, we have to deal with them at datacenter scale, which is a challenge the likes of which we haven’t encountered before.
This got us concerned because the last time we had to go through a major transition was when the industry moved from single-core to multi-core architectures, and at that time it took 10 years to get software working and people using it. We can’t afford to wait 10 years to figure out software for heterogeneity at scale, it has to be sorted out now. And so, we got to work on understanding the problem and what needs to exist in order for this software stack to exist.
We are currently engaging with a lot of the leading semiconductor companies and hyperscalers/cloud service providers and will be releasing our software stack in the next 12 months. It is a unified programming model with a compiler and runtime capable of targeting any kind of architecture, and orchestrating work across clusters composed of different kinds of hardware, and is capable of scaling from a single node to a thousand node cluster for the highest possible performance.
Thank you for the great interview, readers who wish to learn more should visit Lemurian Labs.