Latent Space: The AI Engineer Podcast

Latent.Space

Business Entrepreneurship

Latest episode

291 episodes

Codex from 0 to 10M Users: Building ChatGPT Work — Akshay Nathan, OpenAI
07/28/2026 | 1h 9 mins.
There are roughly 100x more people who use code than who can write code. As code that “just works” becomes easier to generate, this group may be the biggest prize of all — if you can get the agentic interface right.
A key trend we have been tracking over at AINews is the absolute explosion in Codex usage this year, with MAU now up >10x from Jan 2026. Less than two weeks after their July 9th launch, OpenAI said ChatGPT Work and Codex had reached 10M users combined (as we cover in the pod, Codex now powers ChatGPT Work, so all ChatGPT Work users are now users of the Codex harness, even if they aren’t traditional engineers), showing the early innings of what happens when you graduate from coding agents to knowledge work agents.
This year we’ve been calling out how coding agents are “breaking containment” to power every other part of knowledge work. It started with the org chart: a major reorg last month put two of Codex’s most prominent leaders, Greg and Tibo, in charge of product and ChatGPT specifically, completing a “Superapp” consolidation cycle first discussed in March.
With these updates, Codex is no longer just a coding tool. In June, OpenAI said knowledge workers already accounted for roughly 20% of Codex’s user base and were growing more than 3x as quickly as developers. A product dedicated to knowledge workers was being pulled out of the Codex team.
However, knowledge work has a different set of problems and environments than coding. For decades, knowledge work has been scattered across different primitives like documents for writing, spreadsheets for analysis, slide decks for communication, and specialized applications for everything else. ChatGPT Work now enables users to work across every primitive with agents. Instead of opening an application and manually operating its features, the user can describe an outcome and collaborate with an agent that can assemble the tools, context, and artifacts needed to reach it.
From building no-code products at Airtable to leading Productivity Engineering at OpenAI, Akshay Nathan has spent much of his career trying to make the power of software accessible to people who do not write code. In this episode, Akshay joins swyx and Vibhu to unpack the launch of ChatGPT Work, why Codex unexpectedly took off among non-developers inside OpenAI, and the company’s broader plan to bring useful agents from software engineers to knowledge workers and eventually everyone.
We go deep on the shared agent harness behind Codex and ChatGPT Work, why OpenAI brought the experiences together without making them identical, and how persistent computers, artifacts, Sites, plugins, memory, and sub-agents are changing what people can delegate to AI. Akshay explains why some teams are replacing decks and spreadsheets with interactive websites, how agents can gather context across code, Slack, documents, and local files, and what OpenAI learned from personal-agent products like OpenClaw.
Side note: also don’t miss Abhihek’s sandbox track keynote at AIE, which now powers a lot of the sandboxing for ChatGPT Work… and yes was also broken by an unreleased OpenAI model in the recent HuggingFace incident.
Akshay also reflects on how AI is transforming product development itself: why more people will become generalists with a specialty, why ideas and taste become the bottlenecks when almost anyone can build, why LLMs still struggle to generate genuinely grounded new ideas, and why teams must distinguish increased motion from actual progress.
We discuss:
* Why Codex unexpectedly took off among non-developers inside OpenAI
* Why employees felt like using Codex gave them a new superpower
* The product insight that led OpenAI to build ChatGPT Work
* Why Codex and ChatGPT Work share the same underlying agent harness
* How their UX, Git visibility, artifacts, and sandboxing defaults differ
* Why OpenAI merged its agent experiences instead of building separate products
* How AI is blurring the boundaries between engineering, design, strategy, and operations
* Why OpenAI wants the default model configuration to work for most users
* When power users should use deeper reasoning, Ultra, or multi-agent modes
* Artifacts, agentic spreadsheets, and creating high-fidelity work products
* Why interactive Sites may replace decks and spreadsheets
* The challenge of designing a simple interface for an agent that can build almost anything
* Why users should retry tasks that models could not handle three or six months ago
* How AI can gather context for performance reviews without replacing human judgment
* The OpenAI automation that turns internal Slack and document activity into memes
* What reaching ten million ChatGPT Work and Codex users means for the product
* How OpenClaw inspired persistent environments, scheduled tasks, and personal agents
* Using ChatGPT for financial planning, budgeting, workouts, meals, and household management
* The design tradeoffs behind sub-agents and how much of their work users should see
* ChatGPT memory, Chronicle, and long-term context
* Why AI may make more people generalists with deep specialties
* Why ideas and taste become more important when almost anyone can build
* Why LLMs still struggle with the instruction “bring me new ideas”
* Measuring productivity through quality at-bats instead of commits, tokens, or pull requests
* The critical difference between AI-generated motion and meaningful progress
Akshay Nathan
* LinkedIn: https://www.linkedin.com/in/akshaynathan/
* X: https://x.com/akshaynathan_
Timestamps
00:00:00 Introduction and Bringing the Power of Code to Everyone
00:01:33 Joining OpenAI and Preserving a Startup Culture
00:02:40 What OpenAI Learned from Enterprise AI Adoption
00:05:28 Why OpenAI Built ChatGPT Work
00:07:17 Codex vs. ChatGPT Work and the Shared Agent Harness
00:12:07 Why OpenAI Merged Its Agent Experiences
00:16:24 Models, Reasoning Levels, and Choosing the Right Default
00:20:26 Artifacts, Agentic Spreadsheets, and Model–Product Collaboration
00:24:22 Why Sites Could Replace Decks and Spreadsheets
00:30:08 Designing an Agent That Can Build Almost Anything
00:34:28 From Developer Agents to Knowledge Work—and Everyone
00:36:07 Power-User Advice and AI-Assisted Performance Reviews
00:40:41 OpenAI’s Internal AI Memes and the Ten-Million-User Launch
00:44:39 OpenClaw, Personal Agents, and ChatGPT as an Operating System
00:50:24 Sub-Agents, Ultra Mode, and How Much Control Users Need
00:54:39 ChatGPT Memory, Personalization, and Chronicle
01:00:19 How AI Is Reshaping Product Development and Tech Roles
01:03:15 Ideas, Taste, and Why LLMs Struggle to Generate New Ideas
01:04:42 Measuring Productivity, Quality At-Bats, and Motion vs. Progress
Transcript
Introduction: Akshay Nathan, ChatGPT Work, and the No-Code Arc
Swyx [00:00:00]: We’re here in the studio with Akshay from OpenAI. Welcome.
Akshay Nathan [00:00:07]: Thank you.
Swyx [00:00:08]: And with our trusty co-host, Vibhu. So you recently launched ChatGPT Work. You lead Core Product Engineering. It’s been a long journey into all this. I find it very interesting that you started with no code or low code, with Walrus and Airtable. And to some extent, ChatGPT Work is like the super app of super apps of, well, here is the ultimate no code. You just write a prompt.
Akshay Nathan [00:00:32]: Yeah. It’s funny how things come full circle. I think for a long time in my career... I started my career working in consumer fintech, but then after that, like, there’s this hypothesis that the things that we were able to do with code, like, as engineers, like, if we could bring that to many more people in a more accessible way, then that would be truly magical. We were working on a startup, it’s funny, like, before LLMs, before vision LLMs, on how to do automated testing with AI. It was just kinda jank back then, but doing what we can, and then I worked at Airtable for a while on the same thesis that, like, if we can bring a database or the primitives behind a database to people, that’d be really useful to them. But once LLMs came onto the scene, it became clear that this was the missing piece, like, the missing technology required to, like, bring the magic of code to everyone without them having to know what’s going on underneath the hood. And so, like, I think this launch and a lot of the stuff that we’ve been up to is, like, the manifestation of that.
From Walrus and Airtable to OpenAI
Vibhu [00:01:33]: How was stuff when you joined? So you joined OpenAI in 2023. Now we’ve got so much more stuff, so ChatGPT, Codex app, ChatGPT Work. Have things changed?
Joining OpenAI and What Hasn’t Changed
Akshay Nathan [00:01:44]: I think the more interesting thing is how things haven’t changed. Like, one, I remember when I joined, it was, like, five hundred people. One thing I was worried about was, like, I was looking for something more early stage and, like, was it gonna feel startup enough? And I joined, and I was like, “This feels even more startup-y than I could ever imagine.” And, like, that really hasn’t changed even till now. I think the, like, level of, like, bottoms-up ambition and, like, the ability of anyone to, like, do anything or have an idea and ship it is really cool. But on the, like, mission side, I think what was really compelling to me is this mission of bringing frontier intelligence to everyone. Like, building AGI and then bringing it to everyone. And I think acknowledging back then that, like, that vision is not gonna be a linear progression. Like, we’re probably gonna, like, try different products and have different things that succeed and don’t. But the vision has stayed the same, and the mission has stayed the same, and we’re starting to see the pieces fall together, and that’s really cool.
Enterprise Lessons: No One-Size-Fits-All AI
Swyx [00:02:40]: You worked on Enterprise. A lot of people never touch ChatGPT Enterprise. What is something that you learned from there that you’re bringing into your work now?
Akshay Nathan [00:02:52]: I think how there’s no one-size-fits-all solution in Enterprise. I remember in the early days of ChatGPT Enterprise, like, when we talked to customers and, like, everyone, that was, like, I think a year after ChatGPT was released, and everyone was so excited to bring AI into their enterprise. And there were all these teams being stood up. It was, like, the AI deployment team with, like, these enormous budgets. And if you asked anyone, like, what were they excited about? Like, what were they excited about solving? Like, at first, you’d get, like, kinda the baseline answers of, like, “Yeah, we have all this context and data and all this stuff.” But then if you asked them, like, “What was, like, a discrete use case that, like, they want AI to enable in their workplace?” you’d get such a variance, like, an explosion of different types of answers. And it’s interesting, like, using these models and these products, you have this box, and you can say anything to it, which is the magic. But on the flip side, it also means that, like, you don’t know what to do with it. And in Enterprise, I think a big part of that is, like, meeting the users where they are, like, what use case were they trying to solve, and then teaching them how they can use AI to, like, gain leverage there.
Swyx [00:03:56]: Do you meaningfully differentiate that from forward-deployed engineering?
Akshay Nathan [00:04:01]: I think there is the go-to-market side of it and then there is the product side of it. I think you need someone on the product side. And I think, like, however good we get at FDE motion, like, I think at the end of the day, if we have a user who’s, like, looking at their computer or looking at their phone, like, it’s our job in the product to, like, be enabling them and showing them where to go. So we’re really excited about that.
Vibhu [00:04:24]: Do you think there’ve been changes over the past three years of adoption? So there have been step function changes. You have reasoning models and whatnot. Are there still the same problems of Enterprise has a black box, don’t know what to do with it, or have things changed?
Adoption, Agents, and the Next 10x Market
Akshay Nathan [00:04:39]: We’re seeing now that, like, there’s this huge uptake, right? Everyone is extremely excited about it. It feels like millions, hundreds of millions of people are using ChatGPT. They understand, like, how generally to work with AI. But then, like, every time, like, a new capability gets unlocked, so now, like, we’re seeing with agents, like, there is probably a contingent of, like, early adopters still who truly get it, who are like, “You can do anything. You just have to make sure the right context is there, it’s connected to the right tools, and that you are supervising it, but, like, anything is possible.” But then there’s, like, this, like, 10x or 100x bigger market where, like, they don’t yet get that, or they don’t yet see that. And so I think that’s the next stage here. So to answer your question, like, I think the adoption is there and growing fast, but I think the opportunity is, like, far bigger than that. That’s where we wanna play, especially with ChatGPT Work.
ChatGPT Work, Codex, and the Super App Merge
Swyx [00:05:27]: Yeah. Well, let’s skip ahead to ChatGPT Work, announced only, like, a month ago or so. What was the decision process that led into it? There was this overall merging of the super app. Is that what we’re officially calling it? You deprecated the browser as well. Just summarize your last, like, couple months of working on this thing.
Akshay Nathan [00:05:50]: Yeah. It feels like forever now, but it’s only been a few months. I think maybe the one impetus that, like, is most salient is when we released Codex, or even internally had Codex, like, it was really surprising to us, I think we recently put out some stats on this, that there was this, like, real inflection of, like, adoption among non-developers at OpenAI. And through this product development process, I, like, would go to, like, these UXR sessions to talk to people internally. And the thing that stuck out to me is, like, one, like, you go talk to, like, strategic finance or marketing or whatever, and they’re all using Codex for their use cases. That part’s cool, but the thing that really stuck out to me is how proud people were that they were using Codex. Like, how, like
Swyx [00:06:34]: It’s like, “I’m not supposed to be using it, but I am.”
Akshay Nathan [00:06:36]: It was that. It was, like, that they were early to this, like, new thing, but it was also this thing of, like, they felt like they had a superpower, right? And what we recognized then is that, like, the power of Codex, the power of agents, like, we already had this massive distribution base of people who have come to know and love ChatGPT. Like, how do we show that to them? Like, how do we bring it to them? Which is, like, a hard product problem, and it’s, like, a tricky thing, right? There’s many ways you can go about it. And so that’s what we called the Merge and the Super App over time, and ultimately launched it in ChatGPT Work, is how do we do that? But it came from that initial realization that, like, the power was not only for developers, like, much earlier than probably even we thought. Like, it could be extended to everyone.
Swyx [00:07:17]: How do you see the products differently? So, like, who is it for, right? So Codex started out even CLI, then app. Now there’s a merge of ChatGPT Codex and ChatGPT Work, so is it the opening for the average user, for enterprise, for work? How do you position it?
Akshay Nathan [00:07:36]: I think we want to position it for when you’re doing work-related things, for lack of a better word, right?
Who ChatGPT Work Is For
Akshay Nathan [00:07:42]: I think productivity is, like, the pillar that I support. Like, that’s the name of the team. And the reason for that, the reason we call it productivity and not, like, enterprise or, like, work or something like that, is because there’s also personal productivity, right? And, like, with ChatGPT Work, I’ve seen people do things in their personal lives that you wouldn’t classify as, like, work technically, but, like, these agents are super capable for. Like, one recent example that someone posted about on our Slack is, like, someone had, like, a missed package, like they didn’t receive it, and then they got, like, the picture of it from Amazon or whoever the courier was, and they, like, asked ChatGPT Work to, like, find out where that package is. And, like, the agent is extremely tenacious and, like, took the image and, like, looked at a bunch of, like, listings around their neighborhood and figured out exactly the apartment complex in which the package was, like, gave them some information. And so, like, I think there’s all these, like, work-related or productivity-related things, and I think that’s what we want the product to be. You asked about Codex. I think we think Codex is a durable brand, but we have a principle that, like, we don’t want a user to get stuck in a tab or an experience where they don’t get the power of the product. And so, like, everything that you can do in the Codex portion of the product on desktop, you can do in ChatGPT Work and vice versa. But we made some opinionated product decisions on, like, how much of the Git state, if you’re in a Git repo, do we wanna expose to the end user? Or how much do we wanna make the experience of seeing the agent’s thinking, like, diff forward so that you get exposed to the diffs out of the box.
And then, like, on the safety side, like, how do we wanna think about, like, sandboxing and making sure that we have the right defaults in one state versus the other? So, there’s, like, some opinions that go behind that, but we don’t want the user to need to choose which experience they’re in.
Swyx [00:09:26]: That is a good goal for AGI, right? Like, people don’t want, like, to have to choose what version of AGI they want. They just want the AGI to decide for them. Can I get an answer? It’s not super clear to me. Is the Codex harness and the ChatGPT Work harness the same? Is it just UI affordances, or are there prompt level or even deeper differences?
Shared Harness, Different UX: Codex vs. Work
Akshay Nathan [00:09:49]: So the harness is the same. The harness is shared. In both of the products, we made improvements to the harness to make it good for knowledge work, especially as it relates to plug-ins or computer use or artifacts. You get that power regardless of which experience you’re in. On the UX side, there’s opinionated takes that we have when you’re in Codex mode, what the UX should be, how the UX should behave, and some stuff around the sandbox like I mentioned, but the underlying harness and capabilities should be the same.
Swyx [00:10:16]: I’m just kinda curious. Is there a query that we can run that would look different in the two modes?
Akshay Nathan [00:10:23]: Yeah. I tried to ask it to create, like, a retirement calculator spreadsheet or something in both modes. And then in Codex mode, you might have to be in a repo for this, but you’ll see, like, the diffs of, like, the sheet that it’s creating and stuff like that, and the file edits. But in Work you won’t be able to see that.
Swyx [00:10:42]: I think that’s super clear. And then also the other thing I wanted to dive into was your, the productivity team. What else is there? First of all, what are the top-level teams other than productivity? Isn’t productivity everything?
Productivity Teams and Core Chat
Akshay Nathan [00:10:55]: So
Swyx [00:10:55]: Science?
Akshay Nathan [00:10:55]: We have a team focused on ChatGPT. Like, the core chat experience for consumer, which is, like, not, I think, all productivity. People are using ChatGPT every day for search, to figure out how to write messages to loved ones, to think about how to, like, learn a new topic, to create images, et cetera. And there’s so much more in chat that the hundreds of millions of users are using that warrants, like, a very dedicated effort. And there’s teams focused on enterprise and infrastructure and API and stuff like that, so.
Swyx [00:11:33]: I will bring it up.
Retirement Calculator Demo and Git-First UX
Swyx [00:11:34]: Yeah. So I have them both running. This is ChatGPT Work. There’s a Codex version here. I picked “Five Little Ducks” song, so this will take a while.
Akshay Nathan [00:11:43]: Huh.
Swyx [00:11:43]: I think we’ll just keep it in the background and, as they finish, we’ll look into some of the differences.
Akshay Nathan [00:11:48]: Yeah. But immediately, I think if you flip back to the Codex version you’ll see that,
Swyx [00:11:53]: That it assumes
Akshay Nathan [00:11:54]: Like the
Swyx [00:11:54]: It assumes Git. Yeah. Yeah.
Akshay Nathan [00:11:56]: The, like, dynamic island assumes that you’re in a Git repo. And you might miss some stuff because some of it is, like, in the actual chain of thought with those changes and how we display that, but yeah.
Swyx [00:12:07]: Is there an unintuitive like, is there a thing that you wanted to ship and then you got feedback, and you were like, “No, let’s not do it?” Like, what’s the thinking behind that?
Why Merge the Experiences
Akshay Nathan [00:12:14]: In ChatGPT Work?
Akshay Nathan [00:12:17]: I think one direction we could have gone with this is, like, keeping the experiences, like, completely separate. So it’s like, why
Swyx [00:12:22]: Different apps.
Akshay Nathan [00:12:23]: Exactly, like different apps or even in the same app, like, completely different experiences. Like, why merge it all? Like, Codex, people love. Like, why bring these products together? And I think the intuition here is that, like, all of our jobs are, like, changing dramatically with AI. Like, every few months, like, I feel like I wake up, and I’m, like, doing a completely different thing than I was doing a few months ago. And my hypothesis here is that, or I should say our hypothesis is that, like, part of what we’re building, this technology, is giving people leverage. Like, the things, maybe it’s the more mundane parts of your job or parts that, like, if you were able to automate, you’d be able to share more ideas faster or whatever, like, you’re able to do now. And because of that, like, that might blur the lines between someone who’s, like, only writing code or creating strategy docs or planning events or helping with marketing or doing podcasts or whatever, right? And so, like, these things are gonna get blurred over time. And so, like, trying to draw a hard boundary based on, like, who you are is gonna be tough. And, like, we should enable users to choose, but we shouldn’t box them in. And so a lot of the work that went in here, like, keeping the primitives the same, like for example, plugins are, like, unified across this product and ChatGPT and the cloud, was because of that. It’s this thesis that, like, eventually things are gonna come together and we don’t wanna be... Like, we wanna be prescriptive about when to be in either experience, but we don’t want to box anyone in.
Swyx [00:13:45]: I wonder if there’s users who are very tuned to the old ChatGPT harness that is effectively now replaced by the Codex harness. I can’t imagine what that was, but maybe they’re more on the conversational side. Can you compare and contrast the two harnesses? ‘Cause only you’ve seen it.
Akshay Nathan [00:14:02]: Yeah. I think ChatGPT, the existing harness, like, still exists today. Like, it exists in this app,
Harness Engineering: ChatGPT vs. Codex
Swyx [00:14:08]: The classic, right?
Akshay Nathan [00:14:09]: The
Vibhu [00:14:09]: You just start a new chat, and you don’t go under Work, right?
Akshay Nathan [00:14:13]: Yeah. If you start
Vibhu [00:14:13]: So
Akshay Nathan [00:14:14]: A new chat and go to chat, then you’re, you’re talking to ChatGPT with the instant model.
Vibhu [00:14:16]: Oh, we can technically do another. But on instant.
Swyx [00:14:21]: Yeah. So this one’s not gonna code or it’s gonna be inline. It’s inline in a sandbox.
Akshay Nathan [00:14:26]: It’ll
Vibhu [00:14:27]: Oh, that’s cool
Akshay Nathan [00:14:27]: We try to push you to go to Work if you’re creating a spreadsheet. Yeah, but this is
Swyx [00:14:30]: And this is a router decision? Sorry. Is it a router decision?
Akshay Nathan [00:14:34]: This is the decision that the model is making. Like, it sees that you’re trying to do something that would be better served in Work mode. But I think your question was like, what are the advantages of, like, the chat, like ChatGPT chat harness?
Swyx [00:14:48]: It’s more broadly, like, I wanna do an oral history of harness engineering. Right? The ChatGPT harness lasted us from, let’s call it the ‘01 era, until now, and now it’s being replaced by the Codex harness effectively. And they’re overlapping somewhat, but I’m curious what changed, if anything.
Akshay Nathan [00:15:10]: My perspective on this is, like, there’s, like, a constant process of, like, divergence, convergence, divergence, convergence. And in chat, like, many of the use cases I was talking about before, like, search or learning, I think we’re really optimizing for latency and optimizing for personality and, like, different things over time. The reason people love ChatGPT is because we’ve been optimizing for those things and working on them for so long. With Codex, what we learned was that, like, if you give the agent access to this infinitely flexible environment as a computer, it can do really powerful things. And so when we think about, like, okay, well, for knowledge work, like, which mode should we choose? It felt more natural to us to bring that to this, like, computer environment and maybe abstract some of the details of this computer away from users who might not be used to that, but, like, give them that same power. But ultimately, I think that we want the power in all places, right? We wanna meet people where they are. So I’m sure there’ll be work down the road in order to get things to be equivalently capable in all scenarios. But it’s just a question of, like, what we’ve been focusing the product on historically and what we’re focusing on now.
Models, Defaults, and the Reasoning Slider
Vibhu [00:16:24]: I think alongside that, outside of just harness and when to use Codex, ChatGPT, or Work, there’s also the new models you’ve released, right? Any guidance there? So people love to min-max what to use, like only use Terra on high reasoning versus, for this, you wanna use Sol here, ignore all these
Akshay Nathan [00:16:44]: There’s 32 options.
Vibhu [00:16:46]: But, that being said, for people that are expanding, so, trying stuff for productivity, for work, that don’t have the breakdown of what all this is, what’s the advice, right?
Akshay Nathan [00:16:59]: Well, I think before the advice, like the first thing is, like, none of this would be possible without these models. Like, I think you asked earlier, like, what was, like, the inspiration for Work and, like, early on, like I mentioned, like, what we were seeing with Codex, but that was also because the models were getting infinitely more capable. That’s happening again. I think it’s like another step function jump now. And to answer the question on advice, like we want this default to be the best possible. Like, we wanna be opinionated about the default, and so we’ve chosen a default that we think is gonna be the best for everyone. And we have options under the hood for power users. One could argue that there might be too many right now, and we’re working on simplifying it. But you can extend the reasoning level, and you can change between the different model classes if you need to, but the default should be the best for most use cases. So my advice to most people would be to stick to that. And then, if you reach a situation in which you think that you wanna try a different configuration, if you’re not seeing either the efficiency on the cost side or the quality on the intelligence side, then you can change the defaults and see if you can get something better. But we think that the default should be good enough.
Swyx [00:18:09]: I’m just gonna run something by you since you have way more experience than me. I’ve recently been doing Sol Lite but with goal, with the idea that the goal augments the reasoning effort, but with more terminations and turns.
Swyx [00:18:24]: Is that a good way to think about it as opposed to Sol Ultra or Sol Extra High?
Akshay Nathan [00:18:29]: Yeah. It’s hard to say because
Swyx [00:18:31]: Yeah. It’s like an interaction effect.
Akshay Nathan [00:18:33]: Exactly. It’s like there’s a preference for you as an individual, like how do you like to collaborate with the models? Like how many of those, like, terminations, as you call them, do you want where you can steer or make sure that it’s doing the right thing?
Akshay Nathan [00:18:46]: I think generally people should try whatever works for them. I think that, like, using Ultra or the, like, multi-agent setups are best for, like, when you have, like, tasks that are either incredibly complicated, like open explorations, or very parallelizable. I think using goal is best for tasks where you’ll be able to make consistent progress in a way that’s verifiable over time. But I think most tasks don’t fall into either of those buckets, at least when they’re starting. And so that’s why I think the best first step is, like, trying it with the default configuration and then seeing, like, where you wanna go from there.
Swyx [00:19:29]: Right. You guys worked on a slider, which is super helpful for reducing the amount of panic.
Vibhu [00:19:36]: It’s nice on mobile at least. There’s a nice slider there.
Swyx [00:19:38]: It’s nicer.
Vibhu [00:19:39]: I haven’t tried it.
Swyx [00:19:40]: So you have the advanced view there, but if you click advanced view. Yeah.
Vibhu [00:19:44]: Ooh, it’s just a nice slider. Yeah.
Swyx [00:19:46]: Very pretty, very colorful.
Akshay Nathan [00:19:48]: Yeah. The idea here was, like, reduce it to, like, one dimension even though there’s multiple dimensions, right? Try to project it onto a single dimension for the user. Like, something that represents, like, speed and efficiency on one side and then, like, quality and thoroughness on the other side.
Artifacts, Spreadsheets, and the Work Launch
Swyx [00:20:04]: I am just puzzled that it uses Sol so much, like the lower
Vibhu [00:20:07]: No
Swyx [00:20:07]: Grounds I would’ve used
Vibhu [00:20:08]: I think the slider, if I’m not mistaken, is
Swyx [00:20:09]: Terra.
Vibhu [00:20:10]: Oh, it is.
Swyx [00:20:11]: Yeah. See? So they preset Terra to only be the light one. But, like, I think more people should use Terra. One, because Sol keeps running out of capacity.
Vibhu [00:20:22]: I’m the reason. Here’s ten minutes of our
Swyx [00:20:24]: There you go
Vibhu [00:20:25]: Retirement calculator.
Swyx [00:20:26]: Oh, that’s the Excel thing working for you.
Vibhu [00:20:28]: This is,
Swyx [00:20:28]: Oh my God. Look at that
Vibhu [00:20:28]: This is work, and then Codex is still cooking, so we’ll get back into it. I think it’ll be interesting to see the thought process, the reasoning, and also, this is eight minutes on work. Codex is still cooking.
Swyx [00:20:41]: Yeah. And by the way, do you know Gabriel Chua? He’s part of the OpenAI Singapore team. He showed me this, and I was pretty shocked that this looks like Excel. It edits Excel files. You never paid for an Excel license, right? But somehow this is workable and it’s agentic Excel.
Akshay Nathan [00:21:01]: Yeah. One of the big pushes that we made for this launch was artifacts, right?
Akshay Nathan [00:21:05]: Like both on the model side, like I think if you compare this with GPT-5.5 and GPT-5.4 before that, you’ll see that there’s been pretty dramatic improvements in the quality of these artifacts and then also on the product side.
Vibhu [00:21:16]: The UX side is also crazy, like hosted sites and whatnot. No longer needing to host your own little webpage, like it
Swyx [00:21:23]: Oh, I have a story about that. I can do a separate thing. I’ll need to take the visuals here, but we’ll cut to that later. Was there co-training, because you were making this big move and you launched GPT-5.6 on the same day as ChatGPT Work? Was there influence between the model training teams and the harness teams, or did the launch dates just happen to line up on the same day?
Akshay Nathan [00:21:46]: I think we collaborate heavily with the research teams, and I think that’s one of the most magical parts of the job, the most fun parts of the job. But yeah, just using artifacts as an example: underneath the hood, there’s a lot of work that went into making sure that we had the right infra to be able to train the models to get better at this, and then on the product side, that we had the right experience for users to be able to collaborate with the model on an artifact like this. In fact, with this whole viewer, the intuition is not necessarily that you wouldn’t need an Excel license. This is stage one, right? Like, this is probably not what you meant when you’re making a retirement calculator.
Vibhu [00:22:24]: Yeah, you can iterate very easily. Yeah.
Akshay Nathan [00:22:24]: You wanna iterate when you’re seeing it, and if this thing is high fidelity to what your coworkers would see if you were to send this to Sean, that I think makes it so much easier and makes you trust the product in terms of iteration.
Vibhu [00:22:39]: When you say coworkers would see, do you see a multiplayer, multi-team collaboration with artifacts? Any things you guys think about that?
Multiplayer Artifacts and Collaboration
Swyx [00:22:46]: You can already share it, right?
Akshay Nathan [00:22:48]: Yeah. It’s something that we’re actively thinking about. One thing that we’ve noticed internally, without talking too much about the roadmap, is that there’s many times when someone will ping me about something, and I will ask ChatGPT Work the question, and then I’ll ping them back the answer.
Akshay Nathan [00:23:04]: And then I’ll be thinking like
Vibhu [00:23:04]: Like the simplest would be, the three of us are just all on one hosted.
Akshay Nathan [00:23:07]: Exactly. And I’ll think about, like, was I required in this loop? And maybe I was: maybe I rephrased what they were asking or pulled from certain context or whatever. But when I gave them back the answer, that process was also lossy, right? I gave them just my interpretation of what ChatGPT Work cooked up. But underneath the hood, there’s so much context in the rollout and stuff that could be interesting.
Vibhu [00:23:28]: Yeah, it’s
Swyx [00:23:28]: So like the answer was preemptively respond to every inbound request?
Akshay Nathan [00:23:33]: No, it was just like literally like this is what I do sometimes as my job.
Swyx [00:23:36]: I know you copy-paste and then you’re just a message forwarding service
Akshay Nathan [00:23:39]: Yeah. Yeah, exactly
Swyx [00:23:39]: From AI to AI.
Vibhu [00:23:40]: But I think it’s interesting, right? It helps people understand the capability of what you can ask and delegate that oftentimes people don’t realize until they try or someone shows you, and then you’re like, “Oh, okay. Okay, I see.”
Swyx [00:23:52]: I think there’s also a light security issue, where you’re the permissions layer. Like yes, I could query everything that you query, and I could get an automated response, but maybe I’m not supposed to see it. And there’s no way I would know because I’m not supposed to know what I don’t know.
Akshay Nathan [00:24:07]: Especially as, with ChatGPT Work, we’re asking you to connect your plug-ins and it’s pulling from your local files and stuff like that. The amount of context that the agent has access to is deeply personal, and that’s something I think we need to preserve, so that’ll definitely be a challenge.
Swyx [00:24:22]: There’s Excel, there’s PowerPoint, there’s Docs, the grand trio of work. What other formats of work do you think about? Like, you worked on Airtable. Is there a future where there’s an OpenAI Airtable? What does that look like if you ever ended up doing it?
Akshay Nathan [00:24:41]: It’s a really good question. I think,
Formats of Work: Sites as Knowledge Artifacts
Akshay Nathan [00:24:43]: one that you didn’t bring up was Sites, and I think that was
Swyx [00:24:46]: Sites
Akshay Nathan [00:24:46]: A core part of this launch. There’s one side of Sites that I think people commonly talk about, especially on Twitter or X, as this prototyping tool. And we saw that happen with this launch even. The model slider that you guys were referencing earlier was developed almost fully in a Site. The collaboration between design and engineering and product on that was on a site where we play with the affordance and figure out how it feels and all of that. But the other aspect that I think is a little bit less talked about is Sites as an artifact for knowledge work. I was talking to someone the other day who’s on our corporate finance team, and we were talking about how the reports that they’re working on as a team month to month were historically in slide decks and in spreadsheets, and now they’re just in Sites. Sites is the mechanism by which they collaborate across the team. And the reason is that it’s somewhat higher bandwidth. Tools like PowerPoint and Excel are, like, infinitely flexible, but at some point you reach the boundary where either as a human you may not know how to use some feature, or the product itself doesn’t support it. But with a site you can do anything. You ask for anything and you can get that. Once people see that magic, I think it’s been really valuable.
Swyx [00:26:02]: Yeah, let me show you my case study. This involves all the hot topics including ChatGPT Work, but also GPT-5.6 token billionaires and token maxing and Sites and auto research. I’m a fan of this game called Strata. It’s like a little board game that you
Sites, Auto Research, and Research Dashboards
Swyx [00:26:17]: That you play with physical blocks that come on top of it like that. So over the weekend I took like thirty photos and just threw them into ChatGPT. One point seven billion tokens later, out comes this site with a fully playable thing
Akshay Nathan [00:26:32]: Wow
Swyx [00:26:32]: With 3D block placement and everything. Because it requires physical blocks and I needed friends to train on it so they can get better, so I can play against them. But also, I could do things like train an AI on it and that’s, that
Akshay Nathan [00:26:45]: That’s your auto research
Swyx [00:26:46]: That gets into auto research. So, you want to train your own AIs, and then make sure they self-play against each other. I need to set both AIs. So this is AI versus AI, and they’re gonna self-play. The AIs start out bad, and then you want to define a loss function and get good. I wasn’t gonna supervise all this. I was down in San Mateo, attending a conference. What I ended up doing was auto researching on this and creating benchmarks, and there were just way too many parameters for me to read. So I started asking it for a site, and it’s created this lab panel. Is there a shortcut for a site that is created?
Akshay Nathan [00:27:28]: You should be able to go in the sidebar to Sites, top of the sidebar. The left sidebar.
Swyx [00:27:33]: This one? Oh, left?
Akshay Nathan [00:27:35]: Yeah. Just scroll all the way to the top.
Swyx [00:27:36]: Oh. Oh, it says Sites. Oh, there you go. Yeah.
Akshay Nathan [00:27:39]: Ooh.
Swyx [00:27:40]: So it creates the sites. I don’t think this is exactly what I wanted, but let me show you what it popped up, right? I think as a research artifact, it is very important to communicate exactly what is being done. It outputs this thing, which I eventually started publishing. So I moved it off of Sites because I wanted more database and infrastructure than Sites afforded me. But this is like a research output that you can start to mess with and try to think about what hyperparameters you are tuning for training AIs. And I was trying to make scaling laws and everything and doing all sorts of game optimization stuff. And the fact that you can just throw this up as a research artifact, like, I no longer need to read ChatGPT output. I read Site output. But then there’s also a huge sprawl. Look at how long this thing is. There’s so many numbers. It is pretty overwhelming, so then I have to start pruning it from there. But it’s an interesting transition from effectively putting out Markdown to putting out a whole functional site.
Akshay Nathan [00:28:41]: I think Markdown just isn’t that optimal for people to read, right? Might as well just write an HTML website. And, I don’t know, I think you can do a lot with customizing this, right? You have your skills that explain what you want. Like I noticed they’re quite verbose. I don’t need a lot of this information.
Swyx [00:28:57]: It’s very verbose.
Akshay Nathan [00:28:58]: So and then the nice thing of having a site side by side is, you just iterate on what you want and what you don’t, right?
Swyx [00:29:05]: Yeah. I don’t know if that triggers any stories for you of how it’s run internally. Am I doing this right?
Akshay Nathan [00:29:11]: Yeah. I think that this is a workflow that we’re seeing all different types of teams use, where the canonical artifact that was previously a deck or something is now becoming a site. And with a site, because it’s just HTML, it’s infinitely flexible. And so, if you want to give more prominence to a certain thing that in a slide deck would feel like it was buried, you can do that. You can have it be like the hero image, right? And so I think that people are starting to see that. There’s more work to be done to make these things much easier to collaborate on. You mentioned that they’re long and verbose, could be broken up. I’m sure that there’s still something to do there.
Swyx [00:29:53]: They’re super long. Yeah.
Akshay Nathan [00:29:54]: Yeah. But I think we’re starting to see that this is a really interesting format for people to use, one that’s much more flexible than what they ever had before.
Swyx [00:30:07]: I think your job also becomes meta. You’re not designing the products. You’re designing a product to make products, and I’m curious how you manage that.
Designing a Product That Makes Products
Akshay Nathan [00:30:18]: I think one thing that we’ve been thinking a lot about, when we look at the UX, is how can we balance simplicity with capability? If we’re designing a product, like you said, that is made to build other things, right? You can build so many different things. But we can’t put that all in front of you because you’ll get overwhelmed.
Vibhu [00:30:41]: Yes.
Akshay Nathan [00:30:41]: And so we had similar challenges even with ChatGPT, but especially now, when there’s so much that can be done, I think the balance that we’re constantly trying to strike is: how can we give the user enough of a UI surface where they can be expressive, they can tell the agent what they need, they can verify that it’s using the right tools, it’s pulling from the right sources, et cetera, but then it gets out of the way? And then how can we build the right system such that we can show them instead of telling them what can be done? Because so much of this is gonna be, how do they discover the next use case and the next one after that if they really want to be superpowered by the AI.
Games, Private Evals, and Show, Don’t Tell
Vibhu [00:31:19]: Yeah. It’s interesting. I feel like everyone also just has a different way to do it, right? I made a similar version of this same game. I didn’t take any pictures of the board or the rules. I threw it in at goal, and eighteen minutes, fifty-three seconds later, a lot of tokens later, I’ve got a similar version. Not with all the auto research and whatnot, but
Akshay Nathan [00:31:39]: You gotta do all the latest trends.
Vibhu [00:31:40]: And yeah, I did it with Codex, not Work, but it’s interesting, right?
Akshay Nathan [00:31:45]: Yeah. And this is GPT Image generating the pro avatars. Very good for game design. Like
Vibhu [00:31:51]: And
Akshay Nathan [00:31:52]: A lot of game designers were like really into GPT Image for assets.
Vibhu [00:31:54]: I will say the broader takeaway probably is that the reason we do this is more so just to test the tools, right? Like, this was also a test for when GPT-5.6 came out. I had done the game on GPT-5.5, right? And I had to feed it the rules. It’s a pretty niche game. It couldn’t find how to do this on its own.
Akshay Nathan [00:32:15]: Oh, yeah.
Vibhu [00:32:15]: GPT-5.6
Akshay Nathan [00:32:16]: It is out-of-distribution, which is why I was also very keen on testing the GPT-5.6 capability.
Vibhu [00:32:21]: But as Work comes out, as new things come out, these are just our side ways to test things, right?
Akshay Nathan [00:32:27]: Yeah. It’s some private eval. That is not this private.
Vibhu [00:32:31]: But also valuable because now you can send this to your friends and I learned about this game through seeing this.
Akshay Nathan [00:32:36]: It’s a hard game. He’s very good.
Vibhu [00:32:39]: It’s good to when no one is competing with you. But yes, it’s a classic RL problem of self-play, bootstrapping your game AI. Yeah, you see how easily work becomes personal and personal becomes work, because the thing I do for personal directly informs people I work with, because I showed it to them. They were like, “Oh, you can do that with GPT?” Which I imagine is the growth strategy.
Akshay Nathan [00:33:02]: Yeah. The show, not tell is a big piece that I think we’ve still not fully cracked: showing people all the things that they can do with the product versus trying to teach that to them through articles or onboarding or whatever.
Akshay Nathan [00:33:18]: So meeting them in the moment.
Vibhu [00:33:19]: It’s a career risk for me, because I used to be in developer relations, right? Where your job is to show, and then you’re like, “What do you mean? You don’t need.” Your job is to tell. And then the product people are like, “Well, we don’t need you if our product is intuitive enough.” So
Akshay Nathan [00:33:37]: Yeah. That’s the magic of the models. So you can tailor the telling or the showing to specifically what the user needs: what they care about, what they’ve done in the past, exactly where they are on the adoption journey. So I think that’s gonna be a super big opportunity.
Vibhu [00:33:50]: Seems easier and easier now to tailor custom showing, right? People have different use cases. As much as you said you don’t wanna segment different people into different buckets, right? It’s also not that hard to do for people that are in different categories. But the question is, you said your team is more broadly on. What was the term you used? Productivity?
From Developers to Knowledge Work to Everyone
Akshay Nathan [00:34:12]: Productivity.
Vibhu [00:34:12]: Productivity. So how
Akshay Nathan [00:34:12]: Which is now work.
Vibhu [00:34:14]: Is it work? Is there another distribution that we’re not hitting? Is there a group of people that will have something different than ChatGPT, Codex or Work? Is there more that the mass isn’t targeting?
Akshay Nathan [00:34:28]: I see it as a sequencing. The vision is to bring useful agents to everyone. We started with developers. Developers historically are early adopters that are willing to put up with more friction, set things up, et cetera. That’s where Codex started. I think the next opportunity is what we call general knowledge work, all the other functions around developers. When you go from developers to this segment, there’s inherent challenges with this show, not tell thing that we’re talking about: making the product more understandable, bringing in new capabilities that matter more for this cohort than for developers, things like artifacts, things like computer use, et cetera. And then, similarly to how we took the learnings from developers and brought them to general knowledge work, the next stage will be taking the learnings from general knowledge work and bringing them to everyone, no matter what they’re doing in their lives. And we’re already seeing that a little bit. This game example that you have is something that’s on the border between fun and personal life and your professional life. I use ChatGPT Work full-time at home for everything, for whatever I’m doing. I used it the other day to come up with a meal plan and save that on the computer environment that it has, as something that I can continue going back to. Is everyone doing that yet? Probably not, because the thing says Work on it, but eventually we wanna get people there.
Vibhu [00:35:51]: ChatGPT life.
Akshay Nathan [00:35:52]: Yeah, exactly. ChatGPT cooking. But I think there’s a lot of opportunity there, and I see it as: we built a foundation in software engineering, and we’re gonna take the same learnings from software engineering to knowledge work to everyone.
Vibhu [00:36:07]: Do you have any power user advice? I feel like there’s a group of people that will live it, use it for everything, stay on it twenty-four seven. And then there’s a bit of a gap between that crew and people that, okay, I use it for work. I use it occasionally. Sometimes I type questions. Any advice, any learnings, anything you recommend, or just takeaways that you’ve found that help bridge that gap?
Power User Advice: Push the Frontier of Imagination
Akshay Nathan [00:36:30]: I think a couple things that I’ve seen is like, one, that it really helps to broaden your imagination of what’s possible, and this has been a learning even for me. Like, the technology has progressed so fast that, something that, like, even three months ago, like, no way the models can do this. Like, now it’s like, wow, it’s like it can. Like,
Swyx [00:36:52]: Give an example
Akshay Nathan [00:36:52]: We’re going through right now our, like, review cycle internally, and, people always talked about this as, like, a thing that the models are good at and like, there’s a cliché of like: Okay, like, no one wants to be writing reviews and, like, we just use AI to do it. But in all seriousness
Swyx [00:37:09]: And it can evaluate it as well.
Akshay Nathan [00:37:10]: Yeah, exactly. In all seriousness, before it was, like, just slop, and I think it was helpful, but not super productive. Now I’ve found that the model can do a much better job than me, especially in this environment, of pulling context on what people are up to, the things that they’ve done to make a difference, highlighting wins that they’ve had that I may not even have seen. It has access to, like, everything, right? Like the code, things that they’ve caught, reviews, Slack, everything. And so it’s incredibly powerful in that domain. Just six months ago, the last time we did this cycle, I tried using it, but it was not at all helpful. And this time it’s been incredibly helpful. So I think continuing to push the frontier of imagination of what’s possible, even if you tried something before, is maybe my biggest piece of advice. The other thing is, the more you put in, especially in this environment where the model has access to everything on your computer or in ChatGPT Work, like you can create artifacts over time and save them in your library, and the model will continue having access to those. The more information you give it about whatever domain you’re in, whether it’s your life or your work, the more valuable it becomes, and it’ll become valuable in ways that might surprise you. It might pull from context in a way that may be proactive and that you might not even have thought about. But it needs to have access to those tools or that context first.
Reviews, Agentic Search, and Context Gathering
Swyx [00:38:27]: One thing, I just wanna talk about the review stuff, because that’s a very sensitive thing, and you’re a founder, you’ve managed people, you’ve hired people. As a manager myself, I’m very reticent to put out any LLM-generated things, especially when it comes to people, ‘cause it feels like you don’t care.
Swyx [00:38:46]: Presumably at OpenAI, people are more open to being evaluated by GPT. But are there any unofficial rules around this? Like, what’s the etiquette?
Akshay Nathan [00:38:57]: Oh, I think the etiquette is that, like, I would never write something via, like, well, solely via AI and, like, present it as, like, a review for someone. What I was talking about is more, like, gathering context. That’s the place where it’s incredibly helpful.
Swyx [00:39:08]: So it’s just search.
Akshay Nathan [00:39:09]: Yeah, exactly.
Swyx [00:39:09]: It’s agentic search. Yeah.
Akshay Nathan [00:39:10]: It’s like agentic search, but one that you can tailor and steer much more capably than you could before, ‘cause the thing is, there’s a flywheel happening, right? Because of Codex, and because of ChatGPT, people are able to do so much more now than ever before. And if you’re able to do so much more, it’s easy to miss things as well. And so I think we need to use these same tools to keep up with all the impact that people are having and understand where we can be helpful.
Swyx [00:39:39]: I think the thing is, I run a small company, so it’s easy to search, but at the scale of OpenAI, with the amount of messages that you guys put in Slack, do you think that it misses things?
Remembering What Humans Miss
Akshay Nathan [00:39:50]: Probably, but I think that I also miss things.
Swyx [00:39:52]: Like, it doesn’t matter, right?
Vibhu [00:39:53]: I think sometimes it’s
Swyx [00:39:53]: Like it’s, as it needs to be human-level
Akshay Nathan [00:39:54]: It’s all relative, right? Yeah.
Vibhu [00:39:56]: Sometimes it’s nice when it finds things you wouldn’t, right? Like right now, my Codex system prompts are set up in such a way that every project I have has a separate notes.md, and it just writes learnings to there. And then the global one can pull from all these. So sometimes it’ll be like: Oh, there’s this project you did like four months ago. Here’s a note that we had. And it randomly pulls it back into context in a way that I would never do, that I hadn’t thought about.
Vibhu [00:40:20]: And I’m like, okay, this is quite superhuman, right? And it’ll save like hours on chunking of stuff or find something that’s already been done. As much as it might miss stuff, I would too, but it’s very useful when it finds stuff. And I have a very simple, not super engineered solution to this. It’s just Markdown files that get pulled whenever they want.
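Editor’s sketch: the per-project notes setup Vibhu describes can be approximated with two small helpers, one that appends a learning to a project’s notes file and one that gathers every project’s notes into a single block of context. This is a minimal sketch, not his actual configuration. The file name `notes.md`, the one-folder-per-project layout, and the function names are assumptions made for illustration.

```python
from pathlib import Path

NOTES_NAME = "notes.md"  # assumed file name for the per-project notes


def append_learning(project_dir: Path, learning: str) -> None:
    """Append one learning as a Markdown bullet to the project's notes file."""
    project_dir.mkdir(parents=True, exist_ok=True)
    with (project_dir / NOTES_NAME).open("a", encoding="utf-8") as f:
        f.write(f"- {learning.strip()}\n")


def gather_notes(projects_root: Path) -> str:
    """Concatenate every project's notes into one Markdown string.

    Assumes each project is a direct subfolder of projects_root.
    """
    sections = []
    for notes_file in sorted(projects_root.glob(f"*/{NOTES_NAME}")):
        body = notes_file.read_text(encoding="utf-8").strip()
        if body:  # skip projects whose notes file is empty
            sections.append(f"## {notes_file.parent.name}\n{body}")
    return "\n\n".join(sections)
```

An agent with file access could call something like `append_learning` at the end of a session and read the output of `gather_notes` at the start of the next one, which is how an old note from a months-old project can resurface in a new context.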
Akshay Nathan [00:40:41]: Yeah. I have a funny anecdote about this. Recently, gearing up to this launch, the team has been really cooking on it for a couple months, and over that time there’s so much conversation and chatter going on in Slack and Docs and elsewhere. And one of the members of the team set up this scheduled task, like an automation, to look at everything that’s going on and come up with the best memes and then post them in one of our shared channels. And there are two cool things about this. The first is, I think the models are, over time, starting to become funny.
Swyx [00:41:13]: Funny. Nice.
Akshay Nathan [00:41:13]: Whereas a year ago, that was not at all the case. The second is what you were saying: they find things in surprising ways that you may not have thought of and create connections that you may not have thought of. And that really helps with the meme generation, because then you can see something that genuinely surprises you and is funny in that way. So yeah, that’s not the most productive use of the technology, but it does uncover this capability that’s emerging, which is just to find information that you otherwise would not know of.
Launch Momentum and the 10 Million User Milestone
Swyx [00:41:43]: Talking about the launch, I think, I have pretty much said this is the most successful launch in a long time. I think even more successful personally than 5.0, and they’re announcing ten million users. Does it feel different? You’ve been through a lot of launches.
Akshay Nathan [00:41:58]: I think it feels like a culmination. Well, I think two things. One, it feels like a culmination of, like I was mentioning earlier, this vision, this mission that we’ve been on for a long time. Like I said, we saw the magic of Codex internally, and then we’re extremely excited to bring this to many more people and to see it working, to see us reach the distribution numbers that you mentioned. I think that’s huge and super exciting. The flip side of that is, there’s so much more to do too. That’s also really exciting. ChatGPT as a whole, this product that everyone almost equates to AI and loves, has hundreds of millions of users. And so ten million is really cool, but we need to get this to everyone. We need everyone to feel this magic. And so that’s the next step from here. But yeah, I think I’m extremely pumped about how it’s going so far and the opportunities.
Swyx [00:42:46]: Awesome. I did want to also ask, because I’ve been tracking the number closely: it transitioned at some point from just Codex users to Codex plus ChatGPT Work, because they’re the same harness. The whole point is that you can’t count them separately. Do you have roughly a billion ChatGPT users? Why didn’t it just jump to one billion right away? Like, isn’t that the default on ChatGPT or no?
Codex, ChatGPT Work, and the Developer Brand
Akshay Nathan [00:43:11]: We don’t default you into ChatGPT Work if you’re on ChatGPT
Swyx [00:43:14]: If you’re free. Yeah
Akshay Nathan [00:43:15]: It’s also only available to paid users right now. And I think there’s a process of educating users on what the value of this product is, having them try it, learning from their feedback, and making it better over time. But the goal is to get as many of the people who love ChatGPT today as possible to feel the power of ChatGPT Work. But I think it’ll be a journey.
Swyx [00:43:36]: Yeah. And Codex will still be alive as a brand for the foreseeable future. And we’ll just toggle between them as needed for UI stuff.
Akshay Nathan [00:43:44]: Yeah, I think it’s an even stronger point than that. Developers have been a core market for us for so long, and there’s so much more that we can do to make Codex great specifically for software development, and we’ll continue to do that. This doesn’t take away from that at all. If anything, it should increase the utility of something like Codex, because now you can move seamlessly from writing a diff to creating an artifact or doing a search over your factor.
Swyx [00:44:11]: I do wonder how much this terminology leaks to the non-technical user. Like, do they have to learn to say artifact if I want artifact? Or.
Akshay Nathan [00:44:20]: It’s funny, like we call it artifacts internally ‘cause that’s what the teams call it.
Swyx [00:44:23]: It’s nice. Yeah.
Akshay Nathan [00:44:23]: But externally, no one calls it an artifact. I think that people often describe things as whatever they’re used to, right? So if ChatGPT Work is good at creating slides, they’ll say ChatGPT Work is good at creating slides, and that’s what we want.
OpenClaw, Personal OS, and Persistent Computers
Swyx [00:44:38]: Another one, it’s July of twenty-six. One big thing that also happened for OpenAI was OpenClaw, and that’s I think a lot of people’s first time really maxing an agent for personal stuff, but also crossing over to work in essentially the same way. As far as I understand, OpenClaw is still independent, but did you go through your own OpenClaw moments? Were there any lessons you took from OpenClaw to Codex or back? Whatever.
Akshay Nathan [00:45:06]: I think there’s a lot of inspiration. I did go through my own OpenClaw moment. I,
Swyx [00:45:10]: Yeah, tell the story
Akshay Nathan [00:45:10]: Me and my wife like set up an OpenClaw to like try to manage everything in our house. Not that there’s like a ton, but it was like quite useful. We gave it a calendar. It started, creating events for us and stuff. At some point, the laptop that we were running on, it died and never got a chance to pick it back up. But there was a lot of inspiration there, like, in ChatGPT Work, in web and mobile, like you get access to this like persistent computer environment where, you can store files, and those files stay around between sessions. And the idea is to be able to enable use cases like this. one of the members of our team uses ChatGPT Work for what they used OpenClaw from before, and then feel like it has like completely transitioned, which is like, workout planning and like meal tracking. which again, it’s like a work-related thing, right? It’s like not work necessarily, but it’s like in personal productivity space. But it has all the same primitives. So it has scheduled tasks. It has the ability to store files on a file system. It has the ability to like reference those things over time. And so you start to see the same types of use cases emerge, which has been really cool.
Swyx [00:46:14]: Is there a point that ChatGPT Work completely replaces OpenClaw? they’re independent, so.
Akshay Nathan [00:46:20]: Yeah, I’m not close to it, so I can’t speak to the OpenClaw roadmap, but I don’t think so. I think there’s always a need for this incredible open source technology that team has built. And I think that we can draw inspiration in the product. I think many more people have heard about and used ChatGPT than have used OpenClaw. And if we can take the magic from OpenClaw and bring it to them, I think that’ll be a success. One thing on the ChatGPT Work side that we feel strongly about is that the core experience is that you come to this product and you have a conversation, start a session, whatever you wanna call it, with this agent. And the magic of the product is that you can do anything in that moment. And we would like to create a product where you don’t have to click a button or go to a different place, whatever, and you can get whatever functionality exists in your finances app or any other product in this one place. And so that’s the goal. We want an extensible system with plugins where you can connect to the tools that you need in order to accomplish, like, a financial task, or where, if you’re doing science work, we have an ability to extend the system such that you can write the tech and it performs well. There’ll always be products that we support that are best in class at those things, but we want as much of the magic as possible in that core experience.
Swyx [00:47:45]: Yeah. Do you think that you can do everything you used to do with Wealthfront in ChatGPT Finance?
Finance, Data Access, and Centralized Context
Akshay Nathan [00:47:50]: I tried it. ChatGPT doesn’t yet custody cash and assets for me. So that part, no, not yet. But there was a whole component of retirement planning, financial planning, budgeting and stuff that we were looking into when I was there. And with the finances plugin, that’s all possible with ChatGPT today. So I feel like at least that component’s replaced for me.
Swyx [00:48:17]: I haven’t really plugged it in yet. I’m somewhat scared to look at the answer. Like that’s honestly like the same reason for health and finances. Like I’m like, no.
Akshay Nathan [00:48:27]: It’s really good. We were talking about the agentic search aspect a little bit earlier, but it’s really cool how, in conventional UX, the more power you wanna give to a user, the more knobs and bells and whistles you need to add. Like, for these finance and budgeting apps, there’s always a bunch of different filters and search bars and stuff like that. But now, with the right
Vibhu [00:48:48]: connectivity to the right data, you can have whatever you want. You can ask any question you want in that box and get the answer, and I think that’s super powerful.
Akshay Nathan [00:48:57]: I think it’s also nice to just have it centralized in one space, right? You have different health apps. I have one for a smart scale, a watch, all these different things. It’s just nice to centrally co-locate it.
Vibhu [00:49:08]: Which is part of the whole thing of OpenClaw, right? That you would have a personal OS, which presumably ChatGPT wants to become. I do think that just relying on just-in-time pulling of data via, let’s say, MCP, CLI, API, whatever you do, is still not enough. I come from a bit of a data engineering background, and you still want a data warehouse or some caching or semantic layer. Do you feel that, or do you already have that?
Akshay Nathan [00:49:40]: I can’t speak to all the details on how everything works, but I think it depends on the access pattern, right? If you want an answer immediately, then yes, it’s very difficult to do that if you need to pull from all of these sources. But a lot of the use cases that we wanna enable in ChatGPT Work aren’t necessarily something that you need immediately. It’s more like a task that you want the agent to go and do, and that’s gonna take a certain amount of time. And with things like programmatic tool calling and sub-agents now, some of that time is also parallelizable. And so I think it’s very possible that the ceiling on what can be done with MCPs and calling out to these third-party services has been raised substantially. So we’re really excited about that.
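A minimal sketch of the access-pattern point above, not a description of how ChatGPT Work or any MCP client is actually built. The source names, the simulated latencies, and the `fetch` helper are all hypothetical stand-ins. The idea it illustrates: pulling from several slow sources one after another costs roughly the sum of their latencies, while fanning the same calls out concurrently costs roughly the slowest one.

```python
import asyncio
import time

# Hypothetical data sources with simulated latencies in seconds (illustrative only).
SOURCES = {"calendar": 0.05, "finance": 0.08, "health": 0.03}


async def fetch(source: str, delay: float) -> str:
    """Stand-in for a slow call out to a third-party service."""
    await asyncio.sleep(delay)
    return f"{source}: ok"


async def fetch_sequential() -> list[str]:
    # One call at a time: total wait is roughly the sum of all delays.
    return [await fetch(name, delay) for name, delay in SOURCES.items()]


async def fetch_parallel() -> list[str]:
    # All calls in flight at once: total wait is roughly the longest delay.
    # gather() returns results in the order the awaitables were passed in.
    return list(await asyncio.gather(*(fetch(n, d) for n, d in SOURCES.items())))


def timed(coro_fn):
    """Run an async function to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start
```

Calling `timed(fetch_sequential)` and `timed(fetch_parallel)` returns the same list of results, with the parallel run finishing sooner. For a long-running agent task, as opposed to an answer needed immediately, that difference is one reason just-in-time pulls can stay workable.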
Sub-Agents, Ultra, and Product Design Tradeoffs
Vibhu [00:50:23]: You mentioned sub-agents. I gotta double-click on that. Ultra is a new mode. You have special affordances in ChatGPT itself to show off the agents. Can’t really do much with them, to be honest. Like, just watch. What have been your experiences? Any design issues that you would call out to other builders building with sub-agents?
Akshay Nathan [00:50:45]: I think it goes back to the balance that I was raising earlier about showing builders the power of the tool, but also creating enough of an abstraction to not overwhelm them. I think with sub-agents, the thing that we wanted to show is that you can take a task that has many parallel tracks, or is complicated in a way that sub-agents can handle, and this product is for you. Like, the model can accomplish those goals or try to accomplish those goals. And so that’s the point of showing them in the product, and that’s where we’ve gone with the design. There’s another iteration of this where you can see exactly what they’re doing and things like that, which I think could converge on overwhelming you with information. And so this is the deliberate trade-off that we made for now.
Vibhu [00:51:33]: You do display quite a lot of transcripts.
Akshay Nathan [00:51:35]: Right. Right.
Vibhu [00:51:36]: Or do you
Akshay Nathan [00:51:36]: I think it’s hidden by default though, right?
Vibhu [00:51:37]: Do you want to display more than that?
Akshay Nathan [00:51:38]: No, it’s hidden by default. Yeah.
Vibhu [00:51:39]: Some people could want more. So I’m one of those people that will throw a lot of stuff at goal, and pretty much every goal I’ll tell it to use sub-agents. Seems redundant, right? But every time I’m like, “Okay, use sub-agents where possible.” And I have a lot of people, a lot of friends that recommend and do the same. Whereas I’ll sometimes talk to people that are like, “Okay, this is where I want you to use sub-agents for this sub-task,” and I’m sure they would appreciate seeing into how they’re being used. For me, it’s primarily like two things, right? One is net time efficiency, so span out across sub-agents. Two is probably cost, right?
Vibhu [00:52:15]: Don’t use big, expensive model. Offload to a lot of smaller, cheaper models. And some people want that level of control. So if you have repetition in what you’re doing, right? Say I want something built where I want it to consistently do this every day, I might wanna go in and fine-tune sub-agents here, sub-agents there. So you can see both, but I think if I’m not mistaken, it’s hidden by default. There’s a dropdown that goes a lot where I’m like, okay I’m just gonna keep, using.
Akshay Nathan [00:52:41]: Oh, you can change the model that they use.
Vibhu [00:52:42]: I know. I tell them to be steered. I know Anthropic offers this in Claude Code. You can tell Fable to use Sonnet or Opus to use Sonnet as a sub-agent, so pretty trivial thing. You tell it to span out sub-agents with Sonnet, it’s cheaper, faster. I would assume if it’s not there, it could be built there. But I think there’s a side of
Akshay Nathan [00:53:02]: It’s too many toggles.
Vibhu [00:53:04]: It’s not a toggle. It’s just, you tell it in chat.
Akshay Nathan [00:53:07]: You’re prompting it. Yeah.
Vibhu [00:53:07]: The way I do it is prompt it, right? And I think this is something that gets abstracted unless it’s something you built for repetition, right? So if I’m building something, say that’s, podcast prep, right? Research into people, do a very deep extensive research, that I might wanna configure to cheaper, faster model just for web search, right? I can see a world in which you want both. I think the default is pretty good right now, where it’s hidden, but you can drop down and get some more info into what’s done.
Vibhu [00:53:34]: I know people talked a lot about it on GPT-5.6’s launch. This thing loves to use a lot of sub-agents and causes the ChatGPT app to just crash because it’s so processor-heavy. But,
Akshay Nathan [00:53:47]: For what it’s worth, that’s not my experience. Yeah, I haven’t had a crash from sub-agents.
Vibhu [00:53:52]: I haven’t either. We both have big laptops. But I know people brought it up. It was a topic of discussion that we didn’t see the same, but it is another vibe eval, right? People are like, “Okay, the amount of sub-agents Sol is wanting is crazy.” And I’m like, “I think this is okay. I think it’s good.” But just stuff people bring up.
Akshay Nathan [00:54:12]: I think when we launched the product too, we weren’t as opinionated about who Ultra is for and when they should be using it. And since then we’ve made some changes to require you to turn it on and find it in the advanced setting, ‘cause that’s who it is for. It’s for power users who understand what’s gonna happen, because it also, depending on your use case, can use more of your limits as well.
Vibhu [00:54:33]: Yes.
Akshay Nathan [00:54:33]: So that’s where I think a lot of the feedback was coming from.
Vibhu [00:54:36]: It’s okay. Reset the limits. Always reset the limits.
Akshay Nathan [00:54:39]: Well, it’s, today we’re resetting because of this. I wanna change topics to one last piece of the harness, memory. A lot of people are commenting on memory recently. ChatGPT’s new memory system used to suck, it’s not very good. And then this guy also the same thing, and Samir, who you presumably work with
Memory, Chronicle, and Personalized Context
Akshay Nathan [00:54:55]: Talking about memory. What can you say there? I think that Samir and the team, and then the research teams, have made a ton of updates and improvements over time. When I talk to friends and family members about what they love about ChatGPT, the fact that it knows them, that they feel like their ChatGPT is their ChatGPT, comes up probably number one. In ChatGPT Work, in the cloud, by default, all conversations inherit from your ChatGPT memory, so they’ll know context about you, and they’ll also be able to write back to this memory.
Vibhu [00:55:27]: With it, like a small text write. Like you tell me when you’re writing, right? Is it
Akshay Nathan [00:55:31]: No, it’s part of the same like memory V3 system that we launched.
Vibhu [00:55:36]: Yeah, Memory V3, yeah.
Akshay Nathan [00:55:37]: So I think that’s been really powerful because, going from ChatGPT to ChatGPT Work feels like an extension of what I’ve already been doing with the product for sometimes many years. So that’s been awesome, and it’s awesome to see that like people are recognizing the improvements here.
Vibhu [00:55:51]: So it’s a retrieval problem, right? Like, are you retrieving the right things? Are you over-focusing on the wrong things? Is it more false positives or false negatives, if that makes sense? Like, what’s the bigger problem?
Akshay Nathan [00:56:05]: So I don’t work on memory directly, so it’s hard to say what the bigger problem is with certainty. But I think you’re right. I think there’s two sides of it. It’s making sure it knows things about you, but then also having the EQ to bring those things up at the right moments proactively, or surprising you in ways that are positive, not negative.
Akshay Nathan [00:56:21]: So I think it’s a very challenging problem, but something where I think we feel there’s a huge opportunity to get it right, which is why we’ve made big investments in it.
Vibhu [00:56:29]: How do you see the side of, okay, when you’re building ChatGPT for work different than the regular chat app, different than Codex, managing memory across different projects, collaboration and whatnot, how do you see the side of what’s separate from the harness, right? So if I have four threads on one project any learnings on how to build memory systems there? For background as well, to steer it a bit, is when you do chat style applications, I’d say you have a lot of one-offs, right?
Vibhu [00:56:58]: When you switch to work it might be something you’re doing for a month, something you do a lot, right? Now, as I add more sessions, there’s a lot more than just single-threaded, right?
Vibhu [00:57:08]: And there might be memory there.
Akshay Nathan [00:57:10]: I think first I’d challenge that the depth of the memory or the value of it is fundamentally different across chat and work. It is true that there are a lot of shorter sessions on chat, but I think ChatGPT, the product, has had a ton of longevity, for as long as this technology has been around, and people use it for work-related, productivity-related things already today. And so I think we found that there’s a lot of value. I found this in my personal usage: all these one-offs add up over time into something quite durable and quite a good representation of who I am. I know from time to time, something will go viral on X about ChatGPT telling you everything it knows about you, and people are always surprised how deep that is.
Vibhu [00:57:55]: The fun roast me?
Akshay Nathan [00:57:57]: Exactly. That’s all to say that I think there’s a lot of depth there in the existing ChatGPT product, and so that’s why we think it’s valuable to bring into the work product. But the other reason I brought that up is because hopefully we can use some of the same fundamental primitives and systems to extend memory here as well, and I know this is something that the team that focuses on this is working through right now.
Vibhu [00:58:20]: I wanted to bring up one element of memory, which I honestly don’t really use much, and I’m curious if you do: Chronicle, which is up on screen right now. Is it a super memory, or what is it?
Akshay Nathan [00:58:33]: I think the idea is that it can learn from how you’re using your computer, and it’s another input source into memory. I think it’s experimental right now and something that isn’t on by default. But I’d recommend that you try it. I think that it’s quite interesting. It goes back to a conversation we were having earlier, where you were asking, “Can ChatGPT miss things?” Like, on Slack, when it’s searching, does it miss things? ‘Cause there’s such a volume of stuff, right? And you can ask the same question about everything that you’re doing on your computer. Is it gonna know everything that you’re doing? Is it gonna capture the intent and stuff like that? Probably not, but it probably will find things that you might not know about. And if it can surface those to you at relevant times, in proactive ways, like when you’re doing tasks, I’ve found at least that it can be quite helpful. So it’s worth trying.
Vibhu [00:59:24]: So mostly for insights and longer term.
Akshay Nathan [00:59:27]: Yeah, exactly. Like insights, and it builds context that can make you more productive on certain tasks. But it’s hard to describe without feeling it.
Vibhu [00:59:37]: I will say you can feel it pretty well. Like the idea of what they’re saying here, right? Just check through my memories or check through my logs and add skills. Pretty underrated, right?
Akshay Nathan [00:59:48]: But that’s automations. You can repeat that using a cron job. Checking through your memories and creating skills. But I think the creation of the memories from Chronicle itself is like what’s different. It’s like you have much deeper memories because you have Chronicle on.
Vibhu [01:00:01]: It’s there. I don’t use it much, but maybe I just, I need more examples. I imagine you guys use a lot of it internally, so I’m always fishing for use cases.
Akshay Nathan [01:00:10]: I would just try turning it on and then like
Vibhu [01:00:13]: It just auto works? Like it
Akshay Nathan [01:00:14]: Yeah, and seeing like where it might start helping you. I think you’d be surprised.
Vibhu [01:00:18]: Yeah. Amazing. I think that was about it in terms of the overall coverage of ChatGPT Work. I think there’s been a lot of good progress and discussion on building and all these things. There’s a lot of ex-founders in the community, in OpenAI as well. Do you think that things have changed a lot? Like, what’s your overall reflection on building pre-AI and post-AI?
Akshay Nathan [01:00:44]: I think things have changed a ton. I think it’s super exciting to see how quickly you can go from idea to something real today. Whereas even before, like, five, 10 years ago, it was fast if you were scrappy and willing to build the minimal viable thing. But now the extent of what you can build is much broader. And I think what we’ve seen internally building is that it gives you an opportunity to validate much more quickly, to talk to users, to talk to internal doctors, et cetera, and make sure you’re on the right track. And that loop, I think, has become more closed than ever before, and that’s a win for product development. I think it’s a win for consumers and users too, because ideally that means they’re getting much better products out the gate.
Building Before and After AI
Vibhu [01:01:32]: Does it mean your teams are smaller?
Akshay Nathan [01:01:33]: I think there’s much more to do now. So I think people can accomplish more individually or in a small team, things that would have required more people before. But at the same time, there’s also more to do, so I think the teams are much more ambitious.
Vibhu [01:01:50]: Have you seen any changes in scopes of roles and building teams and how we used to have teams, say, a few years ago versus what ideal teams look like now?
Akshay Nathan [01:01:58]: I think we’ve seen a blurring in the lines between, like, the typical product development functions, like between, like, EM/PM, engineer, designer, et cetera. Like
Vibhu [01:02:08]: Yeah, I wanna bring up this quote. There will be only four jobs left in tech. There’s AI slop cannon, the people who just, like, burn a bunch of tokens. And then there is SRE, the people who are more responsible. There’s grown-ups who sell things, and then there’s hot people.
Akshay Nathan [01:02:27]: This is an interesting take. I think my suspicion is that everyone will be, like, T-shaped in a way, in that AI will enable everyone to become a generalist. Like, I never would have been able to come up with a design before, and even now, I maybe don’t have the visual taste required, but I can iterate on something with the help of AI. But then people will have a specialty, and that’s, like, the straight line in the T or the upward line in the T. And so you can have a specialty that you’re interested in, that with the help of AI you can go deeper on and become better at over time, but then you’ll also be a generalist. And so with that foundation, what you can accomplish is, like, almost limitless.
Team Shape, T-Shaped Builders, and Taste
Vibhu [01:03:07]: What are you bottlenecked by in terms of specialties? Like, do you need more designers? Do you need more slop cannons? Do you need more hot people?
Akshay Nathan [01:03:15]: I think the bottleneck becomes, like, ideas and taste. Because anyone can build now, I think it really is the era of, like, bottoms-up ambition. And because there’s so much to be built, you’re always gonna be bottlenecked by the amount of ideas and the amount of things that you’re doing at any given time.
Vibhu [01:03:37]: Do you think models help solve that?
Akshay Nathan [01:03:39]: Models?
Vibhu [01:03:40]: Yeah. I have the example of, like, I have a front-end design skill that’s like, “Give me four drastically different examples of what this looks like.” Sure, it burns a lot of tokens. But then I’ll mostly just condense down: “Okay, I like this part. I like this part. Let’s draw these together.” And it’s like, yeah, I had a vision, but, like, I don’t know.
Akshay Nathan [01:04:01]: I would say that the one automation that I would love to work and it doesn’t work is “bring me new ideas,” right? Somehow LLMs are just not it. One interesting part about ideas is, like, they’re not in a vacuum. They usually come from somewhere, and in product development, they’re coming from talking to users or reacting to friction that you’re seeing, or feedback, building on some foundation that you already had planned out before, whatever. And so I think there will always be value in these generalists that we talked about closing that loop and then coming up with those ideas that are grounded in that feedback or talking to users, whatever it is.
Defining and Measuring Productivity
Vibhu [01:04:41]: Cool. You lead the productivity team. How do you define productivity?
Akshay Nathan [01:04:46]: I think our mission is to make it possible for people to do things that they weren’t able to do before. And right now we’re thinking about it from the perspective of knowledge work. And so when I look at knowledge work, I think about people are no longer siloed by their roles. They’re no longer siloed by maybe the, background or training that they have. Like, no matter what function you’re in, you can suddenly build things. You can suddenly get access to data that you otherwise might not be able to interpret, et cetera. And then I think that extends to your personal life, where we want to give you leverage at the end of the day. Like, we want the models and the product to be able to give you leverage so that you can, create time for yourself to do the things that you love.
Vibhu [01:05:25]: Does that also translate to a way to measure productivity? Like, what is new?
Akshay Nathan [01:05:29]: The end is
Vibhu [01:05:30]: How do you measure leverage?
Akshay Nathan [01:05:31]: I think we haven’t figured this out yet. Part of the reason is it’s so diverse. Everyone has different goals, and really the true measurement is, like, their ability to achieve that goal. Did we help you or did we not?
Akshay Nathan [01:05:44]: And it’s very difficult without knowing what that goal is up front and also tailoring it for every individual.
Vibhu [01:05:48]: And the thumbs up and thumbs down from ChatGPT doesn’t give you anything, right?
Akshay Nathan [01:05:52]: You don’t know if they’re thumbs downing the content of the answer, the vibe of it
Vibhu [01:05:56]: Oh, yeah
Akshay Nathan [01:05:56]: Whether or not it helped them with their goal. I think that’s difficult. But it’s something that I think we will need to figure out and the industry at large will need to figure out because, that’s how we measure success, if this is what we’re, we’re
Vibhu [01:06:06]: Do you think it’s changed, productivity and how you measure it? You said there’s a lot more work that can be done, a lot more scope. Has it changed?
Akshay Nathan [01:06:15]: I think it was always true that what you really wanted to measure is, like, was your team, was the individual, were you personally able to hit the goal, or are you closer to hitting that, whatever your goal is, right? But I think previously we used proxies for this. So, like, code commits or
Vibhu [01:06:31]: Lines of code
Akshay Nathan [01:06:31]: Lines of code or whatever.
Vibhu [01:06:33]: Story points.
Akshay Nathan [01:06:34]: Yeah, exactly. Story points. And, like
Vibhu [01:06:36]: They’re coming back, by the way.
Akshay Nathan [01:06:38]: Maybe. But that is part of the change. And, like, I think with AI now, those proxies are starting to fall apart. Like, the number of tokens you use or the number of pull requests you make are maybe no longer as correlated with that: is your team able to hit the goal, or are they on track to hit their goals? So I think we’ll need to come up with new measurements.
Vibhu [01:07:02]: For the managers listening, give them one thing to try.
At-Bats, Motion vs. Progress, and Closing
Akshay Nathan [01:07:06]: I think for me, what’s important is, like, at-bats. Are we as a team building the muscle to have not just quantity of at-bats, but quality? Like, are we able to go all the way from generating an idea, building it out, getting the feedback, reacting to that feedback, validating or invalidating the hypothesis, going on to the next idea? Are we able to do that really efficiently? And that goes to the actual code that’s being written or the designs that are being made or the specs that are being written, whatever, but also the culture of the team. Like, do we have the humility, and are we able to go through that process many times and stay motivated and excited throughout that? So that’s the thing that I think is important now, especially when we’re on the frontier of this technology and there’s so much to build, there’s so much to do. That’s probably the most important thing that we look at.
Vibhu [01:07:54]: Any traps people fall into around measuring productivity with what your team works on? I feel like there’s a lot of, okay, we added a lot of LLMs. We have dashboards for this and that, but not much has changed, right?
Akshay Nathan [01:08:06]: That is the trap, yes.
Vibhu [01:08:09]: And the broader source of the question is for the managers and teams building, how should they approach this?
Akshay Nathan [01:08:18]: I think maybe the trap is conflating motion and progress. I think motion is much easier now than ever before because of the tooling that we have. But progress requires you to be very prescriptive and deliberate about what you’re trying to achieve, and it goes back to our question of measurement, right? Like, we were talking about: can we, OpenAI, figure out how to measure productivity for our users? That’s a very hard problem because of the diversity. But as a team, you should have a really prescriptive and deliberate view on what progress looks like for you and for your team. And if you don’t have that, then it’s very easy to conflate these two things.
Vibhu [01:08:57]: I think at-bats is a really great thing. I’m, I’m really glad. I like the discussion between motion and progress. I think that’s a quote that we’re gonna feature on the write-up. You’ve been very generous with your time. Thank you so much and congrats on ten million.
Akshay Nathan [01:09:08]: Yeah, thank you for having me.
Vibhu [01:09:09]: The next one at a hundred in two months. Two weeks. Thank you.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Inside the Model Factory — Eiso Kant, Poolside AI
07/23/2026 | 1h 54 mins.
In recent months, the open vs closed, and US vs China discussions on model ownership and sovereign/local AI have heated up to a fever pitch. So it is very very good news that Poolside AI are finally emerging with new models, like Laguna S 2.1, that are beating Thinking Machines’ recent release nearly 10 times their size.
Poolside’s recent tech report got a lot of praise for its level of detail, and Vibhu first covered Laguna’s recent technical report on our paper club.

From spending $12 million building language models for code before the world cared to creating a Model Factory that can take a model from pre-training to release in eight weeks, Eiso Kant has spent more than a decade betting that code is the path to AGI. In this episode, the Poolside co-founder joins swyx and Vibhu to explain why ChatGPT felt like vindication, why Poolside embraced open weights and open research, and why he would rather live in a world with 100 foundation model companies than five even if Poolside were one of the five.
We go deep on Poolside’s Model Factory: the engineering systems behind 10,000–20,000 experiments per month, streaming data directly into training, reproducible experimentation, low-precision compute, and agents that increasingly write code, launch jobs, evaluate results, and modify the pipelines used to train future models. Eiso also unpacks their recent launch Laguna S, why persistence, verification, and backtracking may matter more than raw intelligence, how much capability remains inside smaller models, why reinforcement learning will move earlier into pre-training, and why next-token prediction is still extracting too little from the web.
We also discuss model-harness co-design, Poolside’s path from coding agents to AGI, why Eiso thinks MCP and traditional tool calls are “stupid,” the real economics behind frontier-model training, Poolside’s $500 million raise, open-source AI, regulation, NVIDIA and TSMC’s influence, engineering productivity in the agent era, high-agency teams, and hiring at Poolside.
We discuss:
* How Andrej Karpathy’s RNN work inspired Eiso to start building language models for code in 2015
* Why Eiso spent four years and $12 million pursuing an idea before the market cared
* Why ChatGPT felt like vindication and brought Poolside back to open source
* Why Eiso would prefer 100 foundation model companies over an oligopoly of five
* The difference between releasing open weights and publishing genuinely open research
* Why Poolside deliberately built a global research organization outside the Bay Area talent war
* Why model building is ultimately 90% engineering
* The Model Factory: Poolside’s end-to-end system for rapidly training and improving models
* How fewer than 70 researchers run roughly 10,000–20,000 experiments each month
* How Poolside moved from six-month model cycles to five- and eight-week launches
* Why streaming data directly into training unlocked faster experimentation
* How immutable data, versioned code, and reproducibility enable rigorous model research
* Why Eiso wants capable researchers to leave their labs and become Poolside’s competitors
* Why 95% of model building can be reduced to better data or compute efficiency
* Laguna S and why persistence, verification, and backtracking can outperform raw intelligence
* Why smaller models may handle far more knowledge work than previously expected
* Why reinforcement learning will move earlier into pre-training
* Why next-token prediction is still failing to extract enough knowledge from the web
* Why distillation and environments have become the AI industry’s favorite “drugs”
* Why mid-training is really an early form of curriculum design
* Low-precision training, networking bottlenecks, and the next gains in compute efficiency
* Laguna S: 118 billion total parameters, 8 billion active, and eight weeks from training to launch
* Why model builders can often evaluate a new checkpoint within its first 30 minutes
* Model versus harness: where agent capabilities actually come from
* Why Poolside sees coding and long-horizon software tasks as a path to AGI
* Why Eiso thinks MCP and traditional tool calls are “stupid”
* Why future agents will write scripts instead of choosing from dozens of predefined tools
* The case for minimal harnesses, containers, and model freedom
* Why Poolside is prioritizing vision but does not expect to work on audio soon
* Why language may be the most compute-efficient modality for encoding knowledge and reasoning
* The real cost of model development and why the final training run is anticlimactic
* The story behind the Poolside name and why it represents refusing to lower ambitions
* How Poolside raised $500 million while investors still questioned whether AGI was real
* Why intelligence could become the world’s most demanded and commoditized resource
* When open models may become too capable to release without restrictions
* Why unilateral AI safety does not work in a globally competitive environment
* How regulation could accidentally lock in an oligopoly of two or three AI companies
* NVIDIA, TSMC, and the hardware systems underpinning foundation-model progress
* Why reinforcement-learning wall-clock time is one of Poolside’s biggest bottlenecks
* Why Poolside trains models from scratch instead of simply distilling larger models
* How AI changes the way companies should measure engineering productivity
* Why agency may become the most important quality for employees in the AI era
* How leaders align high-agency people through shared goals and clear constraints
* Hiring across research, post-training, pre-training, architecture, evals, and engineering at Poolside
Eiso Kant
LinkedIn: https://www.linkedin.com/in/eisokant
X: https://x.com/eisokant
Poolside: https://poolside.ai
Timestamps
00:00:00 Introduction
00:00:54 Karpathy, RNNs, and Building Code Models Before Transformers
00:02:26 The $12M Failure and ChatGPT Vindication
00:03:39 Open Source and the Case for 100 Foundation Model Companies
00:09:22 Open Weights, Open Research, and Poolside’s Global Team
00:16:04 The Model Factory: Why Model Building Is 90% Engineering
00:20:19 Agents, Automated Experiments, and Early Signs of RSI
00:24:04 Streaming Data, Reproducibility, and Scientific Rigor
00:30:35 Creating More Foundation Model Companies
00:36:07 Laguna S: Persistence vs. Raw Intelligence
00:43:01 Reinventing Pre-Training, RL, and Curriculum Design
00:52:33 Low-Precision Training and Squeezing More From Smaller Models
00:58:37 Model Harnesses, Coding Agents, and the Path to AGI
01:09:26 Why MCP and Traditional Tool Calls Are “Stupid”
01:13:04 Vision, Multimodality, and Why Language Still Matters
01:18:15 Scaling Models and the Real Economics of Training
01:20:40 Why Poolside Is Called Poolside and Raising $500M
01:27:37 Open Models, AI Safety, and the Risk of an Oligopoly
01:33:53 NVIDIA, TSMC, and the Reinforcement-Learning Bottleneck
01:41:52 Smaller Models, Distillation, Engineering Productivity, and Hiring
Transcript
Introduction: Eiso Kant, Poolside, and Open Models
Swyx [00:00:00]: All right, we’re here in the studio with Eiso Kant from Poolside, together with Vibhu. Welcome.
Eiso Kant [00:00:08]: Thanks. Thanks for having me, guys. Good to be here.
Swyx [00:00:10]: Yeah, fresh on the plane. You texted me, you were like, “Hey, I’m on my way to SF.” I was like, “You’re on a plane right now, right?” Like, hey.
Eiso Kant [00:00:16]: I know. After I texted you, I realized that probably coming in with major jet lag was gonna offer some fun experiences today, but let’s do it.
Swyx [00:00:23]: I mean, I think the thing I would tell guests is that they don’t have to prepare that much because if you’re truly working on this every single day, then even, like, what you hazily remember is going to be new for a lot of the audience that don’t live in your world every day, right? So 10 years ago, you did a talk at Google Slush, talking about the democratization of AI. And now here you are, like, open sourcing an incredible new model that we’re gonna talk about. But I guess, like, what got you into democratization of AI? Like, it’s not obvious from your LinkedIn or something.
From Karpathy’s RNN Post to Sourced
Eiso Kant [00:00:57]: No, it’s not at all. I don’t think it’s obvious how I got in this space. I owe getting into this space to Andrej Karpathy.
Eiso Kant [00:01:05]: In 2015, he wrote an article called “The Unreasonable Effectiveness of Recurrent Neural Nets.”
Swyx [00:01:10]: Neural Nets, yep.
Eiso Kant [00:01:11]: And that article, I read it, and I pivoted my startup at the time overnight to working on RNNs, and later LSTMs and Transformer models to be able to write code. If you go to this article and you scroll down, you can start seeing, like, this was the precursor to what ended up becoming language models. So, at least back then, these were character-level language models that were starting to predict letters. He has an example out here. There’s a little Paul Graham generator, and you can read it, and the text makes sense, but it doesn’t. And there’s a little-- There’s an example of code a little bit further down. Yeah, so Shakespeare.
Swyx [00:01:47]: Shakespeare.
Swyx [00:01:49]: Cool
Eiso Kant [00:01:49]: And for some reason, I read this, and I went down the rabbit hole of learning everything I could about RNNs and LSTMs, right? This is pre-Transformer paper. And I had built a completely unreasonable belief that neural nets should be able to generalize to anything and everything, and that language should be able to generalize to a lot of things that are intelligent, and the ability to write code. And so I started building Sourced, which was a fully open source company trying to build what we used to call machine learning on code, language models on code. And we spent about four or five years on this, till the end of 2019. And that sounds really cool today, but back then, no one cared.
Eiso Kant [00:02:29]: Right? Like, no one cared. We were in the dark. Like, we did things along the way. We tried applying convolutional neural nets to, like, the structure of code. When attention came out, we were applying it to LSTMs, and then the Transformer paper came out. And it wasn’t obvious, and what we missed throughout that entire journey was that we were on the right track, but we should have just kept scaling up. And today, to all of us, the scaling laws and scaling up seem like the most obvious thing. But having spent four or five years of my life working on language models on code, it wasn’t obvious. So I have a lot of respect for folks at Google and OpenAI and others who took that confidence and kept going. We failed ultimately at the time, and it was, like, the biggest failure of my career, right? You blew $12 million of investors’ money, which was a lot back then.
Swyx [00:03:18]: Yep.
Eiso Kant [00:03:19]: Still a lot, but-- And you spent years with, like, a group of 40 people just obsessing over this problem. And life took a different turn, and family became a focus, and I kept my head down and didn’t really look at language models for the following two years. Big mistake, considering the following years were gonna be really interesting. And then ChatGPT came out, and it was like a vindication. It’s like people started texting me. I found, like, my old work decks and these old talks. And throughout that whole journey, we--
ChatGPT, Vindication, and Returning to Open Source
Eiso Kant [00:03:56]: We really had a strong point of view at the time that, like, as you’re building more capable intelligence, it should be open and open source.
Eiso Kant [00:04:04]: When we started Poolside, that wasn’t the case at all, and I wanna be very open about it. When we started Poolside, we were like, there was a premise of two things. One is this technology is not gonna stop compounding in capabilities. I think to most people obvious today, but three-plus years ago when we started, most people were still arguing if these were stochastic parrots or not.
Eiso Kant [00:04:23]: And the second was that reinforcement learning was gonna be the biggest driver for LLM capabilities. Today, very obvious. Three years ago, it was not an opinion held or direction held at either OpenAI or Google or Anthropic or others. And so people looked down on us a little bit. They were like, “Is this really gonna work?” And so we just started working the problem, and we never really thought about open source again. We just kept our heads down and we built our, like, knowledge and understanding from scratch, right? We didn’t roll out of an existing lab. So we picked up the papers and started writing code and figuring things out.
Eiso Kant [00:04:59]: And it wasn’t until the beginning of this year that me and my co-founder, Jason, picked up the open source conversation again.
Eiso Kant [00:05:07]: And if you go back to some of the early things on our website, it was very straightforward. It was we wanna get to AGI, we wanna support a world of abundance, and we wanna be the first company that gets there.
Eiso Kant [00:05:20]: But we started talking at the beginning of this year because it became obvious that the world was going in a direction that was starting to, like, pick at us a little bit. Like, this didn’t happen overnight. It was, like, bit by bit we were seeing this and we’re like, “Okay, the world’s going down a path.” And throughout this journey, there was something that I used as an analogy. So I said, well, if I go back to those days, 2015 or 2016, when we’re working on this, and I picked up a sci-fi book off the shelf, and I was reading the book about 2035. AGI is achieved, and the story would be over the following decades. And it would have that first chapter where everyone’s trying to figure things out. You’d get the chapter of ChatGPT coming out. And then you would get to the chapter where the world was at a fork in the road, and the one that it picked was one where three or four or a handful of companies were going to create all of intelligence moving forward.
Eiso Kant [00:06:21]: And when I thought about that story, it felt like a dystopian sci-fi book, not a utopian sci-fi book. And the reality is, I’m a utopian sci-fi guy. And so we took a step back and said, “Hey, can we play a role here?” Now it was easy for us to do so because we were not at the frontier.
Eiso Kant [00:06:41]: If we were at the frontier, I don’t think we could have changed our mind. And I don’t mean this like-- it’s that the moment there’s too much capital involved, too many expectations, you’ve built up things, right? We’re a small team, just improving and improving. And so we knew that we could make that decision now, but it would be a lot harder to make as we got closer and closer to the frontier and caught up to others. And we did a lot of soul-searching and had a lot of conversations, and said, “No, this makes sense,” even if there are big unanswered questions, like how the hell do you build a business model with foundation models around open source? Big open-ended question that we do not fully have the answer to yet, right? At what point do you no longer wanna release open source models because misuse of models has real potential risks associated with it? How is the government gonna respond to open source? But I think it all just came down to one thing, and I’ll stop the monologue: I’d rather live in a world that has 100 foundation model companies than a world that has five, even if I was one of the five. And the smallest and most meaningful contribution we can make for 100 to exist is to open up our research and open up, like, our weights right now and figure out along the way how we can, like, do more.
Neo-Labs, Model Choice, and the Token Economy
Swyx [00:08:01]: Yeah. I think if anything, over the past three years, that has become a bit more true. You are one of a cohort of Neo labs
Eiso Kant [00:08:10]: Yeah
Swyx [00:08:10]: That people are now calling that. And we’re doing this on the day that Thinky launched their new model, and you are outperforming them on some benchmarks that they released, right? Like, they just don’t have it yet. So it goes to show that I think, like, this is one of those things where, like, there is room for multiple players, and you are seeing a little bit more of the future. Maybe more like 20, not 100, but, like, you are one of the 20.
Eiso Kant [00:08:36]: I really hope so, right? I’m excited about their release, and I’m excited about everyone releasing because, like, ultimately, choice and competition are both gonna drive progress in the right direction. But the fact is that we create models, and while we all drink out of the same well of data effectively, we do introduce very different behaviors and biases in our models. Some are intended biases, some are completely unintended biases.
Swyx [00:09:03]: Yeah.
Eiso Kant [00:09:03]: And if we shape up in an ecosystem in the world where open models are gonna be a part of the token economy, like, I don’t think there’s any question about it anymore, then we want to be able to live in a world where companies, countries, people can choose and say, “Hey, I am most aligned and I trust most this provider for these things.”
Swyx [00:09:25]: Yeah.
Vibhu [00:09:26]: I think more than just one of the 20 Neo labs, up until recently, most of open source innovation was coming from the Chinese labs, right? So there’s the DeepSeek of the West. Is it today? Okay, maybe it’s Thinking Machines, Reflection, but there aren’t many, right? So, one of the things is you guys started in France, Europe, but very much now you’re taking that American standpoint. And more than just that, the point is the Chinese models that we see, they’re not super open research. The work you put out is, I think, some of the best. So every few months you get not only frontier models, but also a breakdown blog, paper, technical report of here’s everything for state of the art to build frontier intelligence, and you’re filling that gap too, right? So not just only open weight, not just Western, but also pretty open research.
Open Weights vs. Open Research
Eiso Kant [00:10:20]: No, I appreciate it. Look, I think it’s the most meaningful contribution, right? Weights are a binary. Let’s call them what they are. Yes, we can modify them, we can change them, but, like, giving someone the weights does not allow them ultimately to recreate what you’re doing, right? And so now there are challenges around releasing data sets, challenges around, like, releasing certain things, but being able to share your research, like, right, how do we do it? What are the lessons we learned that we spent tens of thousands of experiments of compute on? I think very much so. One correction though, Vibhu, and I say this because it’s been haunting us for quite a few years. We from day zero were an American company.
Swyx [00:10:55]: Yeah. They moved
Poolside’s Global Team and American Company Story
Swyx [00:10:56]: To France.
Eiso Kant [00:10:56]: So the story, once and for all: we started as an American company. We have always been an American company, and early on we made a very conscious decision. We said, “We’re not gonna hire any researchers in the Bay Area. We’re gonna look for talent everywhere else in the world.” And that is everything from Middle America, Seattle, to Serbia, and to Taiwan and Singapore and other places. And it was because we took a view that this was gonna become a talent war, and I think it has over the years now. Three years ago, that wasn’t fully obvious yet. I think today it very much is. And we also realized that, like, some of the world’s most capable people with, like, the most interesting, innovative ideas were not just gonna be here. And so it led us to create, like, a fully remote company. And we ended up opening an office in Paris and London and different places, and we have a lot of the team in the US and a lot of the team outside. But we always took this view of, like, we’re an American company, but if we want the best of the best to work with us, we need to take a global view. Now we do also have people here in Silicon Valley, like, the company’s grown and others. But I think one of the things that slowed us down at the beginning, but has sped us up now, and it’s why you’re seeing, like, the progress, I think, on our models and the cadence at which we release, is that we didn’t roll out of an existing lab. Right? We didn’t have a lot of the information that’s freely flowing around here at the time. We just took this point of view, like, “Okay, well, let’s just work the problem. Let’s just go and, like, read the few papers that are out there, and let’s just figure this stuff out.” And we made some hilarious mistakes in model training because of that over the years
Eiso Kant [00:12:35]: Like, especially in the first 12 months. There’s a few that I think still haunt me and scare me. We can talk about them later. But it created, like, a resiliency and persistency in the team, right? Extremely few people have left us over the years, and that, like, told us, “Okay, we can do this.” When we first wrote our first training code base completely from scratch, it wasn’t a fork of any open source. It was just like, “Okay, let’s build it from scratch.” I remember we had this one moment where we spent three weeks working out an optimizer bug. Like, training just couldn’t get stable. We, like, obsessed over it, and we thought, like, maybe we were wrong. Maybe we should have just forked this repo. But then when we solved it, and I still remember at the time we were like five people in the company, we were like, “Oh, we can do things,” like, if we’re just willing to work hard. And I think that culture with a very strong engineering bias has helped us, like, get to where we are. And so there’s this notion of open source and talent and these things. I think we just took different decisions from a different starting point. And I think we are lucky. I do want to definitely call it lucky. And there was a lot of hard work by the team that’s now, like, starting to show up in results.
Swyx [00:13:52]: Just ‘cause we probably won’t revisit this again, but, and this is a fun recruiting challenge if someone knows the answer. What was the bug? And then we won’t tell the solution, but we’
An Optimizer Bug and the Value of Building From Scratch
Eiso Kant [00:14:01]: So the - This - You’re gonna test my memory here,
Swyx [00:14:04]: Oh, okay
Eiso Kant [00:14:04]: So but I think
Swyx [00:14:05]: Directly
Eiso Kant [00:14:05]: I think I can recall. So if you, so if you look at, So if you take like Adam as an optimizer, you have epsilon
Swyx [00:14:12]: Yeah
Eiso Kant [00:14:13]: Which is, right, like in the denominator
Swyx [00:14:14]: Momentum and weights. Yeah
Eiso Kant [00:14:15]: Is exactly, in the denominator. And at the time, if I recall, you looked at like the early Llama papers and things like that. People were juicing epsilon, like, quite a bit. Like, they were, like, adding, I don’t know if it was E minus four or whatever, like a high value for epsilon.
Eiso Kant [00:14:31]: And if you think about this during training, it’s like a bit weird and counterintuitive that we’re adding noise to our optimizer by just adding effectively, like, a random number in the denominator, right? Like behind the decimal point. And I don’t recall the exact bug, but what I remember is once we solved it, we no longer had to juice epsilon as much as, like, was happening in the Llama paper and other places. And it was like one of those fundamental moments where we had trusted this paper that was out there, and we’re like, “Oh, no, it has to be this way. It has to have this high value of epsilon.” But it made no sense to us intuitively. Like, why do you have to have this so high? Like, if you’re just trying to avoid division by zero, why can’t the value be extremely small? And that was like one of those moments where you realize, like, okay, finding things out from scratch yourself builds a better intuition. Because the one thing you learn very quickly with model building is that your intuitions that you start with are gonna get beaten up so hard.
Eiso Kant [00:15:33]: Right? Like - It’s such an experimental science, that the things that seem obvious, you very quickly get to learn, like, you were wrong, and hopefully you figure out why, and sometimes you don’t even.
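To make the epsilon point concrete, here is a minimal sketch of the first Adam update for a single parameter, using the standard Adam formula in which epsilon is a constant added to the denominator. The helper name `adam_step_size` and the gradient value are illustrative assumptions, not anything taken from Poolside's training code. It only shows why the choice matters: for a parameter with a very small gradient, a large epsilon such as 1e-4 shrinks the update far below the learning rate, while a tiny epsilon such as 1e-8 barely changes it.

```python
import math


def adam_step_size(grad, eps, lr=1.0, beta1=0.9, beta2=0.999):
    """First Adam update (step t = 1) for one parameter with gradient `grad`."""
    m = (1 - beta1) * grad           # first-moment estimate after one step
    v = (1 - beta2) * grad * grad    # second-moment estimate after one step
    m_hat = m / (1 - beta1)          # bias-corrected first moment
    v_hat = v / (1 - beta2)          # bias-corrected second moment
    # eps sits in the denominator, so it matters most when sqrt(v_hat) is tiny.
    return lr * m_hat / (math.sqrt(v_hat) + eps)


tiny_grad = 1e-6  # illustrative: a parameter that receives a very small gradient
print(adam_step_size(tiny_grad, eps=1e-8))  # roughly 0.99 of the learning rate
print(adam_step_size(tiny_grad, eps=1e-4))  # roughly 0.0099 of the learning rate
```

With the larger epsilon, the same gradient produces an update about a hundred times smaller, which is one way a high epsilon can mask an instability elsewhere in training.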
Swyx [00:15:45]: Yeah. So, one of the reasons that, when you released your new models, Vibhu got really excited. I mean, everyone got really excited. But Vibhu led our paper club on it, and you guys saw
Eiso Kant [00:15:58]: Yeah
Swyx [00:15:58]: Obviously. Maybe talk through some lessons learned in that, whatever you can disclose. We can focus on the model factory stuff, whatever you think is a good starting point.
Model Building as Engineering
Eiso Kant [00:16:08]: So I would say that our view from very early on in the company was that model building is ultimately 90% engineering.
Eiso Kant [00:16:18]: And I think we all know it in the industry because if you look at where every researcher is spending their time, they’re spending their time writing code, right? Looking at data and writing code. And so we said, okay, the state at the moment, like three years ago, was bash scripts and Slurm and spaghetti code bases for training and, like, data pipelines that were patched together. And we looked at this and said, “Well, ultimately, model building is a process.” You’re going from raw data, right? Like training raw material, the web, et cetera. You’re doing a whole bunch of filtering, cleaning up, transformations, analyzing. These days, that’s far more complex than it was three years ago. Then you’re training a model, which is effectively a large distributed systems problem, right? Across hardware that has-- It’s become a lot more reliable. It was extremely flaky back then. And now with every new generation, we get our new sets of challenges. And then you go into the next stages, right? There was no mid-training back then, but, like, you’ve got your post-training and then your reinforcement learning. And so we looked at this and we said, “Well, this looks like an industrialized process. This looks like an end-to-end process, where every single part of it has its machinery,” right? If it’s your big data pipelines, if it’s your crawling and ingestion of the web, if it’s your large-scale distributed training, and then you’ve got your reliability. And we said, “Well, why don’t we take some of the world’s smartest distributed systems engineers that we knew and make them part of the process of research from day zero?” Not retrofitting it later on, but, like, really from the beginning. And that became our model factory. And so our model factory started with a handful of components. Today, it’s thousands of components, and I try to equate it to, if you think about, like, someone who was at the very early days of Foxconn, if they had been there for the following decade, they would be able to rebuild Foxconn because they saw every decision that led to building that system and all the complexity. If you and I walk into Foxconn today, no chance.
The Model Factory and Experiment Velocity
Eiso Kant [00:18:18]: Right? Because we don’t have the lineage and history of decisions that led to that. And so we built early on, from the beginning, with a team that really understood that, well, the metric that we are optimizing for is the speed of an idea from a researcher to an experimental result that we can trust to then being part of the next model training.
Eiso Kant [00:18:42]: And because it’s such an experimental science, ultimately, in the beginning when it wasn’t that complex, you could patch your way around it, right? But now, at any foundation model company, you are running-- I mean, we’re a small team, right? We’re less than 70 researchers, another 35 engineers. And we are running, I haven’t checked the latest count, but far more than 10,000, maybe 10 to 20,000 experiments a month that we cut. And so if you look at that scale, every model run, ultimately, you need to be able to trust it, as an infra problem. And so what we have now done over the years is gotten really good at that, just by working it and improving it and obsessing over those end-to-end decisions. So now what that means is that if you look at Laguna XS 2 that we launched, it was five weeks from the beginning of training to launch. The model that we’re gonna talk about today was eight weeks from start of training to launch. We started the next model literally yesterday because we’ve now finished the post-training required for the model we’re launching next week, or by the time this comes out, today. And we moved that compute to the much larger Laguna M model that we’re now training. And so the model should be an artifact of someone’s process. It shouldn’t be really a thing in itself. And we treat this like the way you would look at, like, a SpaceX factory where, yes, the first rocket, really hard to build, but the much harder challenge was building the factory. And now they’re rolling off, and no one is really thinking about the next launch anymore. So it’s just another launch, it’s another launch, another rocket comes off. And that’s what we’re trying to do with model building.
Eiso Kant [00:20:22]: And what has happened, which was not planned from day zero, it was in the back of our mind like this will happen one day, is that when you build a really good end-to-end model factory with really good APIs and really good engineering systems, well, what is it perfect for? It’s perfect for agents.
Agents Inside the Model Factory
Eiso Kant [00:20:40]: Because agents are now starting to take over more and more work in our model factory.
Vibhu [00:20:43]: Yeah.
Eiso Kant [00:20:44]: So I look at the screens when I walk, like when we come together, we do monthly onsites, and I walk behind people’s screens and I stop by and I talk to our researchers. And the default is all of these different agents running on their screen that are writing the code. They’re launching the jobs. They’re evaluating the results that are coming back from the model runs. They are making the changes. And we’re still in the driver’s seat. We’re still coming up with the ideas. We’re still helping with the debugging. But more and more, and this is right now very profound on the data side of our pipelines in both pre and post and the synthetic data pipelines, it’s starting to become more on the architecture side as well. You’re starting to see these twinklings of what RSI is gonna look like.
Eiso Kant [00:21:27]: So when we talk about, like, to your question about our models, we talk about the model factory. And my coolest example of these things is always that when we kick off a new run, doesn’t matter if it’s a big training run or if it’s, like, one of 10 post-training versions we do for a release or many experiments, at any given moment, the changes that somebody made that they had experimental results from the day before make it into that run.
Eiso Kant [00:21:57]: So there’s not like a cutoff 90 days before. Like no, it’s like literally from that moment because we can now trust the machine enough. And then you also have to invest in the reliability. So one of my favorite metrics about, like, Laguna S is that there were no on-call events, right? Like completely zero. And we haven’t had a meaningful on-call event, like something to wake up for, as far as I recall this entire year. Now there is one asterisk to that. Usually in the first six hours of launching a new model run, something breaks because you set a config wrong, you made a small mistake, et cetera. So usually there’s a little bit of intervention, but that’s always within like call periods, right? Not on call. And I think that’s starting to now compound. So the model we’re releasing now, I love it. It’s amazing, but we’re already onto the next one. And I think that’s the way it should be.
Laguna, Five-Week Builds, and Zero On-Call Events
Vibhu [00:22:50]: Hey, I also just wanna point out, so for context, this was like a month ago. We found it in the tech report, so we just came in with, “Okay, new model’s dropped. Haven’t heard about it.” We were
Eiso Kant [00:23:02]: Yeah, we’re very used to doing this every few months.
Vibhu [00:23:03]: We’re very much like, “Okay, look, it’s, like, on par with Kimi, DeepSeek, whatnot, the small ones, Gemma level. Oh, it’s a very cool paper on what goes into building.” And then we hit this page, right? Like literally page two of the tech report is, “This process allowed us to build the small model from scratch to delivery within five weeks applying the lessons”. And then I’m like, oh, this paper is not about here’s a tech report of benchmarks and here’s how many tokens it was trained on. Like for people that wanna dive more into what we’re not gonna discuss on the podcast, it’s all laid out here, right? From
Eiso Kant [00:23:38]: Yeah
Vibhu [00:23:39]: Custom software that agents can use to interface with training code, training data.
Eiso Kant [00:23:45]: Yeah. We’ll link the paper correctly, so yeah.
Vibhu [00:23:47]: Yeah. All that stuff. Read the paper here, but,
Technical Report Principles and Streaming Training Data
Eiso Kant [00:23:50]: But I would like to. I love principles, and I think that is a good starting off point for maybe telling some stories. Maybe we can go one by one past the principles. I’ll just call out that Dagster just got bought by a Prefect.
Vibhu [00:24:01]: Yeah.
Eiso Kant [00:24:01]: Isn’t it fun? But yes, I’m very familiar with Dagster. just anything where like they trigger some story.
Vibhu [00:24:07]: So, well, I would say, well, experiments as code’s obvious, but I think one of my favorite things is, I don’t know where it is in here, but early on, and I still think this is the case at a lot of foundation model companies, people prepare their training data sets, they get packaged up, then they get copied over to a training cluster, distributed across all of the nodes, and then training starts.
Vibhu [00:24:30]: And we looked at this like three years ago and we were like, that makes no sense
Eiso Kant [00:24:36]: You lose so much time because the moment you have to rematerialize the data set, you have to make a change, you have to fix something, et cetera, you’ve got all this time of like repackaging it, right? Tokenizing it, repacking it, moving it over to a cluster, then distributing it across the nodes. The bigger your clusters are, you start using fancy, like, torrent-like algorithms to, like, distribute your data. So why aren’t we streaming data into training? Right? Something that’s very common and like just basic
Vibhu [00:25:00]: Like just in time
Eiso Kant [00:25:01]: Just in time, like good computer science, like, principle. And that was one of the first things that I think unlocked the model factory. Because the moment you start thinking about, well, a training job, it doesn’t matter if it’s a big hero run or a small, like, post-training experiment, consumes a certain number of tokens per second, right? And it’s not a lot, right? From, like, a moving-data perspective. So we said, well, we have our training cluster, and then we’ve got like our AWS kinda setup where we can build these amazing big data pipelines. We can set things up. We use Spark underneath the hood, like all these things.
Vibhu [00:25:36]: But when you say AWS, it’s not actual AWS, it’s your internal AWS.
Eiso Kant [00:25:39]: It’s our internal-- No, it’s our internal like just running like our infrastructure
Vibhu [00:25:42]: Site web services
Eiso Kant [00:25:43]: Exactly. Our stuff running on like an AWS account or on like any hardware, right?
Vibhu [00:25:47]: Yeah.
Eiso Kant [00:25:48]: And so once we made that shift into I can stream data into training, all of a sudden you realize a lot of things unlock. Because now you don’t have to wait for the whole data set to materialize.
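The streaming idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Poolside's implementation: `stream_batches` is a hypothetical helper, documents are assumed to arrive already tokenized as lists of integers, and batches are simple fixed-length token windows. The point is only that the consumer pulls batches as documents arrive, so training can start before the full data set exists.

```python
from typing import Iterable, Iterator, List


def stream_batches(documents: Iterable[List[int]], batch_tokens: int) -> Iterator[List[int]]:
    """Yield fixed-size token batches lazily as tokenized documents arrive."""
    buffer: List[int] = []
    for doc in documents:
        buffer.extend(doc)
        # Emit as many full batches as the buffer currently holds.
        while len(buffer) >= batch_tokens:
            yield buffer[:batch_tokens]
            buffer = buffer[batch_tokens:]
    if buffer:
        yield buffer  # final partial batch once the source is exhausted


def endless_source() -> Iterator[List[int]]:
    """Stand-in for a data pipeline that never finishes materializing."""
    token = 0
    while True:
        yield [token]
        token += 1


# The first batch is available immediately, even though the source never ends.
first = next(stream_batches(endless_source(), batch_tokens=3))
print(first)  # [0, 1, 2]
```

Because the generator only reads as far as it needs, fixing or swapping an upstream source does not require repackaging and redistributing everything that came before it.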
Immutable Data, Experiments as Code, and Scientific Rigor
Eiso Kant [00:26:00]: Now all of a sudden, when you’re running data experiments about mixing data, it’s a config. Because you’ve got these data sources that are coming in, and we have this service called Blender that’s in the report, where we then say, “Okay, for this run, I want 20% of this source, 10% of this source. I want so many epochs of repetition. I want this to be shuffled in a certain way,” and your training job can start while the rest of the data is even still materializing. Also, for us, we treated the data layer underneath as an immutable data layer, and that was really important. Experiments as code plus an immutable data layer means that you can always go back and understand, literally down to the single token, at which cursor it went in on which version of the code.
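To make "mixing data is a config" concrete, here is a hedged sketch of a config-driven mixer. It is not Poolside's Blender service; the function names, the config shape, and the toy sources are all assumptions for illustration. The point it shows is that with fixed (immutable) inputs and a seeded sampler, the sample order is a pure function of the config, so a run can be replayed exactly:

```python
import hashlib
import json
import random
from typing import Dict, Iterator, List


def blend(sources: Dict[str, List[str]], weights: Dict[str, float], seed: int) -> Iterator[str]:
    # Pick the source of each next sample with the configured probability.
    rng = random.Random(seed)  # seeded: same config gives the same order
    names = sorted(sources)
    cursors = {name: 0 for name in names}
    while True:
        name = rng.choices(names, weights=[weights[n] for n in names])[0]
        items = sources[name]
        yield items[cursors[name] % len(items)]  # wrapping around is another epoch of that source
        cursors[name] += 1


def config_id(config: dict) -> str:
    # Stable short hash of the config, usable as a key for tracing a run back to its mix.
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:12]


# Hypothetical mix: 20% code, 10% math, 70% web.
config = {"weights": {"code": 0.2, "math": 0.1, "web": 0.7}, "seed": 0}
sources = {"code": ["c0", "c1"], "math": ["m0"], "web": ["w0", "w1", "w2"]}


def take(n: int) -> List[str]:
    stream = blend(sources, config["weights"], config["seed"])
    return [next(stream) for _ in range(n)]


run_a = take(20)
run_b = take(20)  # identical to run_a: same config, same immutable sources
```

Storing `config_id(config)` next to the code version is one simple way to tie a sample position (the cursor) back to the exact experiment that produced it.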
Vibhu [00:26:47]: Yeah.
Eiso Kant [00:26:48]: And it took us a while, I have to admit. Like the first year of Poolside, we understood that engineering had to get great, but we didn’t understand yet that this is ultimately in support of good, rigorous scientific progress. We were a very small number of people, so a lot of it was YOLO ideas and YOLO runs.
Vibhu [00:27:08]: Yeah.
Eiso Kant [00:27:09]: And we built great infra for the YOLO runs. But once we realized that we treated data as immutable and code as always versioned, and you could always track and trace every experiment end to end perfectly, you could repeat everything perfectly, right? You have perfect reproducibility. I can still reproduce runs from two years ago if I wanted to, right? It enables the scientific progress, like the scientific process, and I think that took us probably about a year and a half into the company to figure out. We also had some great hires, like our head of applied research, Nikolai, who joined us from Yandex, who’d been working on language models since like the early 2020s, who I think brought that into the company, like, “Hey, we wanna have even more rigor.” And then once we had the combination of an increasingly more capable platform that allowed people to do more, but had this immutability, we were able to start saying, “Okay, every experiment is truly an ablation. We truly need to understand it.” And I think we became much more scientifically rigorous in the last couple of years, and the infra underneath enabled it. And then there’s just fun stuff like, and
Vibhu [00:28:16]: Yeah, a lot of it’s fun, like even just the, one, you share all the ablations, two, picking the data sets, right? There’s like a random small paragraph in here where it’s just like, “Oh yeah, training data, we have an auto mixer.” It trains eight small models, scales them up, picks the training data set. We don’t even need to look at it. I’m like, “Wow, a lot of engineering rigor there.” And there’s just a lot in here.
Publishing Research and Giving Back
Eiso Kant [00:28:40]: Yeah, and look, we wanna put out more. We treated writing papers as something that we hadn’t earned the right to for a long time. You earn the right to spend time publishing research once you’re at the frontier, because until then, you’re catching up, and every minute and hour in this industry matters. Like I obsess over it, not just the wall clock time from idea to result, but just generally, time every day that we waste is time that doesn’t allow us to catch up. But in this case, we said, “Okay, we’re gonna give ourselves.” I think we gave the team like three or four days while still doing their work, like give everything in there. And to your point earlier, if your stuff, it’s easy to like put it out. And so there’s so many more things that we wanna talk about over time, and we will definitely start doing. And as we earn more of the right, but also now have added to our mission that we want more foundation model companies to exist, you’ll see us be way more proactive, and just trying to keep dropping some of those things that we’ve learned along the way that can help others speed up.
Vibhu [00:29:40]: Which is the other cool side of this, right? Back to your point, it’s not just here’s the benchmarks of our training. If you want to replicate, here’s experiments on optimizers, data sets, post-training. You lay out a lot of it here, alongside here’s your system for how to do it. So it’s really like promoting
Eiso Kant [00:29:59]: No, thank you
Vibhu [00:29:59]: Other people can do the same.
Eiso Kant [00:30:00]: And by the way, I also wanna make clear, right, we’ve taken a lot of advantage of all the open research that others have published, right? And you mentioned the Chinese labs, and I think it’s important that, from every country and every culture and background, including Western companies like us, there’s different models that come out that people can choose to trust. But I think we do have to give credit where credit’s due, right? The incredible Chinese labs have done an amazing job at sharing their research, and we have definitely been on the receiving end of taking advantage of that. So when you’re on the receiving end of something coming to you, I think you also have an obligation to give back.
Swyx [00:30:39]: Do you have a favorite or underrated Chinese lab that you wanna shout out? Everyone shouts out DeepSeek.
Chinese Labs, Zhipu, and Persistence
Eiso Kant [00:30:44]: That’s a good question.
Swyx [00:30:45]: Moaan obviously for Therapsi. Yeah.
Eiso Kant [00:30:48]: Yeah, look, I think, I think obviously everyone’s been talking about Zhipu lately, with 5.2. I think what most people don’t realize is when they started.
Swyx [00:30:59]: Yeah.
Eiso Kant [00:30:59]: Right? They started years before ChatGPT.
Swyx [00:31:02]: They just rebranded. Yeah
Eiso Kant [00:31:03]: And so, I remember how hard it was to work on these things before the rest of the world got excited about it. And so I have an immense amount of respect for people who were working on improving models when it wasn’t the sexy thing to do, when believing in LLMs was gonna get you ridiculed. I remember back in 2016, when we were doing what we’d call machine learning on code with some of these models, people would just laugh at us. They’d be like, “This makes no sense. Like why are you wasting all these, like, millions of dollars on trying to figure this out?” And so I would say they’re probably the one that I think deserves a shout-out, not just because their latest model is very good, but because they fought to get here. And I think for every foundation model company it takes time to get here, right? It took us three years to get to the model that we’re now gonna be releasing. And now the time in between the models is counted in weeks. It’s no longer counted in months or years. But this stuff’s hard. And if we can make it a little bit easier for the next person, we should all do so. Because if we don’t do so, we’ve got a small window before models are really impacting recursive self-improvement to a level where catching up otherwise might become unfeasible. And we should try to, in that window, encourage as many labs, or however we wanna call them, to start. And so one of my current
Eiso Kant [00:32:36]: Mission, but qualm is like I wanna encourage whoever is a researcher right now who thinks they can tackle this to go and leave and become my competitor.
Eiso Kant [00:32:45]: Like start another foundation model company because I think we need it. I think otherwise we’re not gonna be in the world where, I don’t want to just be the fifth or the sixth company that wins. I wanna look at a world where there’s lots of choice.
Starting a Foundation Model Company
Vibhu [00:32:57]: What else do people not see in starting a foundation model? it’s, there’s a lot of compute, there’s a lot of capital required, a lot of compute. You lay out model factory and how to do the training, but there’s a lot there, right? That’s,
Eiso Kant [00:33:10]: Well, look, this is an oversimplification, and I always asterisk it with that because it can land a little bit the wrong way in people’s minds. But I think you can sum up 95% of model building as just doing two things. You’re improving data or you’re improving compute efficiency. And I know that feels like an oversimplification for the incredible, gifted and skilled work people do. But if you really look at it, what are we doing? We are looking at data, we’re generating new data, we’re improving data. And the only way to do that is to look at the data, right? That’s a big part of foundation model building. And on the other hand, we come up with these incredible breakthroughs in inference, in architecture, and new attention mechanisms. But what are they really doing? They’re bringing compute efficiency. Now, we have definitely had some breakthroughs over the years that allow for more model capabilities. But at the limit, if you could train a large enough model, right, and you had infinite compute, you’d be at AGI probably already tomorrow.
Eiso Kant [00:34:12]: Right? And let me say that’s infinite compute with an infinite ability for much faster networking, because networking ends up being more of the bottleneck than compute. But I do think those are the main things. And to just realize that this is engineering. I think it’s become more obvious, but I think for quite a few years, people have held foundation model companies and researchers and others on this pedestal of like you’re doing incredible magic or rocket science, or only Nobel laureate physicists can do this. And don’t get me wrong, there are some really hard problems that need to be solved, but a lot of the work that all of us are doing on a day-to-day basis is not sitting down trying to solve a math theorem. A lot of the work that we’re doing is just really doing the basics right: writing good code, looking at data, improving it, running experiments, looking at plots, trying to shape our intuitions. And a lot more people could be highly capable researchers. I think it feels far away for people to do so. But in our own company, we’ve seen engineers become researchers because the model factory allowed them to have a much lower hurdle for running experiments and trying things. And one of the guys on our team who started as an engineer building our agents is a legit reinforcement learning researcher now, making real progress. And that happened in the span of like six months. That would not have been what I think most people assumed was possible a couple of years ago.
Swyx [00:35:46]: Yeah. I think one of the interesting moments is when you can self-host, like, if in a programming language, like if you can compile the language in the language, the equivalent is can you use your own tools, right? You have the pool CLI, you have your own models. presumably you’re not only using your own models. There’s no way. But like, what’s that percentage over time?
Laguna S, Persistence, and Behavioral Gains
Eiso Kant [00:36:10]: This is the first model that we’re releasing that is starting to meaningfully contribute to our own work. It’s not a state-of-the-art model yet. Fable and other, they’re very capable models, but Laguna S is really interesting. I’m gonna pull up the quote. Peng Ming, one of our heads of applied research, said something last week, as the model came out about 10 days ago much better than we had hoped for or expected. And he said, “I have the feeling that a lot of the gains in Laguna S come not from more intelligence, but more from different behavior: more verification, less taking things for granted, not declaring victory early, and being way more persistent. And to be honest, those are more predictive than raw intelligence for success in humans also, to some degree.” He wrote me this on the 5th of July, on a Sunday, and it’s been burned in my brain ever since, because the Laguna S model, as you’ll see, and why it does so well on benchmarks and why it does so well in using it on a day-to-day basis, is that it’s just incredibly persistent. It reasons a lot. I do call that out. We have work to do on making it more efficient. We have work to do on offering different reasoning modes. But this is the model that has been able to do things that I never thought it could do. A 118-billion-parameter, 8B-active model, which is not that large. It fits on a DGX Spark and still runs at thirty, forty tokens a second on a Spark, and it is able to solve Erdős 397 independently. It’s able to do complex programming tasks. I asked it this morning to make me a Wi-Fi scanner without using any external libraries on my Mac, and it’s, like, figuring out the CoreWLAN API by really persistently trying to understand it without access to the internet. And I love vibe checking. I’ve probably spent eight to ten hours a day with this model for the last ten days.
Eiso Kant [00:38:05]: I’m not exaggerating. I was on my eleven-hour flight yesterday. I spent ten hours reading trajectories and traces and, like, of the model.
Eiso Kant [00:38:12]: And what I take away from it is exactly what Peng Ming said. We are gonna be able to squeeze so much more out of smaller models than I think we had imagined in the industry because, yes, there’s intelligence, and larger models are more intelligent. Like, no doubt about it. We should continue to scale up. But the behaviors of being really persistent, of being able to backtrack when you’re wrong, of understanding how to interact with your environment, show us that we can get a lot more out of it. And this, for me, has created a bit of a question in my mind the last couple of days. If you think about where we’re using models today, right? We are using models, say, for knowledge work. It represents twenty-five percent of the global economy, twenty-five trillion dollars of work.
Eiso Kant [00:39:00]: As we scale up models and they become more intelligent, we are excited about using them more and more for pushing the frontier of science.
Small Models, Knowledge Work, and Commoditization
Eiso Kant [00:39:08]: And if you look at the frontier of science, like true breakthroughs in science, they have been linked, they are linked to more intelligence in many places. Einstein figuring out general relativity is able to bring ideas together that other people would have not brought together. And I think one of the many dimensions of intelligence is the ability to do that, and it’s something we clearly see that as models get larger and more capable, they’re able to pull more ideas and threads together that a smaller model wouldn’t be able to.
Eiso Kant [00:39:36]: And we’re starting to see examples of that in medicine and, like, in bio and other things. But if you think about the majority of knowledge work that we do, and it includes building software, and I’m a software developer at heart first and foremost, although I probably can’t say it that much anymore as I haven’t written production code in years, what makes us good is our persistence. It’s our ability to encounter a problem and backtrack and say, “I need to go figure out this bug. I need to go research this. I need to go look at the documentation. I need to, like, try five different ways to see if I can solve it.” But it is not necessarily bringing three ideas together from radically different fields. And so if we are now seeing, and I think Laguna S is an example, that we are able to make a relatively small model much more capable than I had definitely predicted, or any previous benchmarks had shown for any model remotely this size or even larger, at least on coding tasks, it’s because of the behaviors. And so now the question I have, and I don’t have an answer, is this. I know at the limit, so infinite model size, right, extremely large model, the cost of that model is gonna be very expensive to run. We know this, right? So larger model ROI.
Eiso Kant [00:40:52]: So I know that at the very limit, I’m not gonna use the world’s largest model one day, quadrillion parameter, whatever crazy, like, scale we scale up, to do a basic coding task. Already today, I’m starting to size down for certain tasks.
Eiso Kant [00:41:07]: So it means that there is an optimal. It means there’s some curve that goes as we go up to model size for knowledge work, at some point we’re at the peak, and after that, the return on investment of using a bigger model, just doesn’t make sense.
Eiso Kant [00:41:22]: Now, I think the question is, before I would have thought that peak was extremely far away.
Eiso Kant [00:41:30]: This model for me is the first sign that maybe that peak is at a trillion, five trillion, ten trillion. Maybe we can just squeeze way more out of these models. I’m no longer thinking that we need two or three orders of magnitude on the largest models to be able to solve knowledge work: the accounting, the legal, the code that we write. And so if that holds true, it is an argument for the commoditization of models. It’s an argument that open source can win and succeed in this world. And now it’s of course a self-serving argument and it’s a hopeful argument, but theoretically at the limit it works. We just have to go discover in the next couple of years how much more we can squeeze out. Now, I do want to put a big asterisk. This does not mean I’m against scaling models. I think we ultimately only succeed if we scale our models as large as our competition. I think we should not put our head in the sand and say we’re gonna be king of open source small models. I think that’s a cop-out. It’s trying to be king of your own kingdom, but not realizing what the rest of the world’s doing. All of us would rather use a smarter, faster, more capable model. It’s a sign of hope. And so I don’t wanna overstate that this is a good model. We have a long way to go to get to the state of the art. But what hopefully people take away when they use this model is that the behaviors inside of it are what push it to be far more capable, more so than the number of parameters.
Pre-Training, Mid-Training, and RL Moving Earlier
Vibhu [00:43:03]: Is that mostly post-training? Like
Eiso Kant [00:43:05]: Yes
Vibhu [00:43:05]: Right.
Eiso Kant [00:43:06]: It’s entirely post-training.
Vibhu [00:43:08]: Are we done improving anything on pre-training? Is, like, pre-training done?
Eiso Kant [00:43:12]: No.
Vibhu [00:43:12]: Okay.
Eiso Kant [00:43:13]: So
Vibhu [00:43:13]: I just wanted to cover pre-training, and then we go post-training
Eiso Kant [00:43:15]: Pre-training is not done. I mean, look, there’s a part of pre-training of just dealing with scale, right? Every new order of magnitude of model scale, you are going to get new things you gotta solve for. But those are ultimately engineering challenges.
Eiso Kant [00:43:31]: I have a, I would say, not commonly held opinion that reinforcement learning will move earlier and earlier into pre-training.
Vibhu [00:43:42]: Yeah, mid-training.
Eiso Kant [00:43:44]: Not even mid-training. Like mid-training today, right, is, like if you look at - So we’ve been working on this for years already. And I think the first time we saw it out in public was the DeepSeek R1-Zero paper. This is a year and a half ago, I think, if I recall correctly. Where you can, very early on in a model, as it starts being capable of using language, et cetera, induce reasoning. And so the question that I have is, we have the dataset that’s the web. And the web, I think we could arguably say, probably has the totality of humanity’s knowledge somewhere encoded in different places. It has a huge variance in quality, from garbage data, and once you look at training data, you really get humbled by what the web is, to the greatest scientific papers and best blog posts and best transcripts and whatnot.
Eiso Kant [00:44:39]: And so now what we are trying to figure out, and have been doing a lot of work on, and it’s a place where we’re maybe not as open as we are on other things, but we will become more so over time: we’ve been spending a couple of years really doing research on how we can turn the web into not just next token prediction, but into a way to teach the model to think earlier in its training. And I think there’s a huge amount of gold to be found there. I think right now we’ve got some drugs in the industry. One of the drugs is distillation. Another drug is more environments. And they’re great, and they make us feel good, and they make the models better, and we’re all addicted to them, and we’ll use them, right? In various different ways. But ultimately, I think we are still barely squeezing out of the web what we should be getting out of the web.
Eiso Kant [00:45:33]: I think just next token prediction during pre-training is not enough.
Eiso Kant [00:45:36]: And
Vibhu [00:45:38]: Yeah
Eiso Kant [00:45:38]: I think we’ll see some very interesting things still happen. and that RL in post-training to induce behaviors, to improve things, like I think - the whole world knows how to do this now. I think we’re, we’re scaling it up. Everyone is. But I wonder if we need to go as far as we’re going today with environments. I’m not sure yet
Vibhu [00:46:01]: You mean we’re going too far?
Eiso Kant [00:46:02]: I’m, I’m not sure if the path to AGI is just
Vibhu [00:46:06]: Is more environment
Eiso Kant [00:46:07]: More environments.
Vibhu [00:46:08]: It seems like a never-ending, “Okay, I want instruction manual for this table, right? Am I gonna environment out building furniture? Or are we just gonna tail end like we need some general solution?”
Eiso Kant [00:46:19]: I think there’s an ability to generalize more from the web. But I also am very encouraged when I look at Laguna S, where post-training is the big impact. And I see like, oh, wait a second, just by making some of these behaviors much better, we’re able to get so much more out of it. It just changes a little bit the way you think about intelligence.
Vibhu [00:46:40]: Yeah. The analogy people draw often is the RL phase is where you don’t learn as much new knowledge. You shift
Eiso Kant [00:46:46]: Yeah.
Vibhu [00:46:46]: Yeah. So, you shift distribution, and you can have it reason towards what you want. On your point about mid-training, a lot of mid-training is still just continued training in a domain, say medicine, then you do RL. So still just
Eiso Kant [00:47:00]: It’s just better data, right? Like, I mean, mid-training, ooh, I like how we invented this word. Like it’s effectively just like,
Vibhu [00:47:06]: Second phase
Eiso Kant [00:47:07]: It’s the second phase of training with like a really dumb way to do a curriculum. But ultimately, what you’d want is a curriculum from token zero to token 30 or 40 trillion that really truly is the optimal curriculum for the model to learn. But mid-training is essentially a staged curriculum on the web because we do not have the compute, effectively, to try to ablate the perfect curriculum, right? And so I’m pretty sure that you’ll start to see people talking soon about some other term, ‘cause now we do this, right? We talk stage two and stage three and stage four training. But ultimately, all we’re doing is we’re trying to assign a curriculum to the web data that we have to allow the model to learn better. I think at some point, as models get cheaper to run, as the next generations of compute arrive, this will become more of a continuous spectrum. I also think the reason, by the way, you have mid-training and stage two and stage three is organizational, right? This is a thing that we really try to avoid with the model factory: mid-training exists because there’s a mid-training team now, right? There’s people in training who decide to focus on a mid-training effort. But what you really want is engineering and scale of experiments that allows for a much more continuous spectrum, where you have infinite stages. Now, we’re not there. Compute’s not there. Organization design is not there for it yet. But I think we’ll get there. We’ll look back in a couple of years and be like, “Oh my God, it was so cute that we did our training data like this in such a naïve way. Like we barely ordered it. We didn’t really do a good job at like
Curriculum, Auto Research, and New Objectives
Vibhu [00:48:48]: The building that curriculum will get you that in the industry.
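The contrast between a staged curriculum and a "continuous spectrum" can be sketched as two ways of choosing the data mix as a function of training progress. This is an illustrative toy, not anything described in the report: the source names, the stage boundary at 80% of training, and the linear interpolation are all assumptions.

```python
from typing import Dict, List, Tuple

Mix = Dict[str, float]


def staged_mix(progress: float, stages: List[Tuple[float, Mix]]) -> Mix:
    # Piecewise-constant curriculum: use the mix of the last stage whose start <= progress.
    current = stages[0][1]
    for start, mix in stages:
        if progress >= start:
            current = mix
    return current


def continuous_mix(progress: float, start: Mix, end: Mix) -> Mix:
    # Continuous curriculum: every source weight is interpolated over the whole run,
    # which behaves like having infinitely many tiny stages.
    return {name: (1 - progress) * start[name] + progress * end[name] for name in start}


# Hypothetical two-stage recipe: mostly raw web first, mostly curated data for the last 20%.
early = {"web": 0.9, "curated": 0.1}
late = {"web": 0.3, "curated": 0.7}
stages = [(0.0, early), (0.8, late)]

halfway_staged = staged_mix(0.5, stages)          # still the first-stage mix
halfway_continuous = continuous_mix(0.5, early, late)  # already shifted toward curated data
```

In the staged version the mix jumps once at the stage boundary; in the continuous version it moves a little with every step, which is the direction the conversation suggests things may go as compute gets cheaper.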
Eiso Kant [00:48:51]: And I’ll confirm that, when I talk to some researchers that this is a lot of the focus now is like how does training change and what is the next objective other than, next token prediction. I assume you don’t have the answers, but you have some ideas.
Vibhu [00:49:02]: We have some ideas. We’re not ready to talk about it yet.
Eiso Kant [00:49:05]: Yeah.
Vibhu [00:49:05]: We’ve been working on them for years, and I think that’s the one thing that’s also like you asked earlier about, like what’s not obvious about building a foundation model company is that you are constantly balancing the table stakes work, the recipe works
Eiso Kant [00:49:19]: Yeah.
Vibhu [00:49:19]: Versus like your, my crazy
Eiso Kant [00:49:22]: Pure research
Vibhu [00:49:22]: Breakthrough.
Eiso Kant [00:49:22]: Yeah.
Vibhu [00:49:22]: Pure research and finding that balance and adjusting the percentage to it based on where you are in the race is really important.
Eiso Kant [00:49:31]: I mean, so like, this is a nice way. I was gonna bring up auto research at some point
Vibhu [00:49:35]: Yes
Eiso Kant [00:49:35]: As another Andrej invention, or coinage, which is like, I honestly, like how many objective functions can there be, right? Like just try 1,000 of them, set it running, whatever.
Vibhu [00:49:47]: Man, it’s also
Eiso Kant [00:49:48]: Like what you’re looking for. You’re looking for loss curves like that, like
Vibhu [00:49:51]: It’s also a thing people take bets on, right? When you say more Neo labs, you’re doing a version of we’ll do foundation models, scale them up, next token predictors. A lot of other Neo labs that we see want to take a completely different approach, right? At some level, you’re right. It’s all, compute efficiency, and that’s the net objective. But some are okay, different architecture, like vastly different amounts of compute spend. So some are different. They’re not just
Eiso Kant [00:50:19]: Yeah
Vibhu [00:50:19]: They’re like, 99% not balancing, here’s the vanilla and scale up. They’re 99% on, here’s novel research that’ll change everything.
Eiso Kant [00:50:27]: And I think, look, I think it depends when you started as well, right?
Pure Research vs. Table Stakes
Vibhu [00:50:30]: Yeah.
Eiso Kant [00:50:30]: When we started, the novel thing we did was reinforcement learning on code. That’s no longer novel by far, but that’s what we obsessed over when no one believed in RL. So when you start the company, you have to have your own idea. You have to have something that’s different that allows you to speed up, right? For us, it was RL for LLMs, which later became common knowledge. But in the beginning, it wasn’t
Vibhu [00:50:53]: It’s cool. this was like your original 2023 blog
Eiso Kant [00:50:57]: Yeah
Vibhu [00:50:57]: Of purpose.
Eiso Kant [00:50:58]: Yeah.
Vibhu [00:50:59]: And like you do lay it all out here.
Eiso Kant [00:51:01]: We laid
Vibhu [00:51:01]: The blog is pretty underrated, right? The whole RL on code was very early on.
Eiso Kant [00:51:06]: Very early. And even we had to argue with people, like we say here things like to push beyond current capability, to train your own foundation model. We had to argue with people that it mattered that you had your own like, base model. you can fine-tune your way to success, right? major capabilities emerge from training a base model made accurate and useful during fine-tuning.
Vibhu [00:51:23]: Which like, for perspective at the time, we knew closed models, OpenAI, Anthropic were huge. The open models we had were like Mistral 7B, a 30B, a 70B.
Eiso Kant [00:51:35]: When we
Vibhu [00:51:35]: Yeah
Eiso Kant [00:51:36]: The date on this thing is wrong. When we published this, it was April 2023. I think this was just
Vibhu [00:51:42]: Yeah
Eiso Kant [00:51:42]: Happened on a migration, probably found it on archive.org.
Vibhu [00:51:45]: Mistral.
Eiso Kant [00:51:46]: Mistral had started, we started on the same month, right?
Vibhu [00:51:49]: Yeah.
Eiso Kant [00:51:49]: So this wasn’t even, there was only, I think, Llama out at the time
Vibhu [00:51:52]: Snell
Eiso Kant [00:51:52]: And that’s it, right? And so, but I agree. I think we want as much diversity of ideas as possible, and I do think if you’re starting today, you want something that gives you an edge, right? And what I do think we sometimes over.
Eiso Kant [00:52:13]: I think every archit- like at the limit, every architecture works. An RNN works, it’s just not compute efficient, right? Like, say if you had infinite compute, you could probably just, take a basic RNN from back in the day, and you could get pretty far.
Eiso Kant [00:52:27]: Now there have been, meaningful breakthroughs, attention, other things that are there. but I think we’re still, we’re still very early in figuring these out. The things I’m most excited about, I’m most excited about people doing extremely low precision training, right? So like the ternary stuff that we’re seeing, and it
Vibhu [00:52:47]: Oh my God
Eiso Kant [00:52:47]: Very cool. The Bonsai stuff yesterday was super cool to see. I think you can find tweets from me going back to 2023 on the notion of, well, it’s an obvious trade-off. Bigger model, lower precision equals smaller model with higher precision, by definition, right? It’s just, how does that play out, right? What’s the actual size limit? So you now have companies that are trying to figure that out, but those are the things that can change our industry if they’re done right.
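The trade-off mentioned here can be put as simple arithmetic on weight memory: parameters times bits per weight. The sketch below is a back-of-the-envelope illustration only. It counts weights and ignores activations, KV cache, optimizer state and quantization overhead, and equal memory does not imply equal model quality; the helper names are made up for this example.

```python
import math


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    # Memory for the weights alone, in decimal gigabytes.
    return params * bits_per_weight / 8 / 1e9


def params_at_equal_memory(params: float, bits_from: float, bits_to: float) -> float:
    # How many parameters fit in the same weight memory at a different precision.
    return params * bits_from / bits_to


# A ternary weight carries log2(3) bits of information (about 1.58 bits).
TERNARY_BITS = math.log2(3)

small_fp16_gb = weight_gigabytes(8e9, 16)                # 8B weights at 16 bits
same_budget_4bit = params_at_equal_memory(8e9, 16, 4)    # 4x as many weights at 4 bits
same_budget_ternary = params_at_equal_memory(8e9, 16, TERNARY_BITS)
```

By the same arithmetic, 118 billion weights take about 118 GB at 8 bits per weight and about 59 GB at 4 bits per weight, which is why precision matters so much for what fits on a single device.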
Low-Precision Training and Compute Efficiency
Vibhu [00:53:14]: Yeah.
Eiso Kant [00:53:14]: Because ultimately, our bottleneck on compute is a matmul bottleneck and a networking bottleneck, and the moment you start doing those things. So I’m excited about that. I mean, we’re doing the usual, like, Laguna S was trained in FP8. The only thing in this run, I have to admit, that wasn’t FP8 was the all-to-all. In the new run we just started yesterday, the all-to-all is FP8. That was just a cutoff-date thing, like, oh, we’re not perfectly comfortable wanting to do it. You’ve got amazing work by Nemotron on NVFP4 training. I think it’s underrated what they’ve done there. I’m excited to get to NVFP4 training. It doesn’t make sense yet ‘cause we’re still training on Hoppers, right? We’re relatively small. We’re a 10K H200 cluster company right now. We’ll be scaling to a lot more soon, and really a lot more, if someone is thinking about applying for a job. But yes, I think there’s so much more juice to squeeze out of this, and hopefully Laguna S shows people that a model at this size can get a lot more, and we did this thing in eight weeks. We think there’s a lot more juice to squeeze out at any model size. We’re now scaling up because it’s the most optimal thing to do for us as a company. But if I had infinite time, I would love to push the capabilities more at other model sizes.
Vibhu [00:54:34]: I don’t think we’ve properly announced what your new size is. So we have XS, which was 30B-ish.
Laguna S Model Size and Naming
Eiso Kant [00:54:41]: Yep.
Vibhu [00:54:41]: Old medium was 200B, which is gonna be deprecated
Eiso Kant [00:54:45]: Yeah
Vibhu [00:54:45]: It seems. So new Laguna Small
Eiso Kant [00:54:48]: So Laguna S, Laguna Small, 118 billion total parameters, 8B active, so very sparse. It’s a scale-up of the XS architecture. It’s the classic, or call it classic these days, three-to-one ratio of sliding window attention to global attention. It’s a nice size for a couple of reasons. One, it’s just very cost efficient. For us, it was a good way to-- We wanted to get our progress out quickly. One of the things that we’ve seen is that it’s a balance inside a foundation model company between focus on releasing and shipping and, like, your novel research. But with the model factory, we are able to treat the release of a model as less of a time investment from the team because it’s just, oh, at this moment in time, do the training run, done, apply the latest post-training. And so this is, I think, a nice weight class. It’s one that also will fit on a DGX Spark, which I have a small, like, soft spot for. I love having that little thing run a good model.
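For readers unfamiliar with the three-to-one layout, here is a minimal sketch of what such a layer schedule can look like. Only the 3:1 ratio comes from the conversation; the layer count, the window size, and placing the global layer last in each group of four are assumptions made for illustration.

```python
def attention_pattern(n_layers: int, local_per_global: int = 3) -> list[str]:
    """Label each layer, placing one global layer after every run of local ones."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "sliding" for i in range(n_layers)]


def visible_keys(query_pos: int, kind: str, window: int) -> range:
    """Key positions a causal query at query_pos may attend to."""
    if kind == "global":
        return range(0, query_pos + 1)       # everything up to and including itself
    start = max(0, query_pos - window + 1)   # only the most recent `window` tokens
    return range(start, query_pos + 1)


layers = attention_pattern(8)                # hypothetical 8-layer stack
print(layers)
print(list(visible_keys(10, "sliding", window=4)))
print(len(visible_keys(10, "global", window=4)))
```

The appeal of this layout is that most layers only look at a bounded window of recent tokens, while the occasional global layer still sees the full context.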
Swyx [00:55:52]: Yeah, we covered it on this pod, GTC last year.
Eiso Kant [00:55:54]: Nice.
Swyx [00:55:55]: I think GPT-OSS 120B was the first, because it fits on a large single GPU, which was the H100, right?
Eiso Kant [00:56:02]: Exactly.
Swyx [00:56:02]: Rent one H100, now you’ve got 128 gig Macs, Mac Minis, Sparks. It’s, it’s the home sweet spot.
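A quick way to sanity-check the "home sweet spot" claim is to compare weight storage against device memory. This sketch uses the 118B total parameter count quoted earlier and nominal bytes per weight; it deliberately ignores KV cache, activations, and quantization overhead, so a result of True means only that the weights alone would fit. The helper names are hypothetical.

```python
TOTAL_PARAMS = 118e9  # total parameters, as quoted in the conversation

# Nominal storage per weight; real formats add some overhead for scales.
BYTES_PER_WEIGHT = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}

# Memory sizes in GB: an 80 GB H100 and a 128 GB unified-memory machine.
DEVICES_GB = {"80 GB GPU": 80, "128 GB unified memory": 128}


def weights_gb(n_params: float, bytes_per_weight: float) -> float:
    """Gigabytes needed for the weights alone."""
    return n_params * bytes_per_weight / 1e9


def fits(n_params: float, fmt: str, device_gb: float) -> bool:
    """True if the weights alone fit in device memory."""
    return weights_gb(n_params, BYTES_PER_WEIGHT[fmt]) <= device_gb


for fmt in BYTES_PER_WEIGHT:
    size = weights_gb(TOTAL_PARAMS, BYTES_PER_WEIGHT[fmt])
    verdicts = ", ".join(
        f"{name}: {'fits' if fits(TOTAL_PARAMS, fmt, gb) else 'too big'}"
        for name, gb in DEVICES_GB.items()
    )
    print(f"{fmt}: {size:.0f} GB -> {verdicts}")
```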
Eiso Kant [00:56:10]: But I think what I’m most excited about is that this model hopefully shows people what is possible in this size because, when you look at the benchmarks and start using it, you’ll realize that we are outperforming models two or three times its size.
Swyx [00:56:24]: Yeah, and they think-- So for example, today’s Thinky model is like a trillion params.
Eiso Kant [00:56:28]: So yeah, exactly. And look, and by the way, I’m excited about-- I have-- It just came out, so for those of you who are listening to this at like, I saw it on my phone
Swyx [00:56:34]: If you’re, if you’re listening
Eiso Kant [00:56:34]: If you’re humming
Swyx [00:56:35]: Yeah.
Eiso Kant [00:56:35]: Like two seconds, so I haven’t even had a chance to read the post.
Swyx [00:56:39]: But somehow you’re not only better than Thinky, which is like one of those benchmarks, but also, on certain benchmarks, like the τ-bench one, you’re state-of-the-art.
Eiso Kant [00:56:51]: Look, I’m not sure if we’re state-of-the-art on-- I mean, 3 banking, I haven’t checked where we sit on the leaderboard. But I think we are, within our weight class, I feel very comfortable to say, and even in some weight classes twice as large, probably state-of-the-art. I also want to caveat this: the best model in the world right now is still definitely, give me a Fable, give me a 5.6. To your point earlier, we also use other models.
Swyx [00:57:15]: Yeah.
Swyx [00:57:15]: I think the, so the interesting thing you mentioned earlier is you’re starting to shift a lot of your actual usage to it, right? Benchmarks are like
Eiso Kant [00:57:21]: Yeah
Swyx [00:57:22]: They’re good to compare, but they’re not super realistic. It’
Eiso Kant [00:57:24]: They have to, right? This is how they’re gonna dog food benchmarking.
Eiso Kant [00:57:27]: No, you have to. Like, you have to use your own models, and you have to have your own internal evals and benchmarks. And the funny thing is, within the first 30 minutes of a new checkpoint coming out, the first post-train after a train, you yourself can feel where this model’s gonna be. Like, you don’t know exactly, but when this one came out, we were like, “Oh, this is different.” I think that’s the best example. But it’s a little bit like your kids. I don’t have kids, but parents, like, they see their kid and it’s perfect and they love it, and they don’t see all the rough edges. You always get that when you build your own model. The most fun part is that you love every model that you do a little bit. We try to say this thing constantly: “It’s the worst model we’ll ever train.” And so I know the team now is already onto
Swyx [00:58:18]: Yeah
Eiso Kant [00:58:19]: The next one, as it should be, because this is a race. And this model is a moment in time that hopefully shows people that we are serious about this race, that we wanna work really hard at it, that we want feedback, right? Where is it good? Where is it not? Like, one of the nice things about having your models out in open weight and out in the world is that you get a lot of feedback.
Swyx [00:58:40]: How do you think about building it with, like, working with a harness, right? So OpenCode, Codex, you have your own pool CLI tool. Getting people to use it, the design of the model harness, training it in.
Eiso Kant [00:58:54]: So you need to do some multi-harness training. Especially at these smaller sizes, you wanna do a little bit of multi-harness training for these models to just get the right-- And it’s very little. Like, you don’t need a lot, but it’s just to get the right behaviors that you see in your harness transferring to the harness that other people might use it in. We internally have been just calling this polishing, which is like you’ve got your model and you do a little bit of polishing so that it’s able to work as well in other harnesses as it does in your own.
Eiso Kant [00:59:24]: No doubt it’s going to be better in your own harness, and it’s just because of where you are putting your reinforcement learning compute, right? You’re putting your RL and your synthetic data into your own harness because it’s the one that you understand the best and you’re able to push the most. That end-to-end control is what allows you to make it better. Then transferring those capabilities is more about just making sure the model induces the right amount of reasoning and understands some of the maybe more complex, weird tool call formats that might exist somewhere else. And so we do some multi-harness polishing, as we call it. It’s not really what drives capabilities, but it does create a better experience. I think everyone probably does these days, but it is totally fair to see why your own harness is still going to be better than others. And I think we see this with all the foundation model companies. It’s just that when you are pushing capabilities, you don’t really wanna trade it off by putting 10 harnesses in your RL runs because it’s just complexity. It’s complexity of engineering because these-- When you’re trying to do good science, right, when you’re trying to really understand what made my model improve, you wanna make one variable change to something you understand. And a harness from someone else, you don’t know or understand in the same way as you understand your own, right? They might have different agents or different prompts
Why Poolside Is Called Poolside
Swyx [01:00:48]: Yeah,
Eiso Kant [01:00:48]: In different places
Swyx [01:00:49]: If it’s open source, you can look at the source.
Eiso Kant [01:00:50]: Yeah, but it’s time, right? I really cannot stress this. I know I’m like a weird person on this because I have friends like, “Can we meet up?” Or, “Can we do this?” Or, “Can we go out?” I’m like, “No.” Because ultimately, this is a race, and time is the only thing that matters. And if I look at our team and say, “Okay, what complexity is worth introducing on our general trajectory to building more capable models?” Which generalize to other harnesses quickly. And by the way, our model works well on other harnesses. I really encourage people to do it. It works well. Like, we’ve been testing it in OpenCode and Kilo Code and others, and in Claude Code.
Swyx [01:01:22]: Which just got bought today.
Eiso Kant [01:01:24]: I saw it.
Swyx [01:01:25]: I mean Honda. Yeah.
Eiso Kant [01:01:25]: Exactly.
Swyx [01:01:26]: Everything’s getting bought.
Eiso Kant [01:01:27]: Exactly. And I think part of that is like, and there’s some amazing-- I’m excited, like I think Hermes is a ridiculously cool harness, and
Swyx [01:01:37]: And part of the question was just like how much of it is model versus model plus harness, right? So new benchmarks like Agents Last Exam, it’s not wanting to just measure the model. Same with models getting more and more agentic. They need a harness to operate in, right?
Eiso Kant [01:01:55]: I think when you’re asking that question to a model company, you can separate it in two parts. The harness, like we have a very slimmed down harness. When you look at it, it’s like six tools. It’s like shell and like shell kill, shell wait, write, fetch web, and like, I don’t know, bash. Like I think I’m missing one, but that’s effectively all the tools. And it’s very simple. It’s very lightweight. So it is not a harness that is designed to try to do well on a benchmark or try to do well on a certain subset of things, right? It’s not a deep research harness. So I think we see incredible ability for complex harnesses that build lots of prompts around and extra data sources and other tools to really push capabilities of models forward.
Eiso Kant [01:02:41]: But our model is still better than some other harnesses who do that in coding-like tasks because it was RL’d with it.
Eiso Kant [01:02:48]: Now, I do encourage people, I think our model, by the way, is perfectly fine and good on others. The differences are probably too small for anyone to notice, but we ultimately still see it on benchmarks, by a little bit. So I think both are true. Foundation model companies with their harnesses will really push them because it’s just operationally the best way to have scientific rigor in improving your models. But also someone who takes our model and really does a lot of work on improving a harness is going to compete with us, as they should. And that’s just because the harness is the stopgap between what the model is capable of and what it needs as additional instructions, and what it needs as access to data and tools, right? And that’s ultimately, I think, what a harness is. It’s like, is it able-- As you build more capable models, you’re improving the instruction following of the models. And so additional harness is just saying, “Hey, if you encounter X, Y, or Z, behave this way.” And so even if you would say that two models with two different harnesses can equally reach the same capability that you care about, a harness that is really tailored towards a capability will do it more efficiently.
Eiso Kant [01:03:58]: It’s like a person who’s getting a manual of how to do the task in the most efficient way with the right tools and the right data sources versus a really smart person like, “Go figure it out.” They’ll both solve the task, but one will do it a lot more efficiently. So I’m a big fan of all the harness development that’s happening in the world, and we want to work with more harness creators to also make sure that if it needs some additional training, like polishing, we will do it.
Swyx [01:04:22]: I mean, I think when you say it’s a race, there’s a question of what are you racing to? Are you racing to be the best coding model company or the best coding model plus harness company? I think those are different things.
Swyx [01:04:36]: Or neither.
Eiso Kant [01:04:37]: Or neither.
Swyx [01:04:37]: Yeah.
Eiso Kant [01:04:38]: So we-- I race to AGI. Coding for us, since day zero of our website, has been-- and we’ve said this over and over again, we think focusing on coding and long-horizon software tasks is a path towards AGI because it forces us to solve the hard problems. It forces us to solve the ability to do extremely long-horizon complex work that requires lots of reasoning, external tools, data, et cetera. And one of the things I can show you, so we’ll have a web chat with this model, and I’ve loved this model for deep research, just using it in my coding harness. It was never trained for it. It was never looked at for it, but it’s great at it, in my opinion. Because ultimately, the skills transfer, they generalize. Now, what we are not focused on today is making sure that the world’s greatest medical knowledge is encoded in this model, or the world’s greatest legal knowledge. We won’t be publishing this benchmark ‘cause we didn’t have time to really do it properly, but it did really well on LegalBench, at least on our first runs. And we are very rigorous. When we publish evals, we have checked them for every little thing. We have run them many times. We try to be extremely honest with this, so if we haven’t spent enough time on a benchmark that we use internally that is public, we just say that we won’t publish it. And
Swyx [01:06:01]: I mean, the other way is just to give it to Artificial Analysis and let them run it.
Swyx [01:06:04]: Like third party standards.
Eiso Kant [01:06:05]: Oh, 100%, and we are gonna be doing this as well. And still it takes time and effort, right? Because you’re working with people to understand like, the infra failures and like the tools they’re using and like, are they set up well. But I agree. You absolutely want to. I’m a big fan of companies like Vals and Artificial Analysis and like others that are doing this stuff.
Swyx [01:06:21]: I found it very nice. You’re the first to bring it up.
Eiso Kant [01:06:22]: Yeah. I think they’re great. I loved a lot of the work they’ve done and put out. And there’s, I think, many more, and please create more eval companies. Like, create more evals. I think it’s so valuable for the industry.
Swyx [01:06:34]: It’s an actual monopoly I feel like. Oh, and duopoly maybe.
Eiso Kant [01:06:37]: I think it can be broken.
Swyx [01:06:39]: Yeah.
Eiso Kant [01:06:40]: I think it can be broken really easily, because creating an eval for many people isn’t sexy work, but whoever does it, everyone is happy to get a good eval. Like, if an eval is well constructed, everyone’s celebrating it, and everyone’s willing to pay for it, and everyone’s willing, like the foundation model
Swyx [01:06:55]: Oh, yeah. I think creating eval, yes. But like in terms of being like we are the industry standard ones that will
Eiso Kant [01:07:01]: Yeah
Swyx [01:07:01]: τ-bench and make sure that you didn’t, you didn’t cheat
Eiso Kant [01:07:03]: Yeah, that’s true
Swyx [01:07:03]: And I’ll run it the same way that you run it versus your competitor run it.
Eiso Kant [01:07:05]: Yeah. That is very true, and we need that. And it’s nice that there are a few standard places that we all have to adhere to. It keeps us all honest. I think that’s super important. But yeah, no, I think our goal is to build the world’s most capable models, and right now we are focused on the coding agent capabilities, long-horizon work. But what you see with that is that you get a lot for free. I’ve always said it’s a lot easier for us, as we get to SOTA and frontier on coding, to then say, “Okay, now we’re going to obsess over using the model factory to add more data for places that we’re not as strong on,” which could be medical or legal or any other areas. And similarly, I think what we see, and we see this with reasoning models a lot, is that if you give models access to the right knowledge sources and they have capable ways of reasoning, they’re able to go very well into domains that are less known to them or even seen less in their training data. So, but yeah. Are we an agent, like, model plus harness comp-? No, we’re a model company. But I think models today cannot be trained without harnesses. It’s not possible. So where before it was just the weights in the container, well, now there’s an agent harness that’s attached to it. But I think there’s a big difference between building an agent harness as a model company and someone who’s truly building an agent company. I think they can do far more than we can.
Swyx [01:08:27]: Yeah. Understood. Yeah. I think that is my minor pushback. If you are truly identified as a model company, then make the best model for OpenCode, right? Instead of for pool or whatever. I think that’s minor compared to: if the goal is AGI, make the best model for Hermes.
Swyx [01:08:45]: Right? Like just ‘cause that is the next stage after coding.
Eiso Kant [01:08:48]: Look, and we’re working very closely with them
Swyx [01:08:52]: Yeah
Eiso Kant [01:08:52]: Because I do think you have to care, you have to invest in it. It’s why we do the polishing and we spend time on it. And I think over time, yeah, you’re right that you wanna balance that out. But ultimately you just want general capabilities, so that everything works equally in every harness.
Swyx [01:09:10]: Just on the topic, do you guys do much with like Hermes, OpenAI Codex, NanoCodex, whatever? Pi?
Swyx [01:09:16]: Pi.
Eiso Kant [01:09:17]: Pi.
Swyx [01:09:17]: No, Pi is different.
Eiso Kant [01:09:18]: It’s more coding.
Swyx [01:09:19]: Yeah.
Eiso Kant [01:09:19]: I’m a big fan of Pi, though, I have to say. I think it’s a really sexy
Swyx [01:09:22]: I forgot to mention Pi.
Eiso Kant [01:09:23]: Yeah.
Swyx [01:09:23]: Pi, you sound closest to Pi in terms-- pool and Pi in terms of like the minimal surface
Eiso Kant [01:09:28]: In the minimal yeah.
Swyx [01:09:29]: Yeah.
Eiso Kant [01:09:29]: It’s because I-- Allow me one more strong opinion.
Swyx [01:09:33]: Yeah.
Eiso Kant [01:09:34]: I’ve been saying this now for two years.
Eiso Kant [01:09:37]: I think MCP and tools are stupid.
Swyx [01:09:41]: Ooh. Let’s go.
Swyx [01:09:42]: You support MCP.
Eiso Kant [01:09:43]: I support MCP and we support tools and everything. They make absolutely no sense to me.
Eiso Kant [01:09:48]: And like, and I’ll explain a little bit why and I think I can probably get people to come along on this one.
Eiso Kant [01:09:56]: If you are looking for complex tasks, increasingly longer horizon, increasingly complex tasks, doesn’t matter if it’s coding or something else, you are gonna be interacting with data sources, right? And you’re gonna be interacting with things that are installed on some form of a virtual machine.
Eiso Kant [01:10:15]: And what we are doing is that we’re putting a layer in between those things. We’re putting MCP in between, we’re putting tool calls in between, and this is even more about tool calls than MCP, where the model can just write the code and interact with the system. And we’re starting to see that. Laguna S does this a lot. You’ll see this as well in frontier models. They’re increasingly moving from, “Here, we’re gonna stuff 50 tools in the system prompt,” to “No, here’s a virtual machine with these binaries installed, this code base you can operate in. Here’s a folder where you can write your memory if you want to.” And the model is using code to do complex asks. And when it uses code, it is not one or two tool calls or three things that are chained together. It starts using if statements and for loops and making things conditional. And so I think we’re moving, we already are moving, from tool calls to effectively models writing code, little scripts, and you see this a lot when you get the Python,
Swyx [01:11:15]: Code interpreter.
Eiso Kant [01:11:16]: Exactly. Like in just the arrow in, written code in the file. I don’t know what you call
Swyx [01:11:21]: EOF? Yeah.
Eiso Kant [01:11:22]: Yeah, exactly. Like, you already see this happening more in models because when you start training them in RL, the models wanna be free. They wanna be able to do the thing they wanna do in the most efficient possible way, and it is not calling one of the 50 tools in their system prompt. And so I’m a very big fan of: give the model a minimal harness, as minimal as possible, give it a container in which it has its own code base, right? The model’s own code base, which has access to the API keys and data sources and little libraries and documentation that it needs, and just let it run free at the task. And I think that is the way we’re going. I think we will, in 12 months, not see a single system prompt that is stuffed with 20 or 30 or 40 tools anymore.
Swyx [01:12:07]: No comment. No pushback there. I think it’ll be supported for a long time just because a lot of people are trained on that now, but maybe you guys don’t have to support it in your models going forward. So, but yeah, I mean, if you can-- I do think writing code is more generalist and it’s a means to an end
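To illustrate the kind of "little script" described here, the sketch below shows a task that would take several separate tool calls (list files, read each one, search it) collapsed into one short program with a loop and a conditional. The task, the file names, and the function name are invented for illustration and are not taken from any particular harness.

```python
import tempfile
from pathlib import Path


def find_todo_lines(root: Path, suffix: str = ".py") -> dict[str, list[int]]:
    """Scan a directory tree and collect 1-based line numbers containing 'TODO'."""
    hits: dict[str, list[int]] = {}
    for path in sorted(root.rglob(f"*{suffix}")):      # loop over matching files
        lines = path.read_text().splitlines()
        matches = [n for n, line in enumerate(lines, start=1) if "TODO" in line]
        if matches:                                    # keep only files with hits
            hits[path.name] = matches
    return hits


# Stand-in for the container's working directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("x = 1\n# TODO: fix this\n")
    (root / "b.py").write_text("y = 2\n")
    print(find_todo_lines(root))
```

With discrete tool calls, each step would be a separate round trip through the model. Written as code, the control flow runs in one pass and only the final result comes back.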
Eiso Kant [01:12:26]: And we do support tools.
Swyx [01:12:27]: Yeah.
Eiso Kant [01:12:27]: We support-- And this is the first model we’re doing parallel tool calling in, which we needed to catch up on. So that’s there and like
Swyx [01:12:32]: Yeah
Eiso Kant [01:12:32]: So it’s, it’s there, but I,
Swyx [01:12:35]: Yeah
Eiso Kant [01:12:35]: It’s a personal nitpick. I want the models to have as many degrees of freedom as possible and just, like, be free and do capable things.
Swyx [01:12:43]: Yeah. So and then, so that was on the path towards like, okay, how do you use Poolside’s models and Laguna models for my Hermes or my OpenAI Codex
Eiso Kant [01:12:52]: Yeah
Swyx [01:12:52]: On all those things. And so typically what I look for is computer use or vision. That’s a very big one. You guys have a blog post on that. But then also the persistence, I think, is a very strong value, as well as long context, which you guys have, a million token context. Anything else?
Eiso Kant [01:13:08]: So for us, look, so for us, vision understanding is the next thing, right?
Swyx [01:13:11]: Yeah.
Eiso Kant [01:13:11]: Like we don’t have vision understanding.
Swyx [01:13:12]: Which I was gonna say is
Eiso Kant [01:13:14]: We don’t have vision understanding in these models yet.
Swyx [01:13:16]: Yeah.
Swyx [01:13:17]: To
Eiso Kant [01:13:17]: And so this is something that we’ve, we’ve started efforts on. Like we think it’s, it’s super important to have visual understanding.
Swyx [01:13:23]: That’s company vision.
Eiso Kant [01:13:24]: And so no, we’ve got work to do there. And this is one of the things I loved about the Thinky model, like from the two minutes I scrolled the blog post
Swyx [01:13:33]: Yep
Eiso Kant [01:13:33]: Multi, the multi
Swyx [01:13:34]: They’re, they’re very committed to multimodal, including audio. Yeah.
Vibhu [01:13:36]: They’re state-of-the-art audio, as much as it’s a trillion parameter state-of-the-art audio, but also all trained from scratch, right?
Swyx [01:13:43]: Yeah.
Vibhu [01:13:43]: No encoder in the sense
Swyx [01:13:45]: To me, that’s, that’s, that’s one of the strongest reasons why you need to train from scratch, is you just have a different tokenizer, you’d have different
Eiso Kant [01:13:51]: I’m fully aligned, like zero disagreement from me here. Like, just add the modality and don’t put-- Keep it simple. I don’t think we’ll touch audio for a very long time.
Vibhu [01:14:05]: It’s in the name too, InkLink Inc.
Eiso Kant [01:14:08]: True.
Swyx [01:14:08]: Yeah.
Swyx [01:14:09]: I mean, what’s so hard, what’s so hard about audio?
Eiso Kant [01:14:11]: It’s not about what’s-- Again, it all comes down to focus.
Swyx [01:14:14]: I see.
Eiso Kant [01:14:15]: Right? Like saying no to things means that there’s research or compute that can go to making general progress, and our view is that general progress is going to come from the ability to push these models to far more capable reasoning, far longer-horizon tasks. I don’t think audio adds to that. I don’t think it pushes us closer to AGI. I think it is a necessary modality as you get closer to AGI. I think visual understanding sits in the middle of those things. I think visual understanding can absolutely do so, but it also unlocks capabilities that are just valuable today. But this is the point, right? You want more diversity, you want more different foundation model companies who focus on different things. I think we are just like a horse with blinders on, just like
Swyx [01:14:58]: Yeah, you have your path
Eiso Kant [01:14:59]: We have our path, we wanna catch up to the frontier, and, we don’t wanna distract ourselves with anything else.
Swyx [01:15:05]: Yeah.
Swyx [01:15:06]: I will call out that one of the branches of research is DeepSeek OCR, which is can you just throw away the text tokenizer and just have only vision?
Eiso Kant [01:15:13]: I find this-- Look, the geek in me looks at this stuff and it’s like, okay, look at this, like look at the number of bits encode
Swyx [01:15:20]: But they’re right.
Eiso Kant [01:15:21]: I think it’s super cool, right? But I think this is what we’re gonna come back down to. Like, it probably works, it’s just, is it compute efficient enough? Is it-- I think so many of these things ultimately will work. It’s just like, what’s the nice thing about text? And I referenced earlier, Peng Ming and Nikolai are my two heads of applied research who are just incredible, like we wouldn’t have gotten here without them and the entire team.
Eiso Kant [01:15:45]: And Nikolai and I have been debating for years about, like, should reasoning be in latent space? Should reasoning be in tokens? But one thing that I think he and I really agree on, and all three of us, is that language is incredible because it’s such an incredibly dense way to encode knowledge and information and intelligence, right? If you think about what went into a physics paper that then is 20 or 30 pages, the amount of intelligence and thought and whatnot to then generate that, like in that 20-page document, in that little amount of bits, there’s so much encoded. And other modalities like video and images are amazing, but they don’t have the same density of knowledge or reasoning, or however you want to call the things that we’re trying to push for, encoded in that modality. They’re there. In many cases, you can watch an incredible lecture for 50 minutes on YouTube, but if you treat that as video data versus text data, right, the bits to, like, signal-to-noise ratio, the compute efficiency of the modality is a lot less. And so we have this view that with language you can go really far, but also when you have limited compute and limited people, and the two are very much linked, I think we can push language. It’s the better investment. But I want all the modalities. I find it super cool and I love what DeepSeek and others are trying. Like, I can retweet them all the time, but internally we’re just like, “Let’s stay focused.”
Vibhu [01:17:17]: Which, I’ll say, you can see somewhat works, looking at Anthropic. OpenAI has a lot of vision, multimodality. Anthropic just didn’t, right? Fable’s a big step up in image processing, but they’re not known as the multimodal company, right? They’re the language model coding company that has multimodal capabilities that are never a super flex, and it goes pretty far.
Eiso Kant [01:17:42]: Look, on this, I think Anthropic, I mean, they’ve done many things right, but this maniacal focus on just pushing capabilities, scaling up models-- I couldn’t agree more. I think that’s the first hurdle, and once we get that, then we can improve a whole bunch of other things. But at the same time, on the other end of the spectrum, it’s really exciting to see people building these spatial models, right? And the world models that are being built for very different use cases. But I think ultimately it all comes together at some point.
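The density argument can be put in rough numbers. The sketch below compares raw storage for a 20-page paper and a 50-minute lecture video. The page count and lecture length echo the examples in the conversation, but the characters per page, bits per character, and video bitrate are assumptions chosen purely for illustration, and raw bits are only a crude stand-in for the amount of knowledge encoded.

```python
def text_bits(pages: int, chars_per_page: int, bits_per_char: int = 8) -> int:
    """Raw bits for plain text, assuming one byte per character by default."""
    return pages * chars_per_page * bits_per_char


def video_bits(minutes: float, megabits_per_second: float) -> float:
    """Raw bits for a compressed video stream at a constant bitrate."""
    return minutes * 60 * megabits_per_second * 1_000_000


paper = text_bits(pages=20, chars_per_page=3000)              # assumed page density
lecture = video_bits(minutes=50, megabits_per_second=2.0)     # assumed bitrate

print(f"paper:   {paper / 8 / 1e3:.0f} KB")
print(f"lecture: {lecture / 8 / 1e6:.0f} MB")
print(f"the video uses about {lecture / paper:,.0f}x more bits")
```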
Vibhu [01:18:19]: Okay. So scaling models, this is Laguna S for small.
Eiso Kant [01:18:23]: Yes.
Vibhu [01:18:23]: You have good naming, extra small, medium, large.
Eiso Kant [01:18:26]: Yeah.
Vibhu [01:18:26]: Still scaling?
Eiso Kant [01:18:28]: So the new medium, and it’s much bigger than the last medium, started training yesterday. So it’s a 39-day training run. And
Vibhu [01:18:39]: How do the days and events? Just the compute model
Eiso Kant [01:18:41]: Model factory.
Vibhu [01:18:42]: Okay.
Eiso Kant [01:18:42]: Right? And like at this point, like with the model factory, like it’
Vibhu [01:18:46]: I thought it was interesting. So in the Laguna medium and extra small, you even quoted number of GPU hours for how many days and whatever for different sizes. And I’m like, “Oh, you can also work backwards to how much that costs, right? What GPUs, how many hours.”
Eiso Kant [01:18:59]: And you realize it’s not a lot.
Vibhu [01:19:00]: No, it’s not.
Eiso Kant [01:19:01]: It’s not a lot of money. And you started with DeepSeek of the West, and I think the DeepSeek moment, right, was a moment when people realized that you can train incredibly capable models for not a lot of money on the training run. But I think that’s the falsehood, right? The training run is not the expensive part. The training run is a very anticlimactic event, right? Like we just had a Slack message come up yesterday saying, “The new model is training and here are the links, so you can follow along the evals,” and that’s it. It’s all the work that goes into that moment. It’s like how people talk-- I know nothing about sports, but how athletes talk about, like, it’s all the preparation, it’s all the going to the gym, and then the game is just a game. I think that’s a little bit like with model training.
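The "work backwards to how much that costs" step is simple arithmetic. A minimal sketch follows. The 10,000-GPU cluster size and the 39-day run length are figures mentioned in this conversation, but treating the whole cluster as dedicated to a single run and the price per GPU-hour are assumptions, with the price a pure placeholder rather than a quoted rate.

```python
def gpu_hours(n_gpus: int, days: float) -> float:
    """Total GPU-hours for a run that keeps n_gpus busy for the given days."""
    return n_gpus * days * 24


def run_cost(n_gpus: int, days: float, usd_per_gpu_hour: float) -> float:
    """Compute-only cost of the final run; excludes R&D, data, and failed runs."""
    return gpu_hours(n_gpus, days) * usd_per_gpu_hour


PLACEHOLDER_PRICE = 2.00  # assumed USD per GPU-hour, for illustration only

hours = gpu_hours(10_000, 39)
print(f"{hours:,.0f} GPU-hours")
print(f"${run_cost(10_000, 39, PLACEHOLDER_PRICE):,.0f} at the placeholder price")
```

As the conversation stresses, this only prices the final run, not the research, infrastructure, and data work that precede it.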
Swyx [01:19:42]: Yeah. People had over-indexed on DeepSeek was trained for $5 million or whatever it was, right? It’s like there’s the amount of R&D before that, the infrastructure is built up. Yeah.
Eiso Kant [01:19:51]: Exactly, all the things, the data. But no, so Laguna M is training, and yes, there will be an L and there will be an XL, and what you’ll
Swyx [01:19:57]: Ooh.
Eiso Kant [01:19:57]: What you’ll see with M, right, M is much larger than the last M, right? So these monikers are a little bit our version of the different
Swyx [01:20:04]: Yeah, he was making fun of people for saying small is 24B or something.
Swyx [01:20:08]: No, so, no. Small for Mistral now is over 100B.
Eiso Kant [01:20:12]: What?
Swyx [01:20:12]: Yeah, I can pull it up.
Eiso Kant [01:20:13]: I mean, our small, right, is 118, so I don’t wanna say anything else. Like, it’
Swyx [01:20:17]: I mean, I think it’s also. Okay, yeah, your small is
Eiso Kant [01:20:20]: We all know that the single hardest thing for any foundation model company is naming.
Eiso Kant [01:20:25]: I don’t want to say that we’re good at it either. I mean, this is Laguna S 2.1. It’s--
Swyx [01:20:32]: But at least people understand, medium is bigger than small. Until you mess that up, like
Eiso Kant [01:20:37]: Exactly
Swyx [01:20:38]: You have a pass.
Eiso Kant [01:20:38]: We try hard.
Swyx [01:20:40]: While we’re on the topic of naming, this is gonna be at the end, but might as well
Eiso Kant [01:20:43]: Sure
Swyx [01:20:43]: Why Poolside? Why Laguna?
Eiso Kant [01:20:46]: So When we started the company, it was gonna be called Snowball Apps. it was after the snowball effect because we expected this company to become a snowball effect, and it definitely has been a snowball effect for us. turns out it’s an Amazon trademark.
Eiso Kant [01:20:59]: I kid you not that my founder’s next suggestion of a name was, “Let’s call it Bedrock.” And so at this point it was like, “Okay, no, you are amazing at naming things if you worked Amazon.” and so, early on in the company, before we were incorporated, we were at an annual conference of a very big Major tech company, and we had been discussing with them. And you have to realize the company at this point is me, my founder, our CEO, Margarita. We know the first person who’s gonna join us. We haven’t, like, incorporated yet. and we were discussing an OpenAI Microsoft-style deal with this big tech company. Like, they were going to provide us with a lot of compute. We would give them, perpetual access, a whole bunch of things.
Eiso Kant [01:21:49]: And we found out the name Snowball Labs was trademarked while we were at that conference, having this discussion that we had no right to have, right? We were a couple of guys who had nothing yet, but this big company was willing to entertain the idea that we might partner with them. We were discussing this at their annual conference, in a public setting, and the chief scientist of that company said, “People can hear us here. We should move somewhere else. Let’s go to the restaurant Poolside.” And for some reason, me and Jason looked at each other in that moment and said, “Oh.” And later that night, the word stuck with us, and we said, “Let’s call the company Poolside.” We never ended up doing that deal, and ever since we’ve used the name as a reminder to never round down our ambitions, because that would’ve been the easy path. The hard path was what we did, which is start and try to raise exorbitant amounts of money when you’re just a couple of guys who are not even building it in Silicon Valley, who don’t come from any of the known labs and things like this. And so everyone assumes it’s Poolside because, with AGI, everyone sits poolside, and it was a playful name, and we liked it, and it was a little bit different. But the name is a reminder for us to never round down our ambitions, and whenever you’re faced with those decisions, to just pick the harder path.
Swyx [01:23:09]: Yeah. I mean, that’s a great story. I know you’ve told it before
Eiso Kant [01:23:13]: Yeah
Swyx [01:23:13]: But I just wanted
Eiso Kant [01:23:14]: Right
Swyx [01:23:14]: On the record. But that’s what I did the first time I met you. You sat me down. We were in a hotel somewhere.
Eiso Kant [01:23:21]: Yeah.
Swyx [01:23:21]: And you were like, “We’re raising $500 million.” And I’m like... And then you gave me the whole vision, and then you did it. And I don’t have that many opportunities to ask: how do you pitch a raise like that to those kinds of VCs? What are they looking for? Like, yes, vaguely AGI, but what do they want when...
Raising Huge Rounds and the AGI Investment Thesis
Eiso Kant [01:23:42]: Look, the world’s definitely changed, right? When we were raising that $500 million round, the majority of investor conversations were still about trying to explain that these models were not just stochastic parrots and that they were gonna keep going. I’ve seen the world go from “OpenAI is gonna win it all and there’s no one else who can build a company,” right? I mean, Anthropic struggled to raise their $500 million round. That’s well reported. They pulled it off, gladly. And so when we raised that, about a year and a half ago at this point, the world was very different than it is today. I think the number of people who believe AGI is real is probably a superlinear function, or definitely some form of an exponential function, itself.
Eiso Kant [01:24:31]: And I think this is important, because if you hold the belief that we had three years ago and a year and a half ago, and we looked for people who shared that belief, which is that this technology is gonna fundamentally underpin everything that’s economically valuable and scientifically interesting for the future, then the value function afterwards is easy to understand: if you get there, you are one of the players who can build this commodity. And over the years, building that commodity has become not just about building models, but also about building infrastructure and other things.
Eiso Kant [01:25:03]: And so today the number of people is bigger and the outcomes have been proven, right? The incredible financial success that Anthropic is having right now, and the growth that OpenAI, Google, and others have had, no longer make this a question of whether there is product-market fit, which a couple of years ago really was part of the question. Like, how big can these things be? If you had told people we’d be at these revenue numbers in our industry right now, they would have laughed you out of the room.
Eiso Kant [01:25:33]: Now I think it’s a function of who in the world believes that it’s gonna be an oligopoly of intelligence, and who believes that oligopoly can be broken by other companies. And I think that’s what divides investors more than anything else, for the ones who believe in AGI. And then you’ve got a whole layer that is self-selecting out of foundation model companies because they’re like, “Look, the money I’d have to put there, compared to what I can put in an application company, is very different.” I think there are incredible application companies, and many should be built. But I do think we are still in a world right now where this can still be the early innings of who is going to be part of the set of people who win. Intelligence is, in my view, gonna be the world’s most demanded commodity. It will commoditize more in margin and price, and the world wants choice and wants options. And so treating the world as, “Oh, there’s only gonna be two players,” I think is very shortsighted from investors.
Eiso Kant [01:26:41]: I think that group who thought that was a lot bigger at the beginning of the year than now.
Eiso Kant [01:26:46]: I think the last couple of months have woken up a lot of people, who are going, “Holy s**t,” the world can use a lot more intelligence, but also the world is far more complex. We should have multiple choices, more options, things that can be turned off and things that can’t be. The restrictions that people put on models now, I think, are another area of this, right?
Eiso Kant [01:27:08]: Like, the fact that we are entering into a world where model companies are saying, “You’re not allowed to use me for foundation model company development.” They should be allowed to do this. It’s capitalism. It’s their business. It’s their work product.
Eiso Kant [01:27:23]: But it is insane.
Eiso Kant [01:27:25]: It is wild that we are, like, okay with that.
Open Models, Democracy, and Regulation
Swyx [01:27:30]: Do you have more of a problem with Anthropic saying it or the White House saying it? You’re picking two different...
Eiso Kant [01:27:37]: Things
Swyx [01:27:37]: Limitations and restrictions there.
Eiso Kant [01:27:39]: Look, I’ll put it this way. As this technology gets more capable, for better and worse, we do wanna yield to democracy to figure this out more and more. I think any single company making unilateral decisions is dangerous. It’s a concentration of power in a small number of people with very limited checks and balances, and that has never worked out well in history, in any way, shape, or form. And this is not a criticism of the existing foundation model companies. This is just more commentary on how I’d like the world to be. I think in a world where the technology gets more capable, government needs to play an active role in determining where there are real risks of misuse, right? And I do think we need to separate safety into misuse and doomsday scenarios that, I think, no one knows are gonna happen or not. And very practically, I’m glad to see there’s a lot of conversation now starting to happen again at the government level, trying to figure this out. Now, what the final decisions are, maybe I’m happy about them, maybe I’m not, maybe I agree, maybe not. But ultimately, that’s democracy, right? At any given moment, I might not be perfectly happy with one or the other, but people chose to vote in someone to make those decisions. And so I think over the long run, over a 20-year time span, the world directionally goes correct and democracy does work. At least, what’s the famous quote? It’s the best of all the worst systems, or something like that.
Swyx [01:29:26]: It’s the worst form of organization except for all the others that we’ve tried.
Eiso Kant [01:29:30]: Exactly. That’s the one.
Swyx [01:29:31]: You can always count on me for a Churchill quote ’cause I’ve studied Churchill a lot.
Eiso Kant [01:29:35]: I love that. And so that’s what I hope for. Now, I do think we are at a critical moment in time, and so speaking up is important for anyone. I think researchers who are thinking about starting their own foundation model companies should start. People who wanna share their opinion and be vocal, whether that’s with their representatives or just out on X, should do so.
Eiso Kant [01:29:57]: But concretely, to your point, I think we are not at a level of capability right now where we should start restricting open models in any way, shape, or form. I think it will hurt innovation if we do so.
Swyx [01:30:14]: Is there a point at which you will change your opinion there?
Eiso Kant [01:30:17]: Yes. I mean, look, there has to be.
Swyx [01:30:19]: Yeah.
Eiso Kant [01:30:20]: Right? If you sit with a straight face and say, “This can be open forever in every way, shape, or form,” it is just as egregious, I think, as saying the opposite, that it all needs to be closed down right now. I think either extreme of the spectrum is where we go wrong.
Eiso Kant [01:30:41]: Right? In society, in any way, shape, or form. And so the answer is always more nuanced, and the answer is never black and white. And so as we encounter real-world scenarios where we have to say, “Hey, we have to be more careful,” we need to reevaluate. If that means training a model differently and opening it up, or having different versions, some of which are restricted, I think that’s totally okay, because I don’t think anyone should be irresponsible. What I do wanna call out is that people have been raising the fear of misuse of these models since GPT-2, right? And I still remember, like, “We cannot release GPT-2 because the whole world will get...”
Swyx [01:31:20]: I mean, that was Dario.
Eiso Kant [01:31:21]: And so, like, this is not a commentary on Dario, it’s a commentary just in general on the space. And so we have not been very good at this so far, and we need to get better at it. And I do think that the work that’s happening with safety institutes and better evals and things like that is probably the right direction.
Swyx [01:31:38]: Yeah. I mean, I wanna say something in defense of this. It’s better to err on the side of safety and then roll it back rather than the other way, because the other way is a one-way decision. I think that’s true.
Vibhu [01:31:53]: The caveat there is also the competition, right? You don’t get everyone globally erring on the side of safety, right? You’re talking...
Swyx [01:32:01]: Yeah, exactly.
Vibhu [01:32:02]: So Oh, yeah
Swyx [01:32:02]: You don’t get to do unilateral safety because someone else will just be more unsafe than you.
Vibhu [01:32:06]: Yeah, exactly.
Swyx [01:32:07]: Yeah.
Vibhu [01:32:07]: You can pause innovation here. It doesn’t mean it’s pausing elsewhere.
Swyx [01:32:11]: They’ll just take over the world. It’s so easy.
Eiso Kant [01:32:13]: There are complex parts.
Swyx [01:32:14]: Yeah.
Eiso Kant [01:32:15]: Right? And I think we are much better off talking about certain capabilities that we can commonly and internationally agree we want to limit or not have available than talking about it in black and white, models available, yes or no. The moment you start getting these big blanket statements, that’s when you start running the risk of... I always think back to when we banned advertising of cigarettes. Good thing. I’m not saying I’m against that. But it effectively established an oligopoly of cigarette companies, because no one else could ever compete, and it was probably the best moment for the tobacco industry that ever happened. And we don’t wanna do that right now. If we pull up walls behind innovation, and this is a self-serving comment because I’m not at the frontier yet, but it’s not just related to me, I think it’s related to everyone in the space, you are deciding right now in 2026, based on the current capabilities of models, that this is something that only two or three companies can build. And that to me reads like chapter 14 of the most dystopian sci-fi novel that I could read, because from there I think you can play out all the scenarios that happen in the world, and none of those are the ones that make me excited about the future. And I think that’s the thing we should all think about. Like, what’s the future we wanna be excited about? What do we wanna have? And I think that’s a future where intelligence is a commodity. Everyone can access it. It becomes cheaper and cheaper, right? I think that’s important. It can impact more of the world, and it’s not one where a single company puts its thumb on the scale of both what it outputs and whether it is turned on or off.
Nvidia, Hardware, and the Compute Stack
Swyx [01:33:56]: I think the one entity that has more power than the US government here is Nvidia.
Swyx [01:34:02]: Because, like, whoever gets the allocations gets the compute.
Vibhu [01:34:06]: You can take it down to TSMC or...
Swyx [01:34:09]: And TSMC below that. But I just wanna test provocative statements to see if you have any response.
Eiso Kant [01:34:18]: I need to think on that one.
Vibhu [01:34:20]: Which I think they are regulated, right? Like, you can see the government
Swyx [01:34:23]: Nvidia’s not regulated.
Vibhu [01:34:24]: Can they ship to China?
Swyx [01:34:26]: Okay, but they’re not China.
Eiso Kant [01:34:30]: Look, I think this industry has existed because of what Nvidia’s done.
Swyx [01:34:35]: Yeah.
Eiso Kant [01:34:35]: Right? It’s easy for people to give them flak, but I also wanna say, I remember when we started Source, right? In 2015, post that capacity article. This progress was able to happen because we were able to put consumer GPUs in servers, and they allowed us to do so, and then it kept going further. And so foundation models are so closely linked to their hardware and their systems.
Swyx [01:34:58]: Yeah.
Eiso Kant [01:34:59]: Why do we see this stepwise progress happening? We see it happening because of the next generation of networking and systems that come out, right? The difference between a model you could train on Hoppers versus GB300s is the difference between a trillion-parameter model and a five or six trillion-parameter model. And so these things really coexist very closely with each other, and I think the more interesting question for the future is going to be: what can we unlock in terms of model capabilities as we start designing these things even more? And we’re seeing that with the next generation of systems. And I think the world abhors...
Eiso Kant [01:35:42]: Like, capitalism does a really good job at trying to push towards things that allow for more competition, right? And Nvidia allows for competition. It’s not... But if a government says no one else can build foundation models, effectively, through regulation, that is very different. Now, is it hard to go build an Nvidia? Absolutely. Is it hard to build a foundation model? I think it’s very hard to build a foundation model. But we should make the playing field one where, if someone wakes up tomorrow and wants to do so, they are allowed to do so, and they’re allowed to use the tools to do so. And I think there’s still a big difference between what we’re seeing in the discussions around model companies versus what we’re seeing with chip companies.
Vibhu [01:36:25]: The gap also seems to be the expertise in who regulates it, right? Who at the government decides what’s too safe, too smart, too dangerous? But while we’re throwing spicy questions out there, do you have anything that comes to top of mind that could be changed? So, should OpenAI and Anthropic open-source models? Is it open weights? Is it what we do in RL that determines your safety barriers? Is there anything that should be done there, or just spitballing?
RL Bottlenecks, Mixed Hardware, and Low-Precision RL
Eiso Kant [01:36:53]: That’s a good question. Yes. One of the things that I’m excited about, that I think we’re talking about more and more but I don’t think anyone is doing yet, is mixing and matching hardware during RL training, right? You think about the notion, and we’re seeing this in inference, right? The prefill and decode...
Vibhu [01:37:15]: Yeah
Eiso Kant [01:37:16]: Just work better with a general-purpose GPU and a more specialized chip, right? Like the Groq chip at Nvidia, the LPU and the GPU combined, and there are different versions of that in the industry. And RL is batch-size constrained, right? You’re batch-size constrained because you don’t have infinite tasks. When you’ve got the entire web, you can be much more flexible in scaling up your batch size. But for RL, you have X millions of tasks that you are gonna be training on, and so you cannot blow up your batch size massively, which means that, beyond a certain extent, you can’t scale compute with RL the same way you could scale compute with pre-training. And so I’m very excited about anything that improves that. And I think one of the best ways to start improving that is for the things that we’re already starting to see in inference, the separation of prefill and decode onto different chips, to come to reinforcement learning, right? I think we’ll be there soon, and I think more people should be working on this, because then all of a sudden we’re able to be way more efficient in how we train RL from a wall-clock-time perspective. Again, it comes back to the fact that it’s a race, right? The race is measured not in how many GPUs, but in calendar time, and that’s probably one of the biggest impacts we can have right now to speed up our industry. And so that’s one that, technically, I love geeking out about and talking to people about. Yeah.
Swyx [01:38:45]: Yeah, I would talk to Etched. I had a tour of their data center, and physically you can see how PD disaggregation is mapped out in the data center, and you have to own your own hardware to do that.
Eiso Kant [01:38:57]: Yeah. No, look, I think more innovation in the space is just the coolest thing.
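The batch-size argument above can be made concrete with a toy model. This is a minimal sketch under loudly stated assumptions, not Poolside’s setup: policy updates are strictly sequential, each update consumes one batch of rollouts, and a batch can keep at most one GPU busy per task. The function name and all numbers are hypothetical.

```python
import math


def rl_wall_clock_hours(num_tasks, batch_size, gpu_hours_per_task, num_gpus):
    """Toy estimate of wall-clock time for one pass over a fixed RL task set.

    Assumptions (illustrative only): updates run one after another, and
    within an update at most `batch_size` GPUs can do useful work.
    """
    steps = math.ceil(num_tasks / batch_size)   # sequential policy updates
    useful_gpus = min(num_gpus, batch_size)     # extra GPUs beyond this sit idle
    hours_per_step = batch_size * gpu_hours_per_task / useful_gpus
    return steps * hours_per_step


if __name__ == "__main__":
    # Hypothetical workload: 1M tasks, batches of 1,000, 0.01 GPU-hours per task.
    for gpus in (100, 1_000, 10_000):
        print(gpus, rl_wall_clock_hours(1_000_000, 1_000, 0.01, gpus))
```

In this model, going from 100 to 1,000 GPUs shortens the run, but going from 1,000 to 10,000 changes nothing, because the batch cannot grow. That is why a faster rollout path (for example, disaggregated prefill and decode) attacks wall-clock time where adding GPUs does not.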
Swyx [01:39:02]: Yeah.
Eiso Kant [01:39:03]: And so I’m excited, because all of us are.
Eiso Kant [01:39:09]: Like, why don’t we finish post-training this model, whatever, two weeks before release? Or no, sorry: between training, then SFT, and then the time it takes to release, my biggest wall-clock bottleneck right now is RL time.
Eiso Kant [01:39:25]: Right? And it’s just because I can’t scale it up further, because I can’t add more GPUs to it, because of that batch-size constraint. There’s a really cool blog post that just came out that showed RL done in even lower precision than any of us are doing. So, what date is it today? We’re on July 15, so this came out five days ago, and I thought this was very cool. Lower-precision RL, while keeping it stable... We’re still doing this in FP8, and so I was excited to see them sharing this work and bringing it out. It’s definitely something that I’m excited to be doing once we move to Blackwell GPUs.
Swyx [01:40:05]: But yeah, cool. Part of open research, you take and you give.
Eiso Kant [01:40:08]: Exactly. Yeah.
Swyx [01:40:10]: I’ll just quickly mention, there was a paper that did an ablation on levels of quantization, and they roughly concluded that four bit was the sweet spot. But I don’t remember...
Eiso Kant [01:40:20]: This was just a couple of years ago, right? I think I remember this.
Swyx [01:40:22]: I think one year.
Eiso Kant [01:40:23]: One year, okay.
Swyx [01:40:24]: But like, I’m like, okay, maybe NVFP4 is it. You can’t really-- Like, the lowest you can go is ternary.
Eiso Kant [01:40:30]: Yeah.
Swyx [01:40:30]: That’s it. Like, there’s not that many.
Eiso Kant [01:40:32]: Well, I mean, there’s still quite a difference between NVFP4 and four bit, right, in terms of what’s possible. But I think NVFP4 is underrated in terms of what it is. I was quite excited when it came out. Just getting that extra, like, that trade-off between range...
Swyx [01:40:51]: Yeah
Eiso Kant [01:40:51]: Is very cool.
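To illustrate the range trade-off between a 4-bit float and a plain 4-bit integer, here is a minimal sketch. It assumes the FP4 element format is E2M1 (1 sign, 2 exponent, 1 mantissa bit, exponent bias 1, no infinities or NaNs), which is my understanding of what NVFP4 uses for elements. The per-block scale is kept in full precision here, whereas a real format would store it in a compact type, so this is a simplification, and the helper names are hypothetical.

```python
def e2m1_values():
    """Enumerate every value representable in a 4-bit E2M1 float."""
    vals = set()
    for sign in (1.0, -1.0):
        for exp in range(4):
            for man in range(2):
                if exp == 0:
                    mag = 0.5 * man                      # subnormals: 0 and 0.5
                else:
                    mag = (1 + 0.5 * man) * 2.0 ** (exp - 1)
                vals.add(sign * mag)
    return sorted(vals)


FP4_GRID = e2m1_values()                         # non-uniform, max magnitude 6
INT4_GRID = [float(i) for i in range(-7, 8)]     # uniform, symmetric int4


def quantize_block(block, grid):
    """Scale a block so its largest magnitude hits the grid maximum,
    then round every element to the nearest grid point."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return list(block)
    scale = amax / max(grid)                     # full-precision scale (simplification)
    return [min(grid, key=lambda g: abs(x / scale - g)) * scale for x in block]


if __name__ == "__main__":
    block = [6.0, 0.5, -0.4, 0.1]                # one outlier, several small values
    print("fp4 :", quantize_block(block, FP4_GRID))
    print("int4:", quantize_block(block, INT4_GRID))
```

The float grid spends its points near zero (steps of 0.5 up to 2) and gets coarser near the maximum, while the integer grid is evenly spaced. With one large outlier in a block, small values survive better on the float grid, at the cost of resolution near the top of the range.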
Swyx [01:40:52]: A couple quick closing questions.
Vibhu [01:40:54]: I have a quick one.
XS, S, Distillation, and Model Cadence
Swyx [01:40:55]: Yeah.
Vibhu [01:40:55]: Okay, quick question, back to the technical side. So any big takeaways from XS 2.1, Medium, to training the new Small, just in general in terms of training models? You mentioned a lot in the earlier discussion that, in training, there’s a lot you can squeeze out, right? You can learn a lot more from the web. At the same time, you took 30B and scaled it up to 120B, right? Is there any gating on how small is too small? So I’m just gonna ramble for a bit. I’ll come to a question at the end. But part of Karpathy’s thesis was the cognitive core, right? We’ve seen VibeThinker, Nanbeige, 3Bs, 4Bs that reason a lot, and then the idea is you offload to a different model for the work. These are small reasoning models. So have you found anything interesting in model sizes, like 20 or 30Bs on device, 100Bs on a single GPU? Can you squeeze out more there?
Eiso Kant [01:41:56]: There’s a lot more to squeeze out. Not to make too many forward promises, but I think we can squeeze a lot more out of the XS size as well, and I think we learned a lot during S training that will allow us to improve the XS size even further. And already since then we have learned things that could have made S even better. I think there is a lot more still for our space to squeeze out of much smaller models. I don’t think that’s an argument against scaling. And by the way, and I think this is a nice thing, it’s not very helpful to have a post-training recipe for a smaller model and try to apply it to a bigger model.
Vibhu [01:42:38]: Yeah.
Eiso Kant [01:42:38]: In all cases, you’re gonna have to rethink most of the recipe. But a post-training recipe for a bigger model applied to a smaller model is almost always just a really good improvement and baseline. You can still tweak it more, but I don’t think that’s necessarily obvious. And so once you make your bigger models better, you often have a quick lever to improve your smaller models again. But will we be able to squeeze a lot more out of smaller models? Laguna S gave me a lot of confidence that we can. And I think it comes back to that discussion we had earlier, that it’s about the behaviors, not necessarily the raw intelligence, that you’re trying to improve the models for.
Vibhu [01:43:23]: And that’s on all axes. There’s an axis of how long a model will reason, so how long it can stay agentic, and then there’s also efficiency, right? You ideally wanna push on both. And the thing to clarify that you guys aren’t doing right now, which we do see at frontier labs, is distillation, right? You have a big model that you don’t really ship to users, and what you put out for inference is typically distilled from that, which gets you quite a bit of gains, right?
Eiso Kant [01:43:50]: Look, I think it’s something we don’t do right now because of why we’re building these models, right? These models are, for us, part of our research path. Laguna Medium was much larger than the last two models that we’ve released, this one and the last one, and we’ve trained even bigger models in the past. So there is the engineering component of a bigger model, and with every order of magnitude in size you’ll learn new things in training about stability. But at smaller model sizes, you are able to iterate a lot quicker internally on your research. And so, for us, distilling down to a smaller model doesn’t serve the purpose. It’s not the right term, but to us they’re dual-purpose models. They are a way for us to see whether we improved the model factory, and something to put out into the world. And so that’s why we don’t do it. We’ve done distillation experiments, and there are really cool things you can do, and I think if you have lots of user data, then you can go even further in that. But I think there’s something to be said for having a quick cadence of models trained end-to-end from scratch, so that you as a research organization can learn the lessons and not wait. That was one of the big lessons we learned over the years, when we used to have a much
Eiso Kant [01:45:09]: longer cadence between model trainings, like six months, and we would train a big model, wait six months, train another bigger model. You would be compounding so many changes and improvements that by the time you’re training your next model, it’s a bit of a soup, and you don’t really know which ingredients led to the outcomes. So when you are training models far more frequently, and this holds true for both post-training and training from scratch, you are much more able to get an understanding of what led to the improvements. And I think that’s important. Ultimately, there is no true science yet of deep learning for large language models, but we are all, I think, trying to gain insights from our experiments, because it’s those insights that lead to scaling laws, that lead to the improvements that allow us to be, again, more compute efficient and get more capabilities.
Swyx [01:46:02]: Yeah. Amazing. I was gonna end off with a little bit more history. You spent some time looking at metrics for engineering team productivity. How do you think about engineering team productivity today?
Engineering Productivity in the Agent Era
Eiso Kant [01:46:14]: I mean, it’s wild, right? It’s like the golden age. It’s the fact that you can just take an idea and build something by waiting overnight for an agent to do the work.
Eiso Kant [01:46:26]: I don’t know. To
Swyx [01:46:27]: Like, how do you measure when...
Swyx [01:46:28]: ’Cause you literally, in theory...
Eiso Kant [01:46:30]: Yeah.
Swyx [01:46:30]: You’re doing this, right?
Eiso Kant [01:46:32]: Look, I think it’s a good question. It’s one I haven’t thought about in a long time.
Swyx [01:46:36]: But you’re pretty qualified to do it.
Eiso Kant [01:46:38]: No, it’s a fair point. Let me take a second to think about it. Look, ultimately, what code, software, and engineering are about is going from something that is valuable for an end user or set of end users, like an idea, a bug fix, a feature, to delivering that value. And I think what we’re doing with these models becoming more capable is that we are massively both cutting out middlemen and compressing the time that it takes to deliver that value. And ultimately, that iteration cycle is what allows any startup or any company to win, right? If you’re able to solve a bug in two hours versus it staying in the backlog for three weeks; if you’re able to be on a customer call and learn that, if this feature existed, they’d be willing to pay more and it’s more valuable to them, and you ship it in a week instead of in a month. And so I think ultimately, maybe the same things that we looked at years ago, pre-LLM, still apply, and it’s just the notion of cycle time. But in this case, it’s lead time, from the moment you have a valuable thing that you are looking to do for someone to the moment that it’s shipped to them. Every other metric is ultimately a leading indicator for that lagging indicator, right? It doesn’t matter if you’re looking at amounts of code, PRs, reviews, all of these things. And so I think in this case, we are starting to move so quickly in some of these things that we can just sit back and look at what was traditionally the lagging indicator. We just named it the lead time, traditionally from ticket to an end result. What I would look at in this new world, that maybe we didn’t think about before, is how much a single person can do with that,
Eiso Kant [01:48:22]: Right? If you look at AI-native companies, they’re not designed like the engineering orgs of the pre-LLM age. They’re often designed with just the builder, right? As close to the customer as possible, with the ability to ship. There isn’t necessarily a huge team that sits in between. And I think that is exciting: organizations where a single IC can get much closer to that. So I would look at the path from where the value is identified to the moment it’s shipped, and how many people are involved in that. You want the number of people involved to be smaller, and you want the time to be shorter.
Swyx [01:49:05]: Okay. Is there a way to eval that when you’re interviewing somebody?
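The two numbers described here, time from identified value to shipped value and how many people touched the work along the way, are simple to compute once each piece of work carries two timestamps and a set of contributors. A minimal sketch follows; the `Delivery` record and its field names are hypothetical and not taken from any particular tracker.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass(frozen=True)
class Delivery:
    """One unit of shipped value (hypothetical record shape)."""
    identified_at: datetime        # when the valuable thing was first identified
    shipped_at: datetime           # when it reached the user
    people_involved: frozenset     # everyone who touched it along the way


def lead_time_hours(d: Delivery) -> float:
    """Hours between identifying the value and shipping it."""
    return (d.shipped_at - d.identified_at).total_seconds() / 3600


def summarize(deliveries):
    """Median lead time and median headcount per delivery; lower is better for both."""
    return {
        "median_lead_time_hours": median(lead_time_hours(d) for d in deliveries),
        "median_people_involved": median(len(d.people_involved) for d in deliveries),
    }
```

Medians are used rather than means so that one long-running item does not dominate the summary; that choice is an assumption of the sketch, not something stated in the conversation.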
Eiso Kant [01:49:12]: Oof.
Swyx [01:49:13]: ‘Cause that is,
Eiso Kant [01:49:14]: Look,
Swyx [01:49:14]: The most compressed version.
Agency, Constraints, and High-Impact Teams
Eiso Kant [01:49:17]: I think the common answer to this is agency.
Swyx [01:49:20]: Yeah.
Eiso Kant [01:49:20]: How much agency does a person have? I think in the age of AI getting more capable, agency becomes probably one of the most important qualities for anyone. And I think agency is something you can look for in what people have done in the past, because agency is something that, if you have it, you are demonstrating it, right? No one just has agency and is sitting back and not exercising it. The whole definition of it is that it’s exercised. And so it’s about understanding what things people did in their lives, in their professional and personal projects, that showed agency. And your personal backstory shows a ridiculous amount of agency.
Swyx [01:49:56]: Oh, dear.
Eiso Kant [01:49:58]: I think that is ultimately it. The Silicon Valley quote of the last year and a half or so is, like, “You can just do things,” right?
Swyx [01:50:06]: Yeah.
Eiso Kant [01:50:07]: That’s, I think, what you’re looking for.
Swyx [01:50:08]: I think then aligning high agency people is very hard because they all wanna go their own way. That’s the whole point, right?
Eiso Kant [01:50:15]: Yeah, but I think the notion of a good leader in an organization, right, is to be able to bring people together around a common outcome. And I think what you wanna do with anyone who’s high agency... I feel very lucky I’ve got an organization with incredibly high-agency people. I mean, I’m not the one who built the model, right? I cannot stress this enough. It’s the team that achieved this, and it’s a team that is incredibly high agency. And so if you look at what it takes to bring that together, it’s ultimately a common goal and a common set of boundaries. Because if you just say, “You can do everything,” you become an exploration algorithm. And this is what we see in big tech, right? In research in big tech, everything is an exploration algorithm. Everyone can do anything, and then it becomes political, about gathering the resources. So when you say, “This is our common goal, and these are the boundaries that we’ve set,” right? “We’re not multimodal. We focus on RL.” We do these things, and you’re upfront with people before they join the company: you get a lot of agency. You can run where you want, but these are the places where we...
Swyx [01:51:24]: Yeah, lanes
Eiso Kant [01:51:24]: This doesn’t make-- This is the lanes
Swyx [01:51:25]: Yeah
Eiso Kant [01:51:25]: That makes sense. I think it gets the best out of people because, like, innovation comes from constraints.
Eiso Kant [01:51:34]: We did this with relatively little compute and relatively little money compared to some of, like, the others that are out there. And I’ve thought back on that quite a bit recently and thought it was a good thing, because those constraints forced us to become much better on certain other axes that others might not have, right? We purchased relatively little external data.
Swyx [01:52:01]: I was gonna ask about that. Yeah.
Eiso Kant [01:52:02]: Exactly, right. That was a constraint, but it’s a constraint that pushed us to move on other areas to improve. And, like, there’s lots of versions of that. So I think high agency people, you wanna empower, you wanna get them really excited about what they’re doing, but you also wanna say, “Hey, if you join this mission, this is the outcome I need you to achieve. But these are the places that we don’t go, and maybe if you care about those places, go somewhere else.”
Swyx [01:52:26]: Yeah. Great. Last call to action: who are you hiring?
Hiring, Impact, and Closing
Eiso Kant [01:52:31]: We are hiring on every possible role in applied research and engineering in the company. So from
Swyx [01:52:36]: Yeah
Eiso Kant [01:52:36]: Training all the way to evals to post-training architecture. Like, we are still in a world where individuals can have massive impact. And I think our pitch to join us-- We spoke a lot about the mission, how we think about things, but I think we are one of the places where it’s the highest ratio of individual to impact, right? Less than 70 people built this model. Less than 115 between engineering and researchers, like, together did this effort, and that’s a very broad definition ’cause I put myself in the 115 list.
Eiso Kant [01:53:08]: And so being able to do this work on a mission that you’re aligned with, and you can have that - every individual still has huge impact. And I think
Swyx [01:53:18]: And being able to publish, being able to open
Eiso Kant [01:53:20]: It’
Swyx [01:53:20]: Open source the model.
Eiso Kant [01:53:21]: Yeah, look, all of those things are part of that. But I think ultimately, when you can today pick between joining a very large foundation model company, but you are one of many.
Eiso Kant [01:53:35]: And not by any fault of them, but just by definition, the denominator has become really big. And our denominator is quite small, and so the level of impact you get to have is really high. And I think ultimately all of us, the most incredible high agency people I know, what are they optimizing for? They’re optimizing for impact. They’re optimizing for impact, and am I aligned with the mission? And if today you heard about the mission and are aligned and you’re optimizing for impact, I think we’re a really good place to join.
Swyx [01:54:05]: Okay.
Eiso Kant [01:54:05]: Awesome.
Swyx [01:54:05]: I think we end it there. That’s a fantastic statement. You did amazing on four hours of sleep.
Eiso Kant [01:54:11]: Thank you, guys.
Swyx [01:54:12]: So, podcast eval, definitely approved.
Eiso Kant [01:54:14]: Appreciate it. I literally wrote it down. My eyes are, like, starting to go like this. I’m like, “Phew.”
Swyx [01:54:17]: We’ll let you go. We’ll let you go back.
Eiso Kant [01:54:19]: It was good to see you guys.
Swyx [01:54:19]: Thank you for setting this up. We wanted to get this in because we think it’s a great model.
Eiso Kant [01:54:23]: Appreciate it.
Swyx [01:54:23]: I think a great story to tell. Thank you.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
🔬Causal Models Need Causal Data - Xaira’s X-Cell model for Drug Discovery (Bo Wang & Ci Chu, Chief Discovery Officer & Chief AI Scientist)
07/21/2026 | 1h 29 mins.
Bet on information
If test loss flatlines after 1.5B parameters while training loss continues to drop as you scale, that tells you that your model is limited by the amount of information in your data.

Training on a single, smallish data set exposed an information gap: the 3.1B model falls off the scaling trend. Neither parameters nor compute will improve performance past this wall. For predicting changes to gene expression, you need more information-rich data.
This is what Chu and Bo’s teams have done, and here is what ~30x the information buys you:

Now we can scale with parameters and training compute! We don’t know how much this effort cost, but we can guess that data collection experiments and infrastructure were a few tens of millions, and compute + headcount + research was a few million. The budget looks like an RL rollout budget, rather than a data-rich pre-training one.
We were lucky enough to have the two central figures in this story on our podcast. Taking the lead from Ci Chu and Bo Wang, Xaira Therapeutics is betting that information-rich data is the key to AI-driven drug development. Chu was recently promoted to Chief Discovery Officer and Bo to Chief AI Scientist, underscoring just how strategic Xaira considers this bet.
Reverse engineering the human cell
If you had to figure out how a human cell works, what would you do? A good place to start might be by documenting what genes are expressed (e.g. what RNA is floating around) in different kinds of cells, in different circumstances.
That is CELLxGENE, a database of 168M cells built by the Chan Zuckerberg Initiative that maps each cell to a count of how many times each of 20K-30K genes was detected in that cell, plus detailed metadata about every cell. A ~4 trillion-entry matrix.
If the Protein Data Bank (PDB) unlocked structural biology models (Boltz Episode, ESM/BioHub Episode), CELLxGENE has done the same thing for Virtual Cell models. Like PDB, CELLxGENE has inspired a zoo of AI models of RNA expression; so much so that RNA expression models have become synonymous with Virtual Cell models. Bo Wang built one of the most influential, scGPT, which became the starting point for Xaira’s new model.
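As a quick sanity check on the ~4 trillion figure, the matrix size is simply cells times genes. The sketch below uses only the numbers quoted above (168M cells, 20K-30K genes); the function and variable names are illustrative.

```python
# Back-of-the-envelope size of the cell-by-gene count matrix,
# using only the figures quoted in the text above.
CELLS = 168_000_000
GENES_LOW, GENES_HIGH = 20_000, 30_000


def matrix_entries(cells: int, genes: int) -> int:
    """Number of (cell, gene) entries if the matrix were stored densely."""
    return cells * genes


low = matrix_entries(CELLS, GENES_LOW)
high = matrix_entries(CELLS, GENES_HIGH)
mid = matrix_entries(CELLS, (GENES_LOW + GENES_HIGH) // 2)

print(f"{low / 1e12:.2f} to {high / 1e12:.2f} trillion entries "
      f"(midpoint {mid / 1e12:.2f})")
```

The quoted ~4 trillion sits near the midpoint of that range, which corresponds to roughly 25K genes per cell.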
RNA expression ≠ Virtual Cell
Models trained on CELLxGENE describe the relationship between cell types and cell states, but they are not good at predicting what will happen if we make changes to RNA expression. Changes in gene expression are highly correlated, and it is difficult (impossible) to figure out what causes what in most cases.
If you could “turn the dial down” on one gene at a time, however, then you would be able to observe what is upstream and downstream of a given gene. You could tell if A → B & C or B → A & C or B → A, C → B → … If you did this for all of the genes, then maybe you could train a model that could predict what would happen to a cell if you change a gene (e.g. with a drug or a gene edit). Or maybe you could figure out the least invasive way to change a particular gene’s expression.
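To make the "turn the dial down" idea concrete, here is a toy sketch in Python. The three-gene network, its weights, and every name in it are hypothetical and have nothing to do with X-Atlas or X-Cell; the point is only that knocking down one gene at a time exposes which genes sit downstream of it, something that correlations alone cannot tell you.

```python
# Toy illustration of why single-gene knockdowns reveal causal direction.
# The network and weights are hypothetical, not taken from any real data set.

# Ground truth used only to simulate the "cell": A drives both B and C.
BASAL = {"A": 1.0, "B": 0.2, "C": 0.1}        # gene-specific basal drive
EDGES = {("A", "B"): 0.8, ("A", "C"): 0.5}    # (upstream, downstream): weight
ORDER = ["A", "B", "C"]                       # topological order of the toy graph


def simulate(knockdown=None, factor=0.1):
    """Return expression levels, optionally scaling one gene down."""
    expr = {}
    for gene in ORDER:
        level = BASAL[gene] + sum(
            w * expr[src] for (src, dst), w in EDGES.items() if dst == gene
        )
        if gene == knockdown:
            level *= factor  # "turn the dial down" on this gene
        expr[gene] = level
    return expr


def downstream_of(gene, tol=1e-9):
    """Genes whose expression shifts when `gene` is knocked down."""
    control, perturbed = simulate(), simulate(knockdown=gene)
    return sorted(
        g for g in ORDER
        if g != gene and abs(control[g] - perturbed[g]) > tol
    )


for g in ORDER:
    print(g, "->", downstream_of(g))
```

In this toy, knocking down A moves B and C, while knocking down B or C moves nothing else, so the perturbations recover A → B & C. Observational data from the same system would only show that all three genes rise and fall together.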
X-Atlas → X-Cell
This is exactly what Chu and Bo’s teams have done. The data set is called X-Atlas and the model is called X-Cell.
In this episode, we discuss:
* Why the team abandoned autoregression for diffusion
* The CRISPR-based experiments that run millions of tests in parallel, and generate the raw data for X-Atlas and X-Cell
* Generalization to real lab experiments in real human cells
* Beating the linear baseline that has outperformed previous models
* Justifying a kitchen-sink of priors, and how that stacks up vs. data and architecture
Bo also shared with us some of the (major) advantages he has as an academic vs. industry leader, and how his labs keep up with the breakneck pace of AI innovation.
Check out the full episode on YouTube, or your favorite podcasting platform!

🔬 The Lab of the Future Should Feel Like a Data Center — Andy Beam & Rafa Gómez-Bombarelli, Lila Sciences
07/16/2026 | 1h 41 mins.
Imagine a dark warehouse. Racks and racks of devices with wires, tubes, and electronics sticking out. The next AI data center? No. This is Lila Sciences’ dream for the future of science. A dark warehouse full of AI-guided robotics and lab equipment, cranking out new experiments 24/7, building toward a scientific superintelligence.
Their automated lab is almost hypnotizing to watch. They have floating plates zipping around on Wall-E-esque tracks, have used vision-language models to control Windows 95 boxes, and have created the world’s largest collection of voided warranties. In the process they’ve built a massive library of scientific reasoning tokens. Over 10 trillion of them, all experimentally validated.
No warranties were voided in the making of this video
To say Lila is ambitious is an understatement. Their goal is a scientific superintelligence wired directly into the wet lab. They are all in on the bitter lesson, and the thesis follows from it: a lab is an infinite token generator. Produce data at scale, and the synergies give you a general reasoner that can tackle any scientific problem. They are committing hard. Biology, chemistry, drug discovery, and materials science, all at the same time. Time will tell if it works, but it is an exciting hypothesis.
In our latest episode we sat down with Lila’s very own Andy Beam (CTO) and Rafa Gómez-Bombarelli (CSO, physical sciences) and went on a journey through the possibilities of AI-run science, almost as wide-ranging as Lila’s goals.
Did we mention they do both materials science and biology? In the same AI science factory? Same time, same lab, same AI. Finally a guest who can settle a long-running debate we’ve had amongst ourselves: is biology or materials science harder?
Watch to find out!
We discuss:
* The internet is spent, science is next. Why Lila thinks the scientific method is the last untapped internet-scale dataset, and why they treat RL as a data generation mechanism with nature as the verifier.
* The lab as a data center. Instruments as nodes on a graph, a magnetically levitating “PCI bus” transport layer between them, orchestration as a slurm queue. Andy is not short on analogies.
* Why Lila insists it is not an automation company. They optimize for flexibility and generalizability over raw throughput, which means humans stay below the API line wherever automating does not pay.
* Your experiment has a runtime. We put Escalante Bio’s question to Andy: if science is the token generator, what is the runtime of your data collection? His answer, in short, is that you cannot make the ribosome go faster. Why Lila bets on fast round-over-round iteration rather than big noisy multiplexed screens, and how Rafa’s team rebuilt a gas sorption measurement to run roughly 2,500x faster.
* What is actually in 10 trillion scientific tokens. Not sequences. Experimentally verified reasoning traces, a kind of data that Andy argues exists on the internet in quantities that round to zero.
* Breadth as a path to depth. Small molecule chemistry priors transferring to metal organic frameworks for carbon capture, and the claim that the general model beats domain-specific models sample for sample.
* If you have the data, what do you need the model for? Sri Kosuri’s koan about the ML-for-drug-discovery business model, and Andy’s answer: the coding model got better because it also read Shakespeare and carnitas recipes.
* The serendipity they want to automate. Emily Whitehead survived the first pediatric CAR-T treatment only because the doctor treating her happened to know, from pediatric arthritis, which antibody would blunt her IL-6 response. Roll those dice again and you probably lose her. Breadth is how you stop depending on luck.
* Move 37 for catalysts. Model suggestions for platinum-group-free electrocatalysts that went from boring, to what a 40-paper expert called stupid, to the best performers they have made.
* Six months to in vivo CAR-T data in non-human primates, and the zero-FTE virtual startup commercial model that fell out of it. For context on why that number is startling, AbbVie paid $2.1B for Capstan on the strength of preclinical in vivo CAR-T data.
* You cannot have scientific superintelligence if you are just a good test taker. Ken Stanley, who wrote Why Greatness Cannot Be Planned, runs open-endedness at Lila. RL at scale gives you a ruthlessly Vulcan problem solver. Machine creativity is a different thing, and it is the part nobody has solved.
* The chain of thought is an unreliable narrator. The model reasons in latent space and only emits tokens. Sometimes it skips the experiment entirely and is still right. So how much do you trust the reasoning versus the verifier?
* Reward hacking when the rollout is physical. Chains of thought that collapse into repetition, and a model that got annoyed and swore at the scientist who kept asking it to redo a plate map. What happens when a pathological loop has a wet lab inside it?
* The bittersweet lesson. Rafa’s inversion of the bitter lesson: in AI, scaling is a roadmap. In materials, scaling is a filter, because only the things that scale end up mattering.
* Not your typical Flagship company. Why a famously single-asset biotech incubator spun out a platform bet, and Andy’s line that if Lila called itself a biopharma it would have a top-three GPU cluster.
* Bottlenecks they would remove by fiat. Sim-to-real for physics-based simulation, and the fact that RL training runs at roughly 5% mean FLOP utilization.

Why AI Infrastructure must evolve for Agent Experience — Akshat Bubna, Modal CTO
07/08/2026 | 57 mins.
We’ve been running a bit of an Agent Cloud series surveying all the top inference/compute/cloud providers, from Databricks to Daytona to Railway and, even further back, E2B, but we’re excited to conclude this series returning to Modal, which has just raised a monster $355M Series C.
The cloud was built for developers. But agents are now changing that.
The old infra stack was designed for a human who could read docs, reason through YAML, and understand dashboards to figure out what they needed when something broke. While this was painful for developers, it worked since they could fill in missing context in their heads.
However, agents don’t have that luxury. Now in this new era of agents, everything has to be tighter.
They need a place to write code, run it, inspect the output, change the environment, debug failures, and try again. Fast iteration and feedback loops with all the necessary context are crucial for agents to operate properly. Furthermore, sandboxes are a clear representation of this shift as agents can easily spin up isolated environments. This programmatic infra even extends to research:
Two years ago, we were one of the first to cover Modal with CEO Erik Bernhardsson, and Alessio designed our favorite LS thumbnail of all time:
At the time, Modal was just a teeny little company with a $17M Series A.
Today, fresh off their $355M Series C, Modal is one of the clearest examples of the agent cloud future being built in real time: a cloud platform moving past traditional web app assumptions toward the workloads AI actually creates such as elastic inference, sandboxes, GPU burst, post-training, background agents, and infrastructure that agents themselves can operate.
In this episode, Modal CTO Akshat Bubna joins swyx and Vibhu to unpack why AI applications don’t fit traditional cloud assumptions, why Kubernetes was never designed for bursty compute-heavy workloads, and why Modal is now shifting from developer experience to agent experience.
We go deep on Modal’s AI infra stack: serverless functions, decorator-based infrastructure, elastic inference for custom models, GPU snapshotting, DeFlash, speculative decoding, Auto Endpoints, sandboxes, persistent storage, networked containers, private IPv6, RDMA, multi-node training, and Modal’s capacity pool across 17 cloud providers. Akshat also explains why RL rollouts can require 100,000 sandboxes, why production agents need hard guardrails, why observability may matter more than reading code, and why AI has made infrastructure exciting again.
We discuss:
* Why Kubernetes wasn’t built for bursty AI workloads
* How Modal started as a better runtime before becoming an AI cloud
* Why Modal added GPUs before ChatGPT
* The shift from developer experience to agent experience
* Why observability matters when agents are writing the code
* Elastic inference for custom models across audio, video, robotics, and comp bio
* GPU snapshotting, cold starts, and why inference workloads are so bursty
* Why RL rollouts can require 100,000 sandboxes
* DeFlash, speculative decoding, and frontier-level inference performance
* Auto Endpoints and making optimized inference easier to deploy
* What Modal adds beyond vLLM, SGLang, and raw GPU rental
* Modal’s 17-cloud capacity pool and supercloud strategy
* Networked sandboxes, sidecars, private IPv6, and RDMA
* Serverless multi-node training for post-training and research workloads
* Auto-research, model-guided sweeps, and agents launching GPU experiments
* Compute strategy, capacity planning, and batch tiers
* Why production agents need specialized sandboxes and hard guardrails
* Modal’s take on managed agents, CI, Gitpod/Ona, Python, TypeScript, and Modal Bench
Akshat Bubna
* LinkedIn: https://www.linkedin.com/in/akshat-bubna-188885103
* X: https://x.com/akshat_b
Modal
* Website: https://modal.com
Timestamps
00:00:00 Introduction
00:00:39 Modal’s origin and why Kubernetes wasn’t enough
00:04:32 Developer Experience → Agent Experience
00:06:21 Modal’s AI cloud primitives
00:09:14 Sandboxes, agent loops, and proto-Cognition
00:12:12 Elastic inference, GPU snapshotting, and 100,000 sandboxes
00:15:24 DeFlash, speculative decoding, and Auto Endpoints
00:19:59 Production-grade inference beyond raw GPUs
00:22:00 Background agents, Ramp Inspect, and the agent lifecycle
00:24:08 Modal’s 17-cloud supercloud strategy
00:26:40 Networked sandboxes, private IPv6, and RDMA
00:32:48 Multi-node training, post-training, and auto research
00:37:36 Compute strategy, capacity planning, and batch tiers
00:40:55 Open models, real-time AI, and production agent infra
00:43:06 Hard guardrails, managed agents, and specialized sandboxes
00:46:06 Why AI made infrastructure exciting again
00:48:30 Model APIs, differentiated products, and agentic video
00:51:50 CI, coding-agent infra, SDKs, and Modal Bench
00:57:28 Closing Thoughts
Transcript
Introduction: Modal, Series C, and the Art Party
Swyx [00:00:00]: We’re here with Akshat, CTO of Modal, together with Vibhu. Congrats on your Series C.
Akshat [00:00:10]: Thank you.
Swyx [00:00:11]: Your party yesterday was amazing.
Akshat [00:00:15]: Yeah.
Swyx [00:00:15]: From all the photos and all the swag.
Akshat [00:00:17]: We had a bunch of art installations, which was fun, seeing, like, our products on pedestals next to, like, Rodin.
Swyx [00:00:25]: Very nice. Very nice. When you started, it was not the GPU inference company. Maybe it was in your mind. Take us back to the origin story.
Modal’s Origin: A New Runtime Beyond Kubernetes
Akshat [00:00:39]: I first met Erik, who’s the CEO, through an investor. Back then Erik was already thinking about building a new runtime, and he got there thinking through why workflow orchestration products are so hard to use. It’s because you have to run them on Kubernetes. Kubernetes is hard to manage. It’s not built for burstiness and custom images.
Swyx [00:01:03]: Yeah
Akshat [00:01:03]: It has a terrible developer experience.
Swyx [00:01:05]: And I’ll, I’ll interject
Akshat [00:01:06]: Yeah
Swyx [00:01:07]: For listeners who are new, we interviewed Erik two years ago, and there’s a bit more of the story there from Spotify and all those things.
Swyx [00:01:14]: And I came across Erik through Data Council because he did that talk on the serverless container stack that you guys did, which was like, that was my first like, “Okay, I need to take Modal very seriously” moment.
Akshat [00:01:26]: Yeah.
Swyx [00:01:26]: But it was still very unclear, like, do I need all this for just my data pipelines?
Akshat [00:01:33]: Yeah. Initially what we were thinking about was if we build a better runtime, it’s a very useful primitive in itself. There’s a lot of things that get solved by serverless functions, like you can do ETL stuff, you can do job queues, you can do all this, like, bursty processing, which it turns out every company had needs for. But then we also were thinking about this as, like, this is a primitive that we can build a whole collection of products on, which are very verticalized. So perhaps data engineering would’ve been the first one, but we were thinking about inference. Back then it was more classical inference, like computer vision stuff and running XGBoost and whatnot. But we added GPUs to the product a year before ChatGPT came out.
From Serverless Containers to GPU Workloads
Swyx [00:02:19]: Nice.
Akshat [00:02:19]: We just didn’t think it would be that big of a deal.
Swyx [00:02:22]: Yeah, just like add A100.
Vibhu [00:02:23]: Was there any, like, early key problem that really sparked off why you built it?
Akshat [00:02:28]: Yeah. Primarily it’s just that none of the tooling that was out there was built for, one, a really great developer experience. And also there’s a general trend: a lot of the workloads that we were seeing were very, I wish there was a better word for it, compute-heavy. Like, one, they need a lot more resources, so you need to burst up and down a lot, versus Kubernetes, which is designed for slow scaling and more for web server use cases. And also there’s just a lot more specialization in what kinds of environments these workloads run in. Like, sometimes they need accelerators, sometimes they need different kinds of images, and this is just a consistent thing that we saw across a lot of companies. That would be the next step.
Software-Defined Infrastructure and Decorator-Based DX
Swyx [00:03:13]: Yeah. Yeah. Be nice. I don’t know how much this factored into the early story, but I wrote a post when I was at Temporal about infrastructure, software-defined infrastructure or something like that.
Akshat [00:03:22]: Yeah, the self-provisioning
Swyx [00:03:23]: Self-provisioning.
Akshat [00:03:24]: Yeah.
Swyx [00:03:24]: Yeah. I can’t even remember my own post.
Swyx [00:03:26]: And then you put me on the landing page.
Akshat [00:03:28]: Yeah. We really like the term, and so we stole it.
Swyx [00:03:32]: Because you had the insight that everything can just be in decorators co-located with the code, right?
Akshat [00:03:37]: Yeah.
Swyx [00:03:37]: Was that a big part of the original
Akshat [00:03:39]: Yes
Swyx [00:03:39]: Story or it was just like a DX layer?
Akshat [00:03:41]: That was really important because we really didn’t want people to spend so much time writing YAML, and it seemed like you could really condense the surface area of what you’re doing, put it in code so you can operate on it just like you operate on other code, and, like, build stuff that’s more expressive and dynamic. And so yeah, that was always a very important part.
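For readers who have not seen the pattern, here is a toy Python sketch of what "infrastructure in decorators, co-located with the code" can look like. This is not Modal’s API: `requires`, `Resources`, `REGISTRY`, and every field name are hypothetical, and nothing is actually provisioned. It only shows how a decorator can attach resource requirements to a function where a scheduler (or an agent) could read them.

```python
# Toy sketch of "infrastructure in decorators": resource requirements live
# next to the function they apply to. NOT Modal's API; all names are made up.
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Resources:
    gpu: Optional[str] = None   # e.g. "A100"; None means CPU only
    memory_mb: int = 512
    max_containers: int = 1


REGISTRY: Dict[str, Resources] = {}


def requires(**kwargs) -> Callable:
    """Record a function's resource needs where a scheduler could read them."""
    spec = Resources(**kwargs)

    def decorator(fn: Callable) -> Callable:
        REGISTRY[fn.__name__] = spec

        @wraps(fn)
        def wrapper(*args, **kw):
            # A real platform would provision a container matching `spec` here.
            return fn(*args, **kw)

        wrapper.resources = spec
        return wrapper

    return decorator


@requires(gpu="A100", memory_mb=16_384, max_containers=10)
def embed(texts):
    return [len(t) for t in texts]  # stand-in for real GPU work


print(embed.resources)
print(embed(["hello", "agent"]))
```

Changing the hardware or scaling limits is then a one-line edit to the decorator arguments rather than a change to a separate configuration file.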
Swyx [00:04:04]: Then the pushback is this is a DSL.
Akshat [00:04:07]: Yeah.
Swyx [00:04:07]: It’s you’re closed source. I am locked into Modal.
Akshat [00:04:11]: Yeah. We never really got pushback for that because the nice thing about Modal is you can bring whatever code you have, and sure, the DSL is at the configuration layer for what hardware you’re using, how you’re scaling things up, but you still own the code.
Akshat [00:04:27]: And that’s been an important part of our story, even as we do inference now.
Swyx [00:04:32]: Yeah.
Vibhu [00:04:32]: How much of that do you think still stays the same today? Like, if you were to build something today, DevX is very important, but I feel like a lot of this has been changed with just hook it up to an agent, have Claude Code, have Codex implement a tool. There are very agent-native primitives that are different than if I’m doing this myself, right?
Developer Experience → Agent Experience
Akshat [00:04:54]: We’ve changed our SDK team to think about agent experience instead of developer experience, and we think that the same benefits that apply for DX also apply for AX, which is: why would you have an agent read through hundreds of Kubernetes files and, like, write YAML that’s not even typed when it can make a couple of changes in a decorator and it gets this self-provisioning runtime of being able to see its changes live in action? Yeah, it just seems from the customers we talk to, they find Modal is much faster for agents to use versus operating on a different substrate.
Swyx [00:05:34]: Yeah, because like you, again, you co-locate the infrastructure requirements to the code that runs it.
Akshat [00:05:38]: Yeah.
Swyx [00:05:38]: Well, the negative thesis now is that nobody’s looking at their code anymore, so there’s no point.
Akshat [00:05:44]: Yeah, people aren’t looking at code. One thing we still see is really important is observability.
Swyx [00:05:51]: Yeah.
Akshat [00:05:51]: Like, how good is your dashboard? And of course, we push a lot of it to the CLI so the agents can do their own investigation, but you still need humans to go interpret what’s going on and make judgment calls and whatnot. And that’s, I feel like, maybe more important now than looking at the code itself.
Swyx [00:06:11]: Yes, because, like, you can try to treat the code as a black box and then see the observable action that comes out of it, and then just prompt a change.
What Modal Is For: AI Cloud Primitives
Akshat [00:06:21]: Yeah.
Swyx [00:06:22]: So I think it takes a bit of restraint to not specialize, to say, “I want to ship a new primitive,” and then just be general purpose.
Swyx [00:06:31]: People ask you, “What are you for?” You’re like, “I don’t know. We can do this, we can do that.”
Vibhu [00:06:36]: Well, I’d be curious to see, like, okay, if we were to ask you, like, what is Modal for even at a high level? There’s a lot you guys do, sandboxes, GPUs, everything. How do you answer?
Akshat [00:06:46]: Modal is a cloud platform where we’ve built the primitives from scratch for AI applications. And right now it covers inference, training, batch processing, and sandbox workloads.
Akshat [00:07:00]: But we’re building a lot more
Swyx [00:07:02]: I noticed you didn’t say web server, so there is still a role for, like, the always-on large-scale Kubernetes type things.
Akshat [00:07:09]: Yeah, absolutely. We’re not trying to compete with the Renders of the world, because yeah, we think the differentiator for us is the workloads that need specialized compute, need to scale up and down a lot. Yeah, they’re just shaped differently.
Working Alongside Frontier Startups
Vibhu [00:07:26]: I think you’re building a lot of it alongside the startups, right? They’re innovating quite a bit, even in your, like, latest blog post. Like, even in the Series C, the customers that you mention here, the Cognitions, technical ones, Ramps and whatnot, they’re innovating with you, right? And that’s not something AWS is doing directly with.
Akshat [00:07:45]: Yeah, absolutely. I think this is again classic. We’re a small team. We can move really fast. Our engineers are working with our customers and figuring it out. Yeah.
Swyx [00:07:54]: So my first week at Cognition, I walked in, there was someone wearing a Modal shirt. I was like, “What are you doing here?” They’re like, “Yeah, I just. I am embedded inside of Cog.”
Akshat [00:08:05]: Yeah, I think that was Peyton. We sent him over
Swyx [00:08:07]: Yeah.
Akshat [00:08:07]: Because the latency of communication was too high otherwise.
Swyx [00:08:12]: Yeah, distributed node, you have to - you have to place one and co-locate.
Vibhu [00:08:16]: Yeah.
Swyx [00:08:16]: So I had direct personal experience, right? So I worked on smol developer three years ago. It was inspired by Claude 1. I think you onboarded me at some point, like, just before, and I was like, “Oh, like, I need some bursty compute. Like, I was just gonna try using Modal.” And it was a pretty pleasant experience. Apparently, I showed up in the board meeting, like the analytics.
smol developer, Sandboxes, and Proto-Cognition
Akshat [00:08:39]: Yeah, you blew up on Hacker News and,
Swyx [00:08:41]: Yeah
Akshat [00:08:41]: We got a big traffic spike. I think the way smol developer used Modal was Modal functions for running stuff, which was, like, a good use case. But then, yeah.
Swyx [00:08:53]: Yeah. That - So to me, that was proto-cognition.
Akshat [00:08:55]: Right.
Swyx [00:08:56]: If only I had, like, stuck to it.
Swyx [00:08:58]: Like, that was like, if - did you say draw the tech tree
Akshat [00:09:00]: Absolutely
Swyx [00:09:00]: You’re just like, “Yeah, like, probably this will happen.”
Akshat [00:09:02]: Yeah. Like, he was so close. You were just rebuilding upon us
Swyx [00:09:04]: I just didn’t realize.
Akshat [00:09:05]: But the funny story there is at the same time, we were talking to a bunch of customers who needed something like sandboxing.
Swyx [00:09:14]: Yeah.
Akshat [00:09:14]: This is like twenty-three.
Swyx [00:09:15]: Yeah.
Akshat [00:09:16]: So we built
Swyx [00:09:17]: You introduced a new API right after that.
Akshat [00:09:18]: Yeah.
Swyx [00:09:19]: Yes.
Akshat [00:09:19]: Like, we built sandboxes in May of twenty-three, before anyone even knew this was gonna be a thing. And the first example we published was, we took smol developer
Swyx [00:09:28]: Smol developer
Akshat [00:09:28]: And put it in a loop, so the agent can iterate on itself.
Swyx [00:09:33]: Loops are hot these days.
Vibhu [00:09:34]: It’s the looper.
Akshat [00:09:34]: Yeah.
Vibhu [00:09:35]: Loops in. When was this, twenty-three?
Akshat [00:09:38]: Yeah.
Vibhu [00:09:39]: A small check.
Akshat [00:09:39]: Yeah.
Swyx [00:09:39]: It’s like twenty-three. So, for listeners, like, the problem was the models are not built for any of this, right?
Swyx [00:09:46]: Like, you’re just trying to like-- They’re not post-trained to understand, like, looping and, like, self-correction, and tool calling was there, but, like, also not that great.
Akshat [00:09:55]: Yeah.
Akshat [00:09:55]: I don’t remember if you used tool calling in this one, but yeah, the models would just diverge after like ten iterations and not produce anything meaningful.
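The "put it in a loop" idea can be sketched in a few lines of Python. Everything here is a stand-in: `fake_model` is a hypothetical replacement for an LLM call, and `exec()` is not a sandbox (a real setup would run each attempt in an isolated environment). The hard cap on iterations reflects the point above that early models tended to diverge after around ten rounds.

```python
# Minimal generate-run-retry loop with a hard iteration cap.
# `fake_model` stands in for an LLM call, and exec() is NOT a sandbox;
# a real setup would run each attempt in an isolated environment.
from typing import Callable, Optional, Tuple


def fake_model(task: str, last_error: Optional[str]) -> str:
    """Hypothetical model: the first attempt has a bug, the retry fixes it."""
    if last_error is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def run_attempt(source: str) -> Optional[str]:
    """Execute the candidate plus a check; return an error message or None."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        result = namespace["add"](2, 3)
        if result != 5:
            return f"add(2, 3) returned {result}, expected 5"
    except Exception as exc:  # surface any failure back to the model
        return f"{type(exc).__name__}: {exc}"
    return None


def agent_loop(task: str, model: Callable, max_iters: int = 10) -> Tuple[bool, int]:
    """Iterate until the check passes or the iteration cap is hit."""
    error: Optional[str] = None
    for attempt in range(1, max_iters + 1):
        error = run_attempt(model(task, error))
        if error is None:
            return True, attempt
    return False, max_iters


print(agent_loop("write add(a, b)", fake_model))
```

The loop feeds each failure message back into the next generation call, which is the self-correction behavior that models of that era were not yet trained for.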
Swyx [00:10:03]: Yeah. But like, then. So okay, like now talking to myself three years ago, the answer
Vibhu [00:10:08]: Of course they will get better
Swyx [00:10:09]: Collect all the failures, build benchmark, and then collect all the, examples, build the RL environment
Akshat [00:10:15]: Right
Swyx [00:10:15]: Sell it for like ten billion dollars to Meta.
Swyx [00:10:17]: And then also train a model and then sell that for sixty billion dollars to Elon. And this is
Akshat [00:10:23]: Yeah, of course
Swyx [00:10:23]: The funny machine. Like, it’s like, it’s about the hardware.
Akshat [00:10:28]: It’s hard to have that inherent conviction that the stuff will get that much better.
Swyx [00:10:33]: In retrospect, it’s so f*****g obvious.
Akshat [00:10:36]: Fair enough.
Swyx [00:10:37]: Like, what else were we doing back then? I don’t know. Anyway. Yeah. So that was the start of your sandboxing journey, right? I feel like it didn’t blow up until, like, last year.
Akshat [00:10:49]: Yeah.
Swyx [00:10:50]: So there was like a couple years of quietness.
Akshat [00:10:52]: Exactly, yeah. We were
Vibhu [00:10:53]: I think very underrated product value. Like, my experience with Modal, Charles, before he had joined Modal, met this guy at a hackathon, and he really insisted we wanted to run some small model, not hosted anywhere, and he’s like, “ there’s this cool company, Modal. They’ll like spin up a GPU sandbox, we can throw it on there. They’ll take a Hugging Face link.” And like there’s so much value just right there, right? Like instant hosting, spin it up, spin it down. It’ll stay cold, but we run the demo a few days later, it’ll come back up and like all this stuff in retrospect, like it’s still what we needed like today.
Akshat [00:11:27]: Yeah, it’s still needed today. Workload shapes have changed a lot as we run stuff for people at really massive production scale, and there it’s not about scaling from zero to one, but how do we scale really elastically, from like a thousand to fifteen hundred GPUs very quickly in a given region. It’s the same shape of problem.
Elastic Inference, GPU Autoscaling, and Custom Models
Vibhu [00:11:50]: Okay. So you look at, say, Cursor Composer, right?
Akshat [00:11:53]: Yeah.
Vibhu [00:11:53]: They had a, “We’ll do RL on a model every couple hours.” You guys have a whole version of RL inference gym and whatnot.
Vibhu [00:12:01]: When you look at workloads like that, you’re doing training runs where you need to scale up, scale down thousands of GPUs every hour, right? That’s the example of where we do need it, right?
Akshat [00:12:12]: Yeah. Well, I’ll take a step back and maybe talk about how people use Modal today, because our biggest use case is elastic inference. And the thing we first found product-market fit with was inference for custom models. So we stayed away from the LLM space, and we were serving companies like Suno for audio, Runway for video, robotics, comp bio companies that train their own model elsewhere. But Modal is the best black box for deployment, scaling to however many GPUs you need as your traffic pattern changes. And we saw all of them have a very unpredict-, predictable traffic pattern. It’s like diurnal. Some days the company will do a launch and they’ll need way more. And it’s not just one model that they deploy. All these companies deploy lots of different models in different regions, and so the autoscaling problem becomes even harder because then you have to scale within a certain region, and those cycles are offset. So you scale up at different times in different regions.
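To make the point about offset regional cycles concrete, here is a minimal sketch of per-region replica targets under a toy diurnal traffic curve. Everything in it is an assumption for illustration: the region names, UTC offsets, the cosine traffic shape, and the per-replica throughput are hypothetical and are not Modal’s actual autoscaler or API.

```python
import math

# Hypothetical regions and UTC offsets (illustrative only, not Modal's regions).
REGIONS = {"us-east": -5, "eu-west": 1, "ap-southeast": 8}


def diurnal_rps(hour_utc: int, peak_rps: float, utc_offset: int) -> float:
    """Toy traffic curve: peaks at 14:00 local time, bottoms out at 02:00 local."""
    local_hour = (hour_utc + utc_offset) % 24
    return peak_rps * 0.5 * (1 + math.cos(2 * math.pi * (local_hour - 14) / 24))


def replicas_needed(rps: float, rps_per_replica: float) -> int:
    """Smallest replica count that covers the load; zero load scales to zero."""
    return math.ceil(rps / rps_per_replica)


def plan(hour_utc: int, peak_rps: float = 1000.0, rps_per_replica: float = 50.0) -> dict:
    """Per-region replica targets for one hour of the day (assumed numbers)."""
    return {
        region: replicas_needed(diurnal_rps(hour_utc, peak_rps, offset), rps_per_replica)
        for region, offset in REGIONS.items()
    }
```

Calling `plan(hour)` for a few different UTC hours shows each region reaching its peak replica count at a different time, which is why the scaling decision has to be made per region rather than once for the whole fleet.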
Akshat [00:13:20]: So that’s like our sort
Vibhu [00:13:22]: And that
Akshat [00:13:22]: Yeah
Vibhu [00:13:22]: That in and of itself is a huge category. There’s a bunch of inference providers which provide this. Fireworks does this as a service, Together, whatnot, Baseten. That’s carved into its own niche for language models, at least right now.
Akshat [00:13:36]: Yeah. The thing that we have specialized in is the autoscaling aspect.
Vibhu [00:13:41]: Yeah.
Akshat [00:13:41]: Because we found that it’s not universally true that everyone else can autoscale, and we’ve gone deeper into it on the tech side: we’ve incorporated GPU snapshotting into the product, so we can take the GPU state, like your torch.compile model, snapshot it, and the next cold start is way faster. And so going back to your question, that’s why you need a lot of burstiness for inference. But then people also do a lot of demand training, like for RL stuff, your rollouts are bursty, as you said. People also do a lot of batch jobs. So we’ll see a lot of companies, before they have a training run, need thousands of GPUs to run encoding or something like that. And I think those things are much more bursty. I agree that agents are not that bursty. Sandboxes are, except when you’re doing RL. RL is just
RL, Batch Jobs, and 100,000 Sandboxes
Vibhu [00:14:28]: Or commerce
Akshat [00:14:28]: Insanely bursty.
Vibhu [00:14:29]: Yeah.
Akshat [00:14:30]: Yeah. Like when you’re doing rollouts, you sometimes need a hundred thousand sandboxes.
Vibhu [00:14:37]: Yeah. I’m curious if you’ve seen early sparks of continual learning. There are some people, like our friends, ngram, recently announced this
Akshat [00:14:45]: Yeah
Vibhu [00:14:45]: They’re, they’re trying to do training. That also seems like a different workload, right? If you’re doing training twenty-four/seven per se, there’s a very weird dynamic of how you’re using GPUs between people and whatnot, but seems like something you guys would work for.
Akshat [00:15:00]: As you said, we’re fortunate to work with a number of customers at the frontier and grab some of our customers. And they are taking the primitives we have and trying to use them in very interesting ways, like continual learning. It’s possible as the stuff gets better, some of that will be part of our offering as well if more people need it. But we’re just waiting to see
Vibhu [00:15:23]: Yeah
Akshat [00:15:23]: How it shakes out.
Vibhu [00:15:24]: Is there a primitive that you added after sandboxing that was the next step in the story?
LLM Inference, DeFlash, and Speculative Decoding
Akshat [00:15:32]: I guess we’ve been going much deeper into LLM inference
Vibhu [00:15:35]: Yeah
Akshat [00:15:35]: Because we realized that some of the advantages we have with autoscaling, again, especially in different regions and whatnot, are not present elsewhere. And the place where we had a gap was we weren’t working on the model layer itself. Like, we were a black box. And we realized that we can get to frontier-level model performance by having great people who work on this. And we’ve been open sourcing a lot of our work. Recently, we shared our work on DeFlash, which is a block-based speculator, and we’ve open sourced all of it. So by using open source DeFlash, you can get the same performance as you would with one of the proprietary providers. And the next thing we’re thinking about here
Vibhu [00:16:23]: I thought this was
Akshat [00:16:24]: Yeah
Vibhu [00:16:24]: An interesting blog post as well, right? Like, I think in here you make a claim, not a claim, just how effective speculative decoding can really get.
Akshat [00:16:33]: Yeah.
Vibhu [00:16:33]: Anything you wanna point out from this around, what people should know?
Akshat [00:16:39]: Yeah, absolutely. The high-level summary is, it would help to describe what speculative decoding is.
Vibhu [00:16:44]: Yes.
Akshat [00:16:44]: I will, yes.
Vibhu [00:16:45]: I think, like
Akshat [00:16:46]: Yeah
Vibhu [00:16:46]: So we’ve covered like Eagle and all this
Akshat [00:16:47]: Yeah
Vibhu [00:16:47]: Like Hydra and all those things, but it was like two years ago.
Akshat [00:16:51]: Yeah.
Vibhu [00:16:51]: I think it doesn’t hurt, right?
Akshat [00:16:52]: Yeah. Speculative decoding is you have a smaller model, called a draft model, predict tokens ahead of the bigger model, and then you have the bigger model verify all the tokens that were predicted. And the reason it’s faster is if you’re predicting one token at a time, you’re bound by memory bandwidth. But if you can batch the verification of the draft model’s tokens, then you’re using compute much more efficiently, and it’s faster. And as long as your draft model is producing a lot of tokens that get accepted, which is called the accept length, you can get a speedup that’s multiple times the original model speed. And that’s what we highlight here. People talk a lot about how we made these kernels faster and whatnot, but improving a kernel will only give you a few percentage points of improvement, and increasing accept length literally is a multiplicative decrease
Vibhu [00:17:47]: Like two to four X.
Akshat [00:17:48]: Yeah, exactly.
Vibhu [00:17:48]: Without much hit on performance.
Akshat [00:17:50]: Yeah. I think it may, you are running a second model, right? So it may be somewhat more expensive in compute,
Vibhu [00:17:57]: I meant quality performance
Akshat [00:17:58]: Probably not by much
Vibhu [00:17:58]: But yeah. I think
Akshat [00:17:59]: So there’s no drop in quality performance
Vibhu [00:18:01]: Yeah
Akshat [00:18:01]: Because you’re always. You’re never accepting a token that the big model
Vibhu [00:18:04]: It’s strictly better
Akshat [00:18:05]: Yeah
Vibhu [00:18:05]: Or it’s same.
Akshat [00:18:06]: Exactly.
Vibhu [00:18:07]: Right. Yeah.
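The draft-then-verify loop described above can be sketched in a few lines. This is a toy, greedy-decoding version under stated assumptions: `target_model` and `draft_model` are hypothetical deterministic functions standing in for real networks, and the verification step is simulated with per-position calls, whereas a real engine would score all drafted positions in one batched forward pass. It is not the DeFlash implementation, only an illustration of why the output matches the big model exactly while needing fewer verification passes.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token prefix to the next (greedy) token


def target_model(prefix: List[int]) -> int:
    """Hypothetical 'big' model: an arbitrary deterministic toy rule."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except at some positions."""
    guess = target_model(prefix)
    return guess if len(prefix) % 4 != 3 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verify passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(out) < goal:
        # 1) The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The target checks every proposed position (counted as one pass here).
        verify_passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])
            else:
                accepted.append(expected)  # first mismatch: take the target's token
                break
        else:
            accepted.append(target(out + proposal))  # all accepted: bonus token
        out.extend(accepted)
    return out[:goal], verify_passes
```

Because every token that lands in the output is the one the target model would have chosen for that prefix, the result is identical to plain greedy decoding with the target; only the number of target passes changes, and it shrinks as the accept length grows.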
Akshat [00:18:08]: And so we’ve been working a bunch on DeFlash, which is a block-based speculator. So instead of predicting one token at a time, it’s predicting a block. And we’ve been open sourcing our work with it. The next thing for us here is helping people train speculators and custom models. It’s something that traditionally is very forward-deployed-engineer driven, like you work with customers and help them do that. And our vision, and this is why we launched Auto Endpoints, is we want to make frontier-level performance available to everyone. And so, we mentioned this in the announcement, we teased it. The next thing we’re launching is, as you run an auto endpoint, we shadow traffic
Auto Endpoints and Frontier-Level Performance
Vibhu [00:18:54]: Do you want to explain what auto endpoints are?
Akshat [00:18:57]: Yeah.
Vibhu [00:18:57]: I’d love that, yeah.
Akshat [00:18:58]: Yeah. So this is, I guess, going back to your point: Modal is you touch the code, but sometimes people don’t wanna touch the code, and they wanna get started with an endpoint that works and has all the great performance and scalability that Modal has. So we’ve made that easier with a way to create an endpoint from our UI, from the CLI, that has all of our optimizations that we talked about, like the DeFlash stuff, already baked in, and there’s full transparency. So we give you the code, you can go run it yourself, and if you want, you can eject out into the full Modal experience, which we see as people get sophisticated, they do wanna tweak the models, they wanna fine-tune stuff. You can still do all of that. It’s not a black box. And yeah, the next thing, as we teased later in the post, is how do we give you value even beyond this in terms of having your draft models evolve as your data distribution evolves, again, without having to talk to a person.
Vibhu [00:19:59]: I guess just to understand it directly, you have the GPUs, you have an endpoint that’s compatible, you serve an open model. If someone was to do this themselves, what’s the delta that you guys provide? So you do a lot of great open source work on effective inference. How does it compare to, say, I take the same model, 5.2 FP8, take an off-the-shelf inference engine, vLLM, SGLang, get compute of similar capacity, similar cost. What’s the delta that plugging into something like this offers, outside of the benefit of scaling?
Production Inference Beyond Raw GPUs
Akshat [00:20:34]: It’s interesting because we’ve taken the approach of open sourcing our contributions and upstreaming them. We work closely with the SGLang team. We want the improvements that our team comes up with to be there in open source for others to use, even outside of Modal. The benefit to us is we have a team that has significant expertise, in terms of if you do have something that is not there, our team can help you get that performance first. The other thing is with these endpoints, we are way more elastic, as you said, than anyone else, and you have true scaling to zero. You have true burstiness, and in practice, that matters a lot more to people than just finding the GPU and running Modal code on something.
Vibhu [00:21:20]: Yeah. And I will say it’s not that straightforward. Like, what I said is easier said than done, right?
Akshat [00:21:26]: Yeah.
Vibhu [00:21:27]: I think for the average person, it’s still hard to just gut check. There’s quite a bit of combinations you can make there. The trade-offs aren’t really known at face value.
Akshat [00:21:40]: Yeah. It’s not just that. I think it’s that running production-grade inference is a hard infra problem.
Vibhu [00:21:49]: Yeah
Akshat [00:21:49]: Even if you subtract out the autoscaling
Vibhu [00:21:50]: Yeah
Akshat [00:21:51]: Is controlling things like tail latency and making sure every request is delivered at least once and whatnot.
The Model and Agent Lifecycle
Vibhu [00:22:00]: There’s a lot of innovation that you can do here. I think, it’s very interesting that you’re starting to encroach on, like as you become a full cloud, you’re starting to encroach on other people’s turf.
Vibhu [00:22:09]: What will you not do?
Akshat [00:22:13]: Well, we wanna follow our users and make sure they get a platform that has everything that works well together. So right now we’re focused on the model lifecycle and the agent lifecycle. So both going from data prep to training to inference, and then also if I want to deploy a background agent, let’s say, sandbox, do persistent storage, a whole bunch of other stuff.
Vibhu [00:22:38]: We talked to Cole, who did, OpenInspect. Yeah.
Akshat [00:22:42]: Yeah.
Vibhu [00:22:42]: And Ramp Inspect also is on Modal.
Akshat [00:22:44]: Yeah. So Ramp Inspect was a great example of a background agent that was really successful because they, were able to use some of the primitives like snapshotting and fast scaling to just have something that feels really reactive and works well.
Ramp Inspect and Background Agents
Vibhu [00:23:02]: Yeah. That’s the new CTO of, Ramp right there.
Akshat [00:23:05]: Yeah, Rahul.
Vibhu [00:23:08]: It was really fun. yeah, okay, I think, all very bullish. Like, one of my reflections was also I did not originally. So when I met you guys
The Inference Inflection: CPU, GPU, and Co-Location
Vibhu [00:23:19]: You weren’t that much in the GPU game, and now you’re all about inference. And one of the points that I hinged on for Jensen’s keynote at GTC this year was what we’re calling the inference inflection, right? Let’s say in AI workloads or machine learning workloads, it used to be, let’s call it, eight to one GPU to CPU, and now it’s more like one to one, which is interesting. Because of how much agents are blocked on or call out to CPU-heavy stuff, the actual limiting factor swings back and forth between GPU and CPU a lot more, whereas it used to be all GPU and then occasional CPU.
Akshat [00:24:01]: Yeah.
Vibhu [00:24:02]: GPU, CPU. And now it’s like just constantly, and you just have to co-locate everything.
Seventeen Clouds and the Supercloud Strategy
Akshat [00:24:08]: Yeah. And that’s one of the things that, again, we see as something appealing about Modal, which is we’ve built this capacity pool that spans 17 cloud providers, so we’re very good at running on various kinds of cloud capacity across the world
Swyx [00:24:24]: You don’t have your own data centers?
Akshat [00:24:25]: We don’t have our own data centers. We just run across a lot of neo clouds
Swyx [00:24:29]: Yeah. Are
Akshat [00:24:30]: Metal providers.
Swyx [00:24:30]: Yeah. Question mark.
Swyx [00:24:31]: Yeah. You’re, you’re running the math, and you’re like, “What’s the cutover point where you’re like.”
Akshat [00:24:36]: Yeah, it’s a good question. Part of it is we see our differentiator in the software layer, and being capital light and focusing on the software helps us move really fast. So far it’s worked out well because there are so many other people building data centers that we’re able to work effectively with them and, again, focus on what makes us special.
Swyx [00:24:55]: Yeah.
Swyx [00:24:56]: 17 gets you into, like, the local providers sometimes. Like
Akshat [00:25:00]: The,
Swyx [00:25:01]: Which was the most interesting one?
Akshat [00:25:02]: There are a lot more neo clouds than you expect, and they all have various levels of reliability. And that’s why it’s something we’ve invested a lot of time in, building our own reliability layer on top. So if the GPU falls off the bus or something happens, user workloads are not affected, and that lets us use a lot more capacity than
Swyx [00:25:30]: Yeah
Akshat [00:25:30]: You as a user would be able to.
Swyx [00:25:32]: It’s a useful thing to have because like now everyone knows, like, what layer you are and, like, you optimize for being the super cloud of all clouds.
Akshat [00:25:41]: Yeah. That’s the idea. And so I guess when you mentioned colocation, that’s another interesting thing where, one thing we’ve seen is people come to us when they want very specifically located CPUs or GPUs, like they want
Swyx [00:25:57]: Oh, they pin it in like
Akshat [00:25:58]: Yeah
Swyx [00:25:58]: EU?
Akshat [00:25:59]: Exactly. Or EU, US.
Swyx [00:26:01]: Right. Data residency
Akshat [00:26:02]: Australia
Swyx [00:26:02]: Locality thing or performance or what?
Akshat [00:26:04]: It’s either data locality or latency, yeah.
Swyx [00:26:07]: Yeah.
Akshat [00:26:07]: Like, you want your. They’re running sandboxes and model. They want them to be right next to a
Swyx [00:26:10]: Yeah, it’s easy then
Akshat [00:26:11]: Yeah
Swyx [00:26:12]: To. That is important in all those things. and so, like, you’ve accidentally, I don’t know if it’s accident, but, like, you’ve built the perfect primitive for agents to express themselves. And then, like, it’s almost very funny how every extra development just involves more file system, just involves more CPU.
Akshat [00:26:30]: Yeah.
Swyx [00:26:31]: Just like the things that you already have. I don’t know much about, if there’s any, like, networking usages that are interesting, but you’ve also done some good work on networking.
Networking, Sidecars, Private IPv6, and Sandboxes
Akshat [00:26:40]: Yeah, that’s exactly right. Like, we’re just taking compute storage and networking and building stuff on that layer, for, again, the stuff people need.
Swyx [00:26:49]: Yeah
Akshat [00:26:50]: We see a few interesting networking things coming up. one is people want networked sandboxes. so we have
Swyx [00:26:57]: For like a Docker cluster type thing.
Akshat [00:26:59]: Yeah.
Swyx [00:26:59]: Sorry, Docker Swarm. Oh, f**k. What is it called?
Akshat [00:27:02]: Compose.
Swyx [00:27:03]: Compose type thing.
Akshat [00:27:04]: Yeah. So if you want Docker Compose, our sandboxes now support this thing called sidecars. A sandbox is a pod of containers, and you can run multiple containers in a sandbox. Also useful because, going back to networking, people want a lot of control over outbound networking from a sandbox.
Swyx [00:27:23]: Yeah.
Akshat [00:27:23]: Like, they might wanna run a middle proxy for, like, maybe logging stuff for RL, or controlling how egress can happen to a domain, injecting credentials. And yeah, so we’ve had to build a lot of that stuff ourselves.
Swyx [00:27:38]: Yeah.
Akshat [00:27:39]: But then also sometimes people want, sandboxes spanning multiple nodes to talk to each other, which is an emerging thing we’re seeing. We have support for that for a different reason, and yeah, we’ll see if that becomes stable.
Swyx [00:27:52]: Like, just an open socket. It’s a. This is directly like mTLS.
Akshat [00:27:56]: We do support that, which is you can, expose a tunnel inside a sandbox.
Swyx [00:28:01]: Yeah.
Akshat [00:28:01]: And then you can either expose it to the public internet or you can add like an HTTP auth layer above it. But we have this thing called I6PN, which we haven’t talked about, which is this overlay network using IPv6 addresses. So Modal containers within the same workspace, when this is enabled, can address each other using this private IPv6 address, and no one else can.
Akshat [00:28:28]: So it’s like private networking for containers. We built it because we needed it as a primitive for our distributed training product. So we have this other feature, which is you can add a decorator to a function, and you get a cluster of GPUs, and they have RDMA networking. So you can run a distributed training job that’s truly serverless, and we did the overlay network for that. But then we’ve seen that people are using it for other reasons, and I’m intrigued to see what people will do with it.
Swyx [00:28:59]: Build primitives and let people figure it out, right?
Akshat [00:29:01]: Yeah, exactly.
Swyx [00:29:02]: You put out a pretty interesting
Akshat [00:29:03]: They’re like, they read the docs webpage. Let me use that
Swyx [00:29:06]: Yeah
Akshat [00:29:06]: Something they never intended to work. This is literally not even in our docs page. People somehow found it, and they’re using it.
RDMA, Memory Movement, and Distributed Training
Swyx [00:29:12]: Huh.
Swyx [00:29:14]: The way you portrayed it with, like, RDMA versus TCP, very well laid out, but just the transfer speed change at scale for RL, like, yeah, you have it built in. I’m sure someone found it to be a lot more efficient before you made a thing out of it, right?
Akshat [00:29:32]: Yeah. And not to split hairs, I guess the overlay network is the TCP overlay network.
Akshat [00:29:39]: The reason we have that is you need that to do the key exchange for RDMA before you set up the RDMA network on top of that. but then people found the TCP part.
Swyx [00:29:48]: Can I tell you, this is like a big aha moment for me because
Akshat [00:29:51]: Yeah
Swyx [00:29:51]: So I review 2,200 submissions for the World’s Fair.
Akshat [00:29:56]: Yeah.
Swyx [00:29:57]: And then I got this from John Ousterhout
Akshat [00:29:58]: Huh
Swyx [00:29:59]: Who, I don’t know if. Do you know John Ousterhout by name?
Akshat [00:30:01]: The name sounds familiar.
Swyx [00:30:02]: He’s a well-known professor, published a lot of interesting software design books, and this is the talk he chose to submit, on RDMA at Inference. And I’m like, you wouldn’t think that this guy, who is like an operating systems guy, would care about RDMA.
Akshat [00:30:20]: I, it makes sense to me because I,
Swyx [00:30:24]: This is the cloud, right? Yeah
Akshat [00:30:25]: Like, the way you move around your KV cache and how efficiently you can do it, how efficiently you move, your weights from your training GPUs to your inference GPUs in RL is there’s a lot of degrees of freedom, and it is a systems problem
Swyx [00:30:41]: Yeah
Akshat [00:30:41]: Moving memory around
Swyx [00:30:42]: Yeah
Akshat [00:30:43]: Scheduling.
Swyx [00:30:44]: This shows you how primitive my understanding of networking stuff is.
Swyx [00:30:46]: Is this like the domain of WireGuard as well?
Akshat [00:30:50]: Not quite.
Swyx [00:30:51]: It’s adjacent?
Swyx [00:30:53]: Explain everything.
Akshat [00:30:54]: Sure.
Swyx [00:30:56]: How do we move memory around GPUs?
Akshat [00:30:58]: Well, so sorry. Yeah, that is memory. Sorry, I was talking more, and maybe I was talking like five minutes back, about the private IPv6, addressing that you’ve set up.
Swyx [00:31:09]: Yeah.
Akshat [00:31:09]: Is it like it’s a VPN?
Swyx [00:31:10]: Yeah, it is like a VPN, and yeah, WireGuard is, yeah, you’re right. It is,
Akshat [00:31:16]: Right. Yeah, you already moved on to new topics
Swyx [00:31:17]: A similar
Akshat [00:31:18]: Okay
Swyx [00:31:19]: In the same space, WireGuard is, encrypted and this is,
Akshat [00:31:23]: And you don’t need encryption.
Swyx [00:31:23]: Yeah.
Akshat [00:31:24]: Yeah.
Swyx [00:31:24]: This is not encrypted. that’s the main difference. This is TCP and we have eBPF programs that will reject or allow the TCP connection based on whether you’re allowed to do it.
Akshat [00:31:35]: Used to involve a full sidecar, but now you have eBPF in the Linux kernel.
Swyx [00:31:39]: Yeah.
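The allow-or-reject decision described for the private IPv6 overlay can be illustrated with a small policy function. This is only a sketch of the rule "containers in the same workspace may connect, nobody else can", written in Python for readability. The actual mechanism discussed is an eBPF program in the Linux kernel, and the workspace names, address prefixes, and the idea of mapping a workspace to a /64 prefix are all assumptions made up for this example.

```python
import ipaddress
from typing import Optional

# Hypothetical mapping of workspaces to private IPv6 prefixes (illustrative only).
WORKSPACE_PREFIXES = {
    "team-a": ipaddress.ip_network("fd00:a::/64"),
    "team-b": ipaddress.ip_network("fd00:b::/64"),
}


def workspace_of(addr: str) -> Optional[str]:
    """Return the workspace that owns this address, or None if it is unknown."""
    ip = ipaddress.ip_address(addr)
    for name, network in WORKSPACE_PREFIXES.items():
        if ip in network:
            return name
    return None


def allow_connection(src: str, dst: str) -> bool:
    """Allow a TCP connection only when both ends belong to the same workspace."""
    src_ws = workspace_of(src)
    return src_ws is not None and src_ws == workspace_of(dst)
```

Note that this check only decides who may connect. As mentioned in the conversation, the overlay itself is not encrypted, which is the main difference from something like WireGuard.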
Akshat [00:31:40]: Yeah. I don’t know if this is a natural follow-on to the topic of like my skepticism on distributed training is that while, like, people spend a lot of money on, like, cables to hook up GPUs, and even that is not, like, fast enough, and that’s the bottleneck, is your networking fast enough?
Swyx [00:31:59]: Yeah. So I guess you’re talking about fully distributed training like DiLoCo or something, which is like cross data center
Akshat [00:32:06]: That would be, yes.
Swyx [00:32:07]: That’s the extreme.
Akshat [00:32:08]: Yeah.
Swyx [00:32:08]: You’re in the middle, and then other people would have like the Mellanox cables up in, like, their actual data center.
Akshat [00:32:14]: When you run multi-node training on Modal, it’s RDMA. I think Mellanox, or InfiniBand, is all seen as RDMA. But it’s a way to bypass the TCP networking stack and transfer stuff much faster from one node to the other. And we have, I think, like 3 terabit per second internal networking
Swyx [00:32:40]: Okay
Akshat [00:32:40]: Which is the standard that’s needed.
Swyx [00:32:42]: Okay. So I misunderstood what
Akshat [00:32:43]: 50
Swyx [00:32:43]: What part of the stack you were
Akshat [00:32:44]: 50 gigs over
Swyx [00:32:45]: Yeah
Akshat [00:32:45]: If you went
Swyx [00:32:45]: Yeah
Akshat [00:32:46]: RDMA.
Swyx [00:32:46]: Okay.
Swyx [00:32:48]: Yeah. I, very impressive work.
Multi-Node Training, Post-Training, and Auto Research
Swyx [00:32:52]: So effectively you’re extending like the Modal philosophy to the training cluster, like, yeah.
Akshat [00:32:59]: Yeah. And we’re not going for large scale training runs. The thing that we’ve built multi-node training for is, we see a lot of smaller scale post-training. Like, people are post-training medium sized fund models so they can get higher quality on inference. This is a perfect fit for something like that.
Swyx [00:33:21]: Yeah. That is my impression of how a lot of these labs explore branches in post-training and then eventually merge whatever they find in.
Akshat [00:33:31]: Yeah. The other use case we’ve seen for multi-node training is even if you have a big cluster, your researchers are still doing small runs
Swyx [00:33:38]: Yes
Akshat [00:33:39]: Having elasticity there
Swyx [00:33:40]: Right, sure
Akshat [00:33:40]: Matters a lot more.
Swyx [00:33:41]: Yeah. the, like, this is like the current limiting factor for auto research, which is like you need to give your model some GPUs in order for it to completely run.
Akshat [00:33:51]: We have a blog post on auto research, and Modal is,
Swyx [00:33:55]: Yeah
Akshat [00:33:56]: Yeah, like, turns out to be pretty good substrate for that.
Swyx [00:33:59]: So my impression is auto research means many things, like
Akshat [00:34:01]: Yeah
Swyx [00:34:01]: Anything that Andrej coins. Right now it’s still science fair, right? Like not like, I don’t know how many people are doing this.
Akshat [00:34:08]: We’re having a golf.
Swyx [00:34:08]: Yeah.
Akshat [00:34:09]: I thought the same thing.
Swyx [00:34:11]: Yeah, you would know.
Akshat [00:34:12]: Our internal training and inference teams both use the general shape of this quite a bit. Like, we have this one internal repo called auto inference, where essentially we’ve automated our own forward-deployed engineering efforts using this harness, which is, the agent will just spin up a sweep of different things. It’ll even run the NVIDIA Nsight profiler, and it’ll tweak configs, and it’ll arrive at the right thing. It’ll change your GPUs from H200 to B200, and it works really well.
Swyx [00:34:47]: Nice.
Akshat [00:34:47]: So yeah.
Swyx [00:34:48]: By the way, I enjoy that your forward-deployed engineering is so technical that you have to do these things.
Swyx [00:34:52]: It’s very different from forward-deployed engineering from other people.
Akshat [00:34:54]: Yeah. Our forward-deployed engineering team is essentially like applied inference researchers or applied training researchers.
Swyx [00:35:02]: Someone told me like they have to be able to build, but they also have to be able to sell. do they have to sell or are they like they’re good, they’re just like post-sale type of thing?
Akshat [00:35:09]: It does, being able to talk to a customer and engage effectively with them
Swyx [00:35:13]: Yeah
Akshat [00:35:13]: Matters a lot.
Swyx [00:35:14]: They want the same thing.
Akshat [00:35:15]: Yeah.
Swyx [00:35:15]: ?
Akshat [00:35:15]: But it’s not really a sales thing. We pair them with, we have solution architects as well that are more on the sales side.
Swyx [00:35:23]: Okay. Let’s spend a bit more time on auto research. This is a big focus for this year. Where does this go? Like, have people explored enough? Like, there’s all these beautiful charts of improve and then level off a bit and then you find the next thing. Is this one abstraction up from normal training? Is that how we think about it, or do you think about it differently? Like model-level training versus, like, driven hyperparameter search.
Auto Inference and Modal Bench
Akshat [00:35:51]: Yeah, like,
Swyx [00:35:51]: Someone, some people call it like neural architecture search or whatever, right? Like.
Akshat [00:35:54]: Yeah. So the stuff I’ve seen people do with it is nowhere on the architecture level. It’s pretty much tweaking parameters, but it’s a hyperparameter sweep that’s guided by some model intuition, so it’s much more efficient than whatever other sweep you would have.
Swyx [00:36:12]: Yeah, it’s just, it’s just a question of where you want to spend your compute?
Akshat [00:36:16]: Right.
Swyx [00:36:16]: ‘Cause yeah, you can just throw infinite amounts of money on this and somehow you’ll bang out Shakespeare?
Akshat [00:36:22]: Yeah, infinite monkey.
Swyx [00:36:24]: Yeah, so very good for Modal. And I think it’s also very important that agents can spin up other agents, can spin up their infrastructure. Like, very good for you. How good are LLMs at generating Modal code? Like, the benefit of existing LLMs is that you are in the data.
Akshat [00:36:42]: Yeah. They’re surprisingly good. I think pre Claude 4 they were not, and now they’re able to one-shot stuff out of the box. But we’re playing around with releasing like a Modal Bench for like the harder
Swyx [00:36:55]: Yeah
Akshat [00:36:55]: Things, that the LLMs cannot do yet and maybe
Swyx [00:36:59]: What’s an example of that?
Akshat [00:37:01]: I think the thing that agents sometimes struggle with, without the right guidance and a skill, is how to use the rest of our observability. Like, something is failing, how do you look at the logs and then update the right thing? It’s reasoning about that. But they’re able to one-shot, like
Swyx [00:37:23]: Yeah. You can just add a skill to it?
Compute Strategy and Capacity Planning
Akshat [00:37:26]: Yeah. So we have a Modal skill now that. Which is why we built this Modal Bench. It’s to find things like that, so we can address them in our tool.
Swyx [00:37:35]: Tune a skill. Yeah.
Akshat [00:37:36]: Yeah.
Swyx [00:37:36]: No, it’s good. Are you facing any shortages? Like, we talk a lot about GPU shortages, but also CPU, also memory.
Swyx [00:37:44]: Yeah.
Akshat [00:37:45]: We have had a lot of growth, which means that we’ve had to be much better about
Swyx [00:37:53]: Planning
Akshat [00:37:54]: Proactive capacity planning.
Swyx [00:37:55]: Yeah.
Akshat [00:37:55]: So we have,
Swyx [00:37:57]: Which by the way, like it’s like a MBA’s like dream
Akshat [00:38:00]: Yes
Swyx [00:38:00]: Is like just planning this stuff. I think last time you and I talked about something maybe about this.
Akshat [00:38:03]: Yeah. We have a really competent team of people. The role is called compute strategy. So yeah, if anyone listening here wants to work on that
Swyx [00:38:13]: Compute strategy?
Akshat [00:38:13]: Yeah.
Swyx [00:38:14]: I think,
Akshat [00:38:14]: I feel like,
Swyx [00:38:15]: I think the normies call it FP&A or something.
Akshat [00:38:18]: Well, it’s not FP&A. There’s a lot of interesting financial questions, like what is the blend between one-year and three-year reservations? How do we forecast our own capacity? Especially since our capacity is very fungible across different GPU types and different regions, you have to model a lot of it. And you also have to have an opinion on how the supply chain is gonna evolve, and then you have to take bets,
Swyx [00:38:49]: Yeah
Akshat [00:38:49]: Based on that.
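To make the reservation-blend question concrete, here is a minimal sketch of how a mix of one-year and three-year commitments plus on-demand overflow turns into an hourly bill. Every number, name, and rate below is a made-up assumption for illustration; none of it reflects Modal’s actual pricing or planning model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    """A block of reserved GPU capacity (all values here are hypothetical)."""
    name: str
    gpus: int
    hourly_rate: float  # price per GPU-hour, paid whether or not the GPU is used


def blended_hourly_cost(demand_gpus, commitments, on_demand_rate):
    """Hourly spend for a given demand level.

    Reserved capacity is a fixed cost; any demand above the reserved
    total spills over to the (more expensive) on-demand rate.
    """
    reserved_gpus = sum(c.gpus for c in commitments)
    reserved_cost = sum(c.gpus * c.hourly_rate for c in commitments)
    overflow = max(0, demand_gpus - reserved_gpus)
    return reserved_cost + overflow * on_demand_rate


# Hypothetical plan: cheaper three-year block, pricier one-year block.
plan = [
    Commitment("3-year", gpus=60, hourly_rate=2.0),
    Commitment("1-year", gpus=30, hourly_rate=3.0),
]

quiet_hour = blended_hourly_cost(50, plan, on_demand_rate=5.0)
busy_hour = blended_hourly_cost(120, plan, on_demand_rate=5.0)
print(quiet_hour, busy_hour)
```

The quiet hour still pays for all 90 reserved GPUs, while the busy hour adds 30 GPUs of on-demand overflow. Shifting capacity between the two commitment lengths changes both the fixed cost and the exposure to overflow, which is the bet being described.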
Swyx [00:38:50]: Tokenomics.
Akshat [00:38:50]: Yeah.
Swyx [00:38:51]: This is probably not a real point, but I was trying to think about it, like, we cannot be the first to hit these kinds of problems.
Akshat [00:38:59]: Yeah.
Swyx [00:39:00]: And what other industries have had this? And I was like, airlines with fuel. They have to hedge their fuel, and I think for a long time Southwest, because they made like a hero fuel bet, were super low cost
Akshat [00:39:12]: Oh
Swyx [00:39:12]: Compared to everyone else.
Akshat [00:39:14]: Yeah. I hadn’t thought about that.
Vibhu [00:39:16]: We’re at a fun time too?
Akshat [00:39:18]: Yeah. A lot of the compute business in general, for us, is also about being very good at capacity management. That is how you have great unit economics, but over time it’s also how you can unlock more value for customers. One of the things we’re building now is a way for customers who don’t care about latency to get much cheaper pricing, and they’ll get results back in, like, the next 24 hours or something. A batch tier, essentially.
Batch Tiers and Latency-Insensitive Workloads
Swyx [00:39:47]: Yeah.
Akshat [00:39:47]: And those are levers we have because we control the whole stack and scheduling and whatnot to give people a sufficient
Swyx [00:39:53]: Yeah. I feel like they’re not as popular. The frontier labs all have those APIs. They’re not as popular as they should be.
Akshat [00:40:00]: The demand that we see for something like that is not for LLMs, although sometimes people wanna run evals and
Swyx [00:40:08]: Okay
Akshat [00:40:08]: Synthetic data prep and there it makes sense.
Swyx [00:40:10]: Okay.
Akshat [00:40:11]: But it’s from a lot of non-LLM companies, like people who are doing computational bio. They have to run really big batch jobs and they don’t care about when they get it back.
Swyx [00:40:22]: Yeah. And it’s also like a cousin to the halting problem of, like, will this finish in time?
Akshat [00:40:30]: Yeah. You can bound it.
Swyx [00:40:33]: Yeah.
Akshat [00:40:33]: Like you can give people
Swyx [00:40:34]: Yeah
Akshat [00:40:34]: SLAs on it.
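One way to picture a bounded batch tier is earliest-deadline-first scheduling over spare capacity, with a check for which jobs would miss their window. This is only a sketch under simplifying assumptions (each job runs alone, uses all spare GPUs, and parallelizes perfectly); the job names and sizes are invented, and it is not a description of Modal’s scheduler.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class BatchJob:
    deadline_h: float                        # hours from now by which results are due
    name: str = field(compare=False)
    gpu_hours: float = field(compare=False)  # total work the job needs


def schedule(jobs, spare_gpus):
    """Run jobs earliest-deadline-first; return (run order, jobs that finish late)."""
    heap = list(jobs)
    heapq.heapify(heap)          # orders by deadline_h only
    clock_h = 0.0
    order, late = [], []
    while heap:
        job = heapq.heappop(heap)
        clock_h += job.gpu_hours / spare_gpus  # assumes perfect parallelism
        order.append(job.name)
        if clock_h > job.deadline_h:
            late.append(job.name)
    return order, late


jobs = [
    BatchJob(24, "protein-folding-sweep", 120),
    BatchJob(6, "eval-suite", 16),
    BatchJob(12, "synthetic-data-prep", 40),
]

order, late = schedule(jobs, spare_gpus=8)
print(order, late)
```

With 8 spare GPUs every job lands inside its window; with only 4, the two larger jobs miss theirs. A provider can run this kind of check before accepting work, which is what makes it possible to put a bound on completion time.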
Swyx [00:40:35]: Yeah. I think what’s interesting is the next phase of Modal.
Swyx [00:40:38]: Like, what do people expect from you, now that you’re established and you’re a well-known compute player among all these leading companies? You had an inference launch week, and we talked a little bit about the launches. What else should people know?
What Modal Builds Next
Akshat [00:40:55]: We are building primitives that make our users’ lives much easier. So, for example, with LLM inference, thousands more companies are gonna post-train their own models and deploy open source models for inference, so we’re thinking a lot about what is the best product shape for that. And that involves everything from our training gym to endpoints that get frontier-level performance. Again, but I haven’t talked to anyone. It looks somewhat different in other verticals. We’re also seeing a lot of real-time audio-video stuff in there, which is why we’re working on things like regional routing with fallbacks, so you can get GPUs that are as close to users as possible and get low latency for video streaming and whatnot. And then on the agent side, it’s,
Akshat [00:41:52]: We’re still working very closely with our customers because stuff is changing so fast in terms of what they need. And, I think beyond sandboxes and persistent file systems, there’s a lot of other things people will need from this agent stack as they build production agents. So yeah, we’re thinking about those other things that fit in there.
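The regional routing with fallbacks mentioned above can be sketched as "try the closest region first, then the next closest that has capacity." The region names, latencies, and the `route` helper below are hypothetical and only illustrate the idea; they are not Modal’s API.

```python
from typing import Dict, Optional


def route(user_region: str,
          latency_ms: Dict[str, Dict[str, float]],
          free_gpus: Dict[str, int]) -> Optional[str]:
    """Return the lowest-latency region with a free GPU, or None if all are full."""
    by_latency = latency_ms[user_region]
    # Rank candidate regions from closest to farthest for this user.
    ranked = sorted(by_latency, key=by_latency.get)
    for region in ranked:
        if free_gpus.get(region, 0) > 0:
            return region
    return None


# Hypothetical round-trip latencies from one user location.
LATENCY_MS = {"eu-user": {"eu-west": 12, "us-east": 85, "ap-south": 140}}

preferred = route("eu-user", LATENCY_MS, {"eu-west": 2, "us-east": 3, "ap-south": 1})
fallback = route("eu-user", LATENCY_MS, {"eu-west": 0, "us-east": 3, "ap-south": 1})
print(preferred, fallback)
```

When the nearest region is full, the request falls back to the next-closest one instead of failing, trading some latency for availability.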
Swyx [00:42:13]: I want to ask what the other things are.
Akshat [00:42:15]: Yeah. I probably shouldn’t share right now.
Swyx [00:42:17]: I think, okay, so, I do think a lot about the principal components of cloud, and you do talk about compute, storage, networking.
Akshat [00:42:25]: Yeah.
Swyx [00:42:25]: Because so far, for the first couple generations of cloud, it’s fine. What’s qualitatively different about agents that you need some new permission level? A lot of people, okay, I’ll just kinda spew tokens at you until it hopefully sparks something.
Akshat [00:42:43]: Yeah.
Swyx [00:42:44]: Like the new level now is whatever Claude Code does, which is dangerously-skip-permissions or, like, allow list by command or whatever, right? And sometimes they’re like, “Well, okay, we have this adaptive thinking mode where, just trust me, bro. I will make the calls for you.” Is that it? Like, mediated permissions.
Hard Guardrails vs. LLM-Mediated Permissions
Vibhu [00:43:03]: Now you’re looping it with a goal and letting it roll.
Akshat [00:43:06]: Yeah, I’m skeptical of LLM-mediated permissions for stuff that is at the sandbox level because you do want hard boundaries.
Swyx [00:43:16]: Yeah.
Akshat [00:43:16]: Otherwise, someone can exfiltrate stuff.
Swyx [00:43:20]: But like
Akshat [00:43:20]: Yeah
Swyx [00:43:20]: Maybe that’s old school thinking. Maybe we’re the dinosaurs.
Swyx [00:43:23]: Maybe in the AI OS or the LLM OS, the kernel really is a goddamn LLM.
Swyx [00:43:30]: Like it makes you feel uncomfortable.
Akshat [00:43:31]: Yeah, I’m, I’m told
Swyx [00:43:32]: But that’s what trusting the LLM is. Like imagine a spherical cow perfect LLM.
Akshat [00:43:36]: Right.
Swyx [00:43:37]: That it.
Akshat [00:43:39]: Maybe.
Swyx [00:43:41]: I wanna test the boundaries, right?
Akshat [00:43:42]: Yeah.
Swyx [00:43:42]: Like, and I don’t believe that, but I wanna see where I’m wrong ‘cause that’s, that’s the consensus.
Akshat [00:43:49]: Yeah. I think you always need hard guardrails. And you can pair those with softer guardrails, right? And that’s gonna be a lot of the mediated stuff.
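A minimal sketch of pairing a hard guardrail with a softer one, using outbound network access as the example: a fixed allowlist decides first and cannot be overridden, and a soft check (a stand-in for an LLM or policy model) can only add friction on top. The hostnames and the `soft_check` stub are assumptions for illustration.

```python
from urllib.parse import urlparse

# Hard boundary: hypothetical egress allowlist enforced outside the model.
ALLOWED_HOSTS = frozenset({"api.example.com", "pypi.org"})


def hard_guardrail(url: str) -> bool:
    """Deterministic check: only allowlisted hosts may be reached."""
    return urlparse(url).hostname in ALLOWED_HOSTS


def decide(url: str, soft_check) -> str:
    """Return 'deny', 'ask', or 'allow'.

    The soft check runs only after the hard check passes, so it can
    downgrade 'allow' to 'ask' but can never turn a 'deny' into 'allow'.
    """
    if not hard_guardrail(url):
        return "deny"
    return "allow" if soft_check(url) else "ask"


def soft_check(url: str) -> bool:
    """Stand-in for a model-mediated judgment; here just a keyword heuristic."""
    return "upload" not in url


print(decide("https://pypi.org/simple/requests/", soft_check))
print(decide("https://api.example.com/upload", soft_check))
print(decide("https://evil.example.net/collect", lambda _url: True))
```

Even a soft check that approves everything cannot reach a host outside the allowlist, which is the property that matters for preventing exfiltration.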
Managed Agents and Specialized Sandboxes
Swyx [00:44:00]: I’ll also get you to end with some of your commentary on the ecosystem outside of Modal. Managed agents. Everyone has one. Gemini, OpenAI, Claude. Very useful for you, but it is also their way of starting to edge into your space.
Akshat [00:44:17]: Yeah.
Swyx [00:44:17]: What’s going on?
Akshat [00:44:19]: Yeah, we’re very excited to partner with Anthropic and some of the other foundation labs. I will not name who else we’re working with. The way we see it, the managed agent thing is a great place to start if you’re starting out building an agent. But then when you get to building something more production grade, like you’re a company like Ramp that’s building their own (Ramp also runs their accounting agent on us, so their external-facing agent), you need a lot more control over your compute primitive, on things like: how do you persist different files that the agent has access to, and how do you snapshot and restore? How do you control the networking? Maybe you want GPUs. When you get to that point, you kinda want a specialized sandbox provider that gives you those things, and that’s the role that we are trying to play.
Swyx [00:45:15]: Yeah
Akshat [00:45:16]: We don’t really have an opinion on the harness, whether it’s a cloud-managed agent and you hook it up to a Modal Sandbox, or you run the harness in a Modal Sandbox. We’ll see where people converge with that.
Swyx [00:45:26]: Yeah. Do you have any opinions on the meta harnesses, or is it just another layer on top of these things?
Akshat [00:45:31]: You mean like the OpenPipe
Swyx [00:45:33]: OpenPipe is one. I think Vercel had one, which I can’t remember the name of right now. Fredshot had one. And then, to me, most recently it was Databricks that had Omnigen. All these are meta harnesses. It’s kinda pseudo agent cloud type things.
Akshat [00:45:50]: I personally have not played around with them.
Swyx [00:45:53]: Yeah.
Akshat [00:45:53]: Build agents with them.
Swyx [00:45:54]: Everything’s bullish Modal, as long as it consumes more infra.
Akshat [00:45:57]: That’s why we’re focusing on the infra layer. It’s where our relative competence is, and it’s also a hard problem to solve.
Swyx [00:46:06]: Yeah. I will say like just generally reflecting on that, I don’t know if - if there’s other topics on Modal, but like just generally reflecting as an infra person, not as intense as you, but in that field, this has like been the most exciting time in infra. Like it was boring for a while, and you couldn’t really get people excited about data infrastructure. Like Eric would get on Data Console, everyone just watched the video and like say, “Look at how many sandboxes I can spin up,” and no one gave a crap.
Why Infrastructure Became Exciting Again
Akshat [00:46:39]: Yeah.
Swyx [00:46:40]: And like now everyone gives a crap.
Akshat [00:46:42]: That’s true. It is a very exciting time, and I think a lot of that’s driven by just the amount of scale all of this stuff needs.
Swyx [00:46:50]: I think the, like a lot of your initiatives or a lot of your like product directions make sense in retrospect, which is like the best kind, but I wouldn’t necessarily have thought about it myself, which.
Akshat [00:47:00]: We need the predictions.
Swyx [00:47:02]: I think there’s a lot that you just don’t even see, right? Like you have the batch, you have the voice, you have the multimodal, but what else?
Akshat [00:47:10]: What else is coming up for us
Swyx [00:47:11]: Yeah. Where do you see things going?
Akshat [00:47:13]: Yeah. I, in general
Biotech, Robotics, and Non-LLM AI Workloads
Akshat [00:47:15]: It’s clear that there’s a huge shift happening. I think one thing that’s not as obvious to people, because LLM inference gets talked about so much, is that we also work with a lot of companies that are doing things like drug discovery and computational bio, like the Chai Discoveries of the world. Big things are probably gonna happen there. We work with a lot of robotics companies that are putting robots in active deployments and getting good results out of them.
Swyx [00:47:45]: Is there Air Gap Modal? Is there a version that is, like, on-prem, air-gapped, whatever?
Akshat [00:47:50]: No. We,
Swyx [00:47:51]: You’re cloud only.
Akshat [00:47:51]: Yeah.
Swyx [00:47:52]: Yeah. Okay. But yeah, so what you’re saying is like because you’re focused on primitives and they’re good primitives, you find use cases in all these kinds of things.
Akshat [00:48:01]: Yeah.
Swyx [00:48:01]: Probably diversifies you a little bit away from LLMs all the time.
Akshat [00:48:05]: Yeah, absolutely. Our goal isn’t to only serve the LLM inference market.
Swyx [00:48:10]: There are a lot just on the website, the audio,
Akshat [00:48:12]: Yeah. We said both on
Swyx [00:48:14]: Computational bio images. Yeah, there’s a lot here. There’s QTA TTS, customizing. Oh, Chatterbox. there was customizing Whisper.
Akshat [00:48:24]: Okay. Yeah.
Swyx [00:48:25]: This screen reminds me of a fallen competitor, which is Replicate.
Model APIs vs. Differentiated AI Products
Swyx [00:48:31]: What’s your postmortem on what happened?
Akshat [00:48:34]: One thing we’ve stayed away from is providing an API for models, because I think providing model APIs ends up, some of it, serving a really hobbyist market, which is much less sticky.
Swyx [00:48:50]: Yeah.
Akshat [00:48:50]: And we’ve always wanted to build for companies that are building products and need more flexibility that’s not just an API.
Swyx [00:48:57]: Which, you can build an API for a model, and this is clearly what it is. But what you’re saying is you can wrap it into a more fully functioning back end that you run.
Akshat [00:49:06]: Yeah. So all of our examples, it’s not “spin up this model, here’s an API token, use it.” They’re all code.
Swyx [00:49:13]: Okay.
Akshat [00:49:13]: And so the point is that this is just an example.
Swyx [00:49:16]: Starter code.
Akshat [00:49:17]: Yeah. But you can tweak it however you want.
Swyx [00:49:20]: Yeah.
Akshat [00:49:21]: And if you’re like a company building a product, like, computational bio whatnot, yeah.
Swyx [00:49:26]: I guess I’m trying to tease out for listeners
Akshat [00:49:28]: Yeah
Swyx [00:49:28]: When does it stop being, oh, you’re just an API call, you’re just a wrapper on an API, and become what you call a product, right?
Swyx [00:49:36]: Like, what is that layer? Like what-- Like, more lines of code, but like beyond that, what is the substance that people add that qualifies it to be something more?
Akshat [00:49:46]: I think there’s a little bit of a selection effect, where a lot of the companies who do wanna get deeper into that level are probably building something that’s more differentiated. An example is, with LLM inference, originally we worked with companies that were building their own post-training frameworks, or Ramp early in the day was training their own tokenizer and swapping out the tokenizer in Llama and whatnot. I’m not saying that was that successful in that case. But a better example is, let’s say, Suno. Suno does not use Modal for training.
Swyx [00:50:26]: Mikey on the pod. Yeah.
Akshat [00:50:27]: But they use Modal for all their inference, and that’s because they have a completely custom model architecture, and that means that they have to be at the code level and tweak things that are not just an API.
Swyx [00:50:41]: It’s interesting as well, we had Ethan, most recently on the xAI Grok team, make a prediction that the next tier in video gen is not a better video model, it’s a better model or agent that orchestrates video models.
Video Agents and Production Workflows
Akshat [00:50:56]: Oh, interesting.
Vibhu [00:50:56]: Language model backbone that can use tools
Akshat [00:50:58]: Right
Vibhu [00:50:59]: And write code.
Akshat [00:51:00]: Like, yes, I can make my second video or my second video from Grok, but I want my minute video.
Akshat [00:51:06]: And I’m not going there through normal video gen.
Swyx [00:51:10]: Yeah, that’s interesting. I - So we have GPU sandboxes and recently have seen a few companies doing agents that do video manipulation or,
Akshat [00:51:22]: Yeah. Give it FFmpeg and just do it.
Swyx [00:51:23]: Run FFmpeg. But like
Akshat [00:51:25]: That’s not enough.
Swyx [00:51:25]: Yeah.
Akshat [00:51:26]: You need to give it Adobe.
Swyx [00:51:27]: Yeah, I hadn’t put it together that it would be a video production thing. In my mind these things were going more towards editing
Akshat [00:51:36]: Yeah.
Vibhu [00:51:36]: Well, shout out Mantis.
Akshat [00:51:37]: I think about this a lot.
Swyx [00:51:38]: .
Akshat [00:51:41]: Yeah. Sorry.
Vibhu [00:51:41]: Luma. Luma Agent is a version of this for video production, but it’s a off.
Swyx [00:51:46]: I was gonna get your quick takes, on some other stuff that happens
Gitpod/Ona, CI, and Runtime Sandboxes
Swyx [00:51:50]: In recent news, and just see if you have anything interesting. Gitpod, somewhat different market. They’re in the CI/CD market, but technically very impressive. I don’t know if you’ve taken a real look at them.
Akshat [00:52:03]: Yeah. People on our team have talked to the Gitpod team and they’re technically very strong.
Swyx [00:52:10]: Yeah.
Akshat [00:52:10]: We’re very bullish at Modal on the CI market as well because
Swyx [00:52:15]: Okay
Akshat [00:52:15]: There’s more agents, coding agents.
Swyx [00:52:18]: Yeah.
Akshat [00:52:19]: They’re gonna run a lot more CI and the primitives there can be much better.
Swyx [00:52:23]: I think there’s a lot of wasted CI.
Akshat [00:52:25]: Yeah.
Swyx [00:52:25]: So is it just like, let’s filter? What is the highest order bit here in improving CI for agents?
Akshat [00:52:32]: Well, there’s a lot of wasted time in CI on like
Swyx [00:52:36]: Preparing
Akshat [00:52:36]: Preparing your artifacts and, like, preparing your dependencies and whatnot.
Swyx [00:52:44]: Oh.
Akshat [00:52:44]: And, like build systems help with that. But like if you have primitives that are like memory snapshot and restore, can you just run CI more efficiently?
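The snapshot-and-restore idea for CI can be sketched as: key a prepared environment on a hash of its inputs, build it once, and restore it on every later run with the same inputs. The in-memory `SnapshotStore` below is a hypothetical stand-in for a real memory or filesystem snapshot primitive; it only shows the control flow.

```python
import hashlib


def snapshot_key(base_image: str, lockfile_text: str) -> str:
    """Derive a stable key from everything that determines the prepared environment."""
    digest = hashlib.sha256()
    digest.update(base_image.encode("utf-8"))
    digest.update(b"\0")  # separator so the two fields cannot blur together
    digest.update(lockfile_text.encode("utf-8"))
    return digest.hexdigest()[:16]


class SnapshotStore:
    """Toy in-memory store; a real system would persist sandbox snapshots."""

    def __init__(self):
        self._snapshots = {}
        self.builds = 0  # counts how often the slow path ran

    def prepare(self, base_image, lockfile_text, build):
        key = snapshot_key(base_image, lockfile_text)
        if key not in self._snapshots:
            # Slow path: install dependencies, warm caches, then snapshot.
            self._snapshots[key] = build()
            self.builds += 1
        # Fast path: restore the already-prepared environment.
        return self._snapshots[key]


store = SnapshotStore()
lockfile = "requests==2.32.3\n"

for _ in range(3):  # three CI runs with unchanged dependencies
    env = store.prepare("python:3.12", lockfile, build=lambda: {"deps": "installed"})

print(store.builds)
```

Three runs trigger only one build; changing the lockfile or base image produces a new key and a fresh build, so stale environments are never reused.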
Swyx [00:52:55]: Oh, okay. Okay. Interesting. Yeah. Another form of, like, on-demand compute.
Akshat [00:53:02]: Yeah, exactly.
Swyx [00:53:03]: Yeah.
Akshat [00:53:03]: It needs the same platform, again.
Swyx [00:53:06]: Yeah. So, for those who don’t know, Gitpod rebranded to Ona.
Swyx [00:53:09]: There was this whole thing. I semi-sounded the alarm at Cognition. I was like, “You should take these guys seriously because their infra is very good.”
Akshat [00:53:17]: Yeah.
Swyx [00:53:18]: But then they joined OpenAI and, presumably, we’ll see Codex Cloud from the Ona team.
Swyx [00:53:26]: Which I think would be very strong. To me, teams like that can set up the networking and the secure boundaries for your agents to have their own cloud each, which effectively is what you’re doing, and I’m just trying to draw the analogy or the differences, if you have studied them. Like, what is the philosophical difference?
Akshat [00:53:47]: My sense is maybe they didn’t go after the right market at the right time. I guess we also got lucky with agent use cases really taking off and needing more of a sandbox-shaped thing than, my understanding is, yeah, Gitpod
Swyx [00:54:06]: Really sandboxes work
Akshat [00:54:07]: Never mind
Swyx [00:54:07]: Like CI/
Akshat [00:54:08]: Yeah
Swyx [00:54:09]: Is sandboxes.
Akshat [00:54:09]: Yeah.
Swyx [00:54:10]: It’s just like build time sandboxes versus runtime sandboxes and it turned out runtime was better.
Akshat [00:54:15]: Right. And the difference there is runtime sandboxes have a different configuration surface of like how you configure images, how you like attach like storage
Swyx [00:54:25]: Yeah. It’s fascinating. Other people: Astral, also OpenAI.
Python, TypeScript, and the Future of SDKs
Swyx [00:54:30]: Also, like, Python tooling ecosystem people. Are you still bullish on building on top of Python? Also, recently Modular also got bought by Qualcomm. Any of your takes there?
Akshat [00:54:43]: Yeah. We had Python as our first SDK language because that was the language that people did data and ML in. We now have Go and TypeScript SDKs as well. And our runtime is completely language-agnostic. It is written in Rust, and it’s not tied to Python by any means. I think with inference and training stuff, people are still very Python, and the interesting thing with the agent stuff is people use our TypeScript SDK a lot more because they’re not doing anything that needs ML.
Akshat [00:55:13]: I don’t think we’ll have to go beyond that super soon
Swyx [00:55:16]: Yeah
Akshat [00:55:16]: ’cause Python and TypeScript are still dominant.
Swyx [00:55:19]: The last two languages in the world.
Akshat [00:55:21]: Yeah.
Swyx [00:55:21]: That’s it.
Akshat [00:55:22]: Well, English and prompting is the fourth language.
Swyx [00:55:25]: English and prompting. I occasionally talk to people who try to build new languages. Even, what’s his face, Bret Taylor, who’s chairman of OpenAI, was like, “We need a new language for LLMs.” So no one has come across one, and I keep looking. Python and TypeScript, you have a lot of data, which is a plus, but then also they are very imperfect just as languages themselves. Then my close is, I think Modal used to be a big bet on developer experience.
Agent Experience as a Company-Building Wedge
Swyx [00:55:52]: And you’ve pivoted the team to agent experience. Is that the way now? Can entire companies and unicorns, multi-unicorns, be built on just having better agent experience? Do you need something else?
Akshat [00:56:05]: It’s a big part of our identity. It’s not just the very tactical, how does an agent use the CLI, but it’s also how easy is it to spin something up? What is your iteration time when you wanna spin up a new service and you wanna get something going in prod? In practice, that matters a lot to people. And I think it will continue to matter. People are building stuff even faster, and if you give them ways to do it quickly and not have overhead, then.
Swyx [00:56:37]: I think the debate for me has been, do you do anything differently that is, like, very fundamentally different for developer experience versus agent experience?
Swyx [00:56:44]: You seem to be on the side of, they’re like this. They’re like cosine
Akshat [00:56:48]: Yeah. We also have a blog post on that.
Swyx [00:56:49]: Cosine similarity of, like, 0.9 or whatever.
Akshat [00:56:53]: Yeah. Pretty much the main shift for us has been, as I said, we built this benchmark, Modal Bench, to see where agents are lacking
Swyx [00:57:02]: Yeah
Akshat [00:57:02]: Literally add surface areas to a product if they’re reaching for something, like maybe this should just be a CLI.
Swyx [00:57:09]: They halluc- Oh, yeah. They hallucinate their own features.
Akshat [00:57:11]: Yeah. And sometimes it makes sense. Like if they’re reaching for this thing, it’s product feedback. Like, give it to them. And then, yeah, we used to only have logs and metrics in our UI, so just moving all those things to the CLI as well, so they’re accessible in that form.
Swyx [00:57:26]: Simple as that.
Closing: Modal Bench, AX, and Execution
Swyx [00:57:28]: Cool. Thank you so much. Yeah.
Akshat [00:57:29]: Yeah. Thank you.
Swyx [00:57:30]: This was great.
Akshat [00:57:30]: This was fun.
Swyx [00:57:30]: Yeah. It was a great update, and I can see why you guys have succeeded so much. It is really focus, but also really good execution.
Akshat [00:57:39]: Thanks. We have a long way to go.
Swyx [00:57:41]: All right. Thank you.
Akshat [00:57:42]: Cool.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe

About Latent Space: The AI Engineer Podcast

The podcast by and for AI Engineers! In 2025, over 10 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. Striving to give you both the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space www.latent.space
