The pace of modern engineering is dictated by how fast your AI quota refreshes.
If you spend time around engineering teams right now, you will notice a new phrase creeping into the vocabulary.
“I’m just waiting for my limits to refresh.”
That sentence did not exist two years ago. It is entirely a product of AI coding agents becoming the primary way real work gets done. Developers are no longer just writing code. They are instructing agents, queuing tasks, reviewing outputs, and pacing their day around when their quota resets.
This is a genuinely new behavior, and it is stranger than it first appears.
The old rhythm of engineering was continuous. You sat down, you worked on a problem, you typed until the problem was solved or the day ended. Productivity was a function of skill, focus, and time. The more skilled and focused you were, the more you could produce in a given number of hours.
That model is being replaced by something different. Now, a developer sits down, fires off a task to an agent, and waits. While the agent works, they plan the next task. When the agent returns, they review, iterate, and send another. The bottleneck is no longer the developer’s typing speed or even their thinking speed. It is how many agent runs they have available before the quota resets.
This has two immediate effects on behavior.
First, developers are learning to parallelize. They run multiple agents on multiple tasks simultaneously. They manage a queue of work. They think of their day not as a sequence of problems, but as a portfolio of agent runs. Some will come back with useful output. Some will fail. The skill is not just writing code—it is orchestrating a fleet of agents doing work on your behalf.
Second, and this is the part I find most interesting, developers are literally pausing mid-work to wait for rate limits to refresh. They hit their quota. The agent stops responding. And so they make tea, take a walk, read something, or write up a plan for the next batch. Then the quota resets and they are back in motion.
This looks, to an outside observer, like a lot of waiting. But what is actually happening is that the quota is acting as a forcing function for deeper thinking. When you know your next run has real cost, you think harder about what to send. You do not waste runs on poorly specified tasks. You write better prompts. You plan the architecture more carefully before committing.
In a strange way, rate limits have made engineers more thoughtful, not less. The constraint is reshaping the work.
What makes this fascinating to me is that this pattern is almost certainly coming to every other function inside a startup. Right now it is most visible in engineering because engineering was the first function to adopt AI agents seriously. But the same logic will apply anywhere agents start doing real work.
Sales teams running agents for outbound and research will pace their day around quota. Design teams iterating with image and video models will batch their work around model availability. Marketers generating and refining content will parallelize runs and wait for refreshes. Customer success teams using agents to triage and respond will queue work the same way.
The unit of productivity is shifting from the hour to the agent run. That is a subtle but enormous change in how work gets organized. Managers will eventually need to think about team capacity not in headcount or hours, but in aggregate agent throughput. Hiring decisions will be influenced not just by the person’s skill, but by how effectively they can orchestrate agents. Standups will include status on queues and quotas, not just tasks.
We are in the early innings of this shift, and most companies have not even noticed it happening. But if you watch any engineering team closely right now, you will see it. The waiting. The queueing. The pacing. The quiet reorganization of a day around compute availability.
It is a unique behavior. And like most behaviors that start in engineering, it will spread.