Skip to content
Home
Go back

Founder Note

(DAY 1146) Everyone's Just Waiting for Rate Limits to Refresh

Quick Context

In one line

The pace of modern engineering is dictated by how fast your AI quota refreshes.

Why this matters

The fundamental unit of engineering productivity is no longer the hour. It is the agent run. Teams that understand this are restructuring their workflow around compute availability, and that reshuffles everything from hiring to standups to capacity planning.

What changed my mind

I used to think AI coding tools would make engineers faster at the same job. Instead, they have changed the job itself. Developers now parallelize work across agents, manage queues, and literally wait for limits to refresh before firing the next task. That is a new kind of work.

I am watching this shift ripple through engineering teams and wondering how quickly it will reach design, marketing, sales, and operations. The behavior is unique, but the underlying logic is not—it will come to every function that starts relying on AI agents for real output.

Key line

"The new engineering constraint is not skill or time. It is quota. And when quota becomes the bottleneck, every behavior around work changes."

Founder Note Topic: Technology

If you spend time around engineering teams right now, you will notice a new phrase creeping into the vocabulary.

“I’m just waiting for my limits to refresh.”

That sentence did not exist two years ago. It is entirely a product of AI coding agents becoming the primary way real work gets done. Developers are no longer just writing code. They are instructing agents, queuing tasks, reviewing outputs, and pacing their day around when their quota resets.

This is a genuinely new behavior, and it is stranger than it first appears.

The old rhythm of engineering was continuous. You sat down, you worked on a problem, you typed until the problem was solved or the day ended. Productivity was a function of skill, focus, and time. The more skilled and focused you were, the more you could produce in a given number of hours.

That model is being replaced by something different. Now, a developer sits down, fires off a task to an agent, and waits. While the agent works, they plan the next task. When the agent returns, they review, iterate, and send another. The bottleneck is no longer the developer’s typing speed or even their thinking speed. It is how many agent runs they have available before the quota resets.

This has two immediate effects on behavior.

First, developers are learning to parallelize. They run multiple agents on multiple tasks simultaneously. They manage a queue of work. They think of their day not as a sequence of problems, but as a portfolio of agent runs. Some will come back with useful output. Some will fail. The skill is not just writing code—it is orchestrating a fleet of agents doing work on your behalf.

Second, and this is the part I find most interesting, developers are literally pausing mid-work to wait for rate limits to refresh. They hit their quota. The agent stops responding. And so they make tea, take a walk, read something, or write up a plan for the next batch. Then the quota resets and they are back in motion.

This looks, to an outside observer, like a lot of waiting. But what is actually happening is that the quota is acting as a forcing function for deeper thinking. When you know your next run has real cost, you think harder about what to send. You do not waste runs on poorly specified tasks. You write better prompts. You plan the architecture more carefully before committing.

In a strange way, rate limits have made engineers more thoughtful, not less. The constraint is reshaping the work.

What makes this fascinating to me is that this pattern is almost certainly coming to every other function inside a startup. Right now it is most visible in engineering because engineering was the first function to adopt AI agents seriously. But the same logic will apply anywhere agents start doing real work.

Sales teams running agents for outbound and research will pace their day around quota. Design teams iterating with image and video models will batch their work around model availability. Marketers generating and refining content will parallelize runs and wait for refreshes. Customer success teams using agents to triage and respond will queue work the same way.

The unit of productivity is shifting from the hour to the agent run. That is a subtle but enormous change in how work gets organized. Managers will eventually need to think about team capacity not in headcount or hours, but in aggregate agent throughput. Hiring decisions will be influenced not just by the person’s skill, but by how effectively they can orchestrate agents. Standups will include status on queues and quotas, not just tasks.

We are in the early innings of this shift, and most companies have not even noticed it happening. But if you watch any engineering team closely right now, you will see it. The waiting. The queueing. The pacing. The quiet reorganization of a day around compute availability.

It is a unique behavior. And like most behaviors that start in engineering, it will spread.


Read In Context

Keep following the thread this post belongs to

Read Next

Paths for readers like you

Technologists

A path through software, AI, interfaces, product behavior, and the practical side of technology.

Engineers, product people, and readers who care about how technology actually changes behavior.

Founders

A reading path for founders interested in hiring, company-building, incentives, growth, and the realities of building Edzy.

People building companies or thinking seriously about doing it.

Quick Answers

Questions this post answers

How have AI coding agents changed daily developer behavior?

Developers now run multiple tasks in parallel, managing queues of agent work. They pace their day around rate limit resets, batching thinking and review around when they know they can send the next round of instructions. It is a fundamentally new rhythm.

Why does this matter beyond engineering?

Because the same pattern will appear anywhere AI agents do real work. Sales development, content production, design iteration, customer research—every function that moves to agent-based execution will start organizing itself around compute availability and quota management.

Is waiting for rate limits actually productive?

In a weird way, yes. The limits force developers to think, plan, and review more carefully between runs. They cannot just brute-force iterate. The constraint shapes how people use the time, and often the quality of thinking goes up because you cannot afford to waste a run.

Related Posts

If this note clicked, keep going here