Why we're bullish on loops
Agent loops are self-prompting systems with built-in evaluation.

The people who built OpenClaw and Claude Code both spent last week talking about the same idea: loops. Not prompting. Loops. Peter Steinberger and Boris Cherny, speaking separately, landed in the same place: stop prompting agents to do individual tasks and start building systems that prompt themselves.
Agent loops are self-prompting systems where you define a goal, feed the agent context, and let evaluation happen autonomously rather than by hand. The case for them is operational: models are now capable enough to complete tasks that used to take days, and the teams shipping loops are outrunning teams that still work prompt by prompt. The same pattern applies to how we think about LinkedIn engagement.
What a loop actually is
Four components make a loop function.
Goal. The agent needs a scope. Without it, you get outputs that sprawl. The goal is a constraint, not just a direction.
Context. Context is the fuel. A loop starved of context halts or drifts. Good context includes tools, prior outputs, error traces, analytics, any signal the agent can act on. The better practice is to feed context throughout the run rather than dump it all at the start. The agent should be able to fetch new inputs as conditions change.
Evaluation. This is where loops differ most from standard prompting. In a loop, the agent checks itself: tests, metrics, an LLM-as-judge layer. Engineers stop doing verification and start defining what "good" means. Test-driven development still works. The loop enforces it automatically.
An agent. The simplest version is Claude Code run inside a while-true shell loop, sometimes called Ralph after the open-source harness Geoffrey Huntley documented. More sophisticated versions use purpose-built context systems: an agent on a cron job that reads product data and emits work to subagents, or a loop that generates its own test suite.
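To make the four components concrete, here is a minimal sketch of how they fit together. It is an illustration under our own assumptions, not how any particular harness works: run_loop, fetch_context, agent_step, and evaluate are hypothetical names, and the toy functions stand in for a real model call and a real test suite.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    done: bool
    iterations: int
    history: List[str] = field(default_factory=list)


def run_loop(
    goal: str,
    fetch_context: Callable[[List[str]], str],
    agent_step: Callable[[str, str], str],
    evaluate: Callable[[str], bool],
    max_iterations: int = 10,
) -> LoopResult:
    """Iterate until evaluate() passes or the iteration budget runs out."""
    history: List[str] = []
    for i in range(1, max_iterations + 1):
        # Context is refreshed on every pass, not dumped once at the start.
        context = fetch_context(history)
        output = agent_step(goal, context)
        history.append(output)
        # The eval, not a human, decides whether the loop is finished.
        if evaluate(output):
            return LoopResult(done=True, iterations=i, history=history)
    return LoopResult(done=False, iterations=max_iterations, history=history)


# Toy stand-ins (hypothetical) so the sketch runs without a model or API.
def toy_context(history: List[str]) -> str:
    return f"attempts so far: {len(history)}"


def toy_agent(goal: str, context: str) -> str:
    attempts = int(context.rsplit(" ", 1)[1])
    return "tests pass" if attempts >= 2 else "tests fail"


result = run_loop(
    "get CI green", toy_context, toy_agent, lambda out: out == "tests pass"
)
print(result.done, result.iterations)  # True 3
```

The iteration cap is a design choice worth keeping in any real version: the goal bounds what the agent works on, and the budget bounds how long it is allowed to try.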
Real examples, not hypotheticals
The teams talking about loops are not theorizing. PostHog documented a performance improvement after a loop surfaced a multi-year bug in their query engine (their write-up is public). Anthropic's case study describes Stripe completing a codebase-wide migration in a day that their team estimated at two months of manual work (source).
Other patterns that show up in practice:
- A PR babysitter whose goal is getting CI green. The context is the diff and test suite. The evaluation is the CI result.
- A bug fixer that reads the error trace as context and uses the test suite as its eval.
- A flaky test hunter that reads CI history and declares success on consecutive green runs.
- A performance researcher that treats a benchmark as its goal and the budget as a constraint.
What these share: the human defined the objective and the success condition. The agent did the iteration.
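The success condition is usually the smallest piece of code in the system. As one hedged example, a flaky test hunter's eval could be as simple as the check below. The function name is_stable and the threshold of five runs are our assumptions for illustration; pick the number that matches your own CI.

```python
from typing import Iterable


def is_stable(ci_history: Iterable[bool], required_green: int = 5) -> bool:
    """Return True if the most recent `required_green` CI runs all passed.

    ci_history is ordered oldest to newest; True means a green run.
    """
    if required_green < 1:
        raise ValueError("required_green must be at least 1")
    runs = list(ci_history)
    if len(runs) < required_green:
        return False  # not enough evidence yet, so keep iterating
    return all(runs[-required_green:])


print(is_stable([True, False, True, True, True], required_green=3))  # True
print(is_stable([True, True, False], required_green=3))              # False
```

The human writes this once. The agent then runs against it as many times as it needs.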
Why now
The "why now" is capability. METR's published benchmarks show Claude Opus 4.6 succeeding about 50% of the time on tasks that take a human roughly 12 hours, about six times the task horizon of its predecessor (METR time-horizons report). The models are not just better at individual outputs; they are better at staying coherent across long runs.
That threshold matters because it changes the math. When agents could stay on task for only 20 minutes, loops were fragile. When they can stay on task for 12 hours, entire workflows become automatable.
The relevance for B2B founders building inbound
Most founders who want inbound from LinkedIn are still working prompt by prompt: no feedback signal, no evaluation layer, no accumulation. Each comment is a one-shot prompt.
The founders we audit who are actually seeing pipeline from LinkedIn share a recognizable structure. They have a defined goal (appear consistently in front of a specific audience), a context layer (which influencers their buyers follow, which posts are gaining traction, what has worked before), and an evaluation mechanism (are comments generating profile visits, and are those visits converting to conversations?).
Without evaluation, there is no loop. There is just effort.
We have written before about how finding the right posts to engage with is where most of the work actually pays off. You can read more in our piece on finding the right LinkedIn influencers to engage with and our breakdown of how B2B founders build pipeline through comments.
What the loop structure means for your engagement strategy
If you apply the four-component model to LinkedIn engagement, it looks like this:
Goal. Appear in the comment sections where your ideal buyers are already paying attention. A specific, bounded objective, not "post more" or "be active."
Context. Which creators your buyers follow. What formats and topics are pulling engagement in your niche right now. Your own history of what has generated inbound conversations versus what got ignored.
Evaluation. Profile visits from comments. Connection requests from people who match your ICP. Conversations that started in a comment thread. These are measurable. If you are not measuring them, you are running a loop without an eval layer, and you will not know whether to continue or correct.
Agent. In a LinkedIn context, the agent is you or a tool that surfaces the right opportunities. The constraint is time. Most founders we work with have 10 to 15 minutes a day for this. A loop structure means those 15 minutes go to verified high-signal opportunities, not scrolling.
A bad loop with a good agent still fails. A well-designed loop with a modest agent ships. Design the system first.
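One way to turn your own history into context is to rank the creators you comment on by how often those comments start conversations. The sketch below assumes you keep a simple log by hand or in a spreadsheet; CommentLog and rank_creators are hypothetical names, and the sample data is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class CommentLog(NamedTuple):
    creator: str        # whose post the comment was left on
    conversations: int  # conversations that started from this comment


def rank_creators(logs: List[CommentLog]) -> List[Tuple[str, float]]:
    """Rank creators by conversations started per comment, highest first."""
    comments: Dict[str, int] = defaultdict(int)
    conversations: Dict[str, int] = defaultdict(int)
    for log in logs:
        comments[log.creator] += 1
        conversations[log.creator] += log.conversations
    rates = [(name, conversations[name] / comments[name]) for name in comments]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


# Invented sample data, not real results.
logs = [
    CommentLog("creator_a", 1),
    CommentLog("creator_a", 0),
    CommentLog("creator_b", 0),
    CommentLog("creator_c", 2),
]
print(rank_creators(logs))
# [('creator_c', 2.0), ('creator_a', 0.5), ('creator_b', 0.0)]
```

With 10 to 15 minutes a day, the top of this list is where the time goes first.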
The thing that makes loops fragile
Context starvation. Both Steinberger and Cherny flagged this. Loops fail because the agent runs out of signal. It hits a point where it cannot verify its own output or find the next piece of work, so it drifts or halts.
In the LinkedIn context, context starvation looks like this: you engage consistently for two weeks, see no clear signal, and conclude that commenting does not work. But you were never measuring what mattered. You had no eval layer. The loop ran blind.
The fix is building the evaluation layer before you start. Define what a successful comment looks like in terms of outcomes (profile visits, conversation starts, follows from people in your ICP) before you make the first one. Then you have a signal to react to.
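Writing the criteria down before the first comment can be as lightweight as the sketch below. The class names, the decide function, and the threshold values are all hypothetical; the point is only that the thresholds exist before the results do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    # Hypothetical thresholds; set your own before the first comment.
    min_profile_visits: int
    min_conversations: int


@dataclass(frozen=True)
class WeekOutcome:
    comments: int
    profile_visits: int
    conversations: int


def decide(criteria: SuccessCriteria, week: WeekOutcome) -> str:
    """Return 'continue', 'correct', or 'no signal' for one week of engagement."""
    if week.comments == 0:
        return "no signal"  # nothing was run, so nothing can be evaluated
    if (
        week.profile_visits >= criteria.min_profile_visits
        and week.conversations >= criteria.min_conversations
    ):
        return "continue"
    return "correct"


criteria = SuccessCriteria(min_profile_visits=20, min_conversations=2)
print(decide(criteria, WeekOutcome(comments=25, profile_visits=31, conversations=3)))  # continue
print(decide(criteria, WeekOutcome(comments=25, profile_visits=8, conversations=0)))   # correct
```

A week that returns "correct" is still useful. It tells you to change the goal or the context rather than conclude that commenting does not work.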
The 12-hour benchmark METR documented will look modest in 18 months. Build the evaluation layer now.
Frequently asked
What is an agent loop, and how is it different from regular prompting?
An agent loop is a system where you define a goal, provide context, and set up an evaluation layer, then let the agent iterate until the goal is met. Regular prompting is one-shot: you write a prompt, you get an output, a human checks the output. In a loop, the agent checks itself and continues. The practical difference is that loops can handle long-running tasks without human intervention at every step.


