A support agent left running overnight made 847 API calls, consumed $340 worth of tokens, and accomplished exactly nothing. The dashboard showed a single user query: "What's the status of my refund?" The agent had spent eight hours calling every service in the stack, looping between order APIs, shipping endpoints, and inventory databases like a confused customer service rep with no notepad.
This is not a bug in the agent. It is architecture without memory. Every team building agents right now is one bad prompt away from the same bill.
One Request. Eight API Calls. Zero Memory.
Without persistent state, every tool call triggers another. The agent cannot remember it already checked the order.
Why Agents Loop
The agent loop is not a coding mistake. It's a memory architecture failure. Here's the chain:
1. User asks a question. The agent decides it needs order data.
2. Agent calls order API. The response includes a shipping reference.
3. Agent calls shipping API. The response includes inventory data.
4. Agent calls inventory API. The response includes a price.
5. The price triggers a pricing API call, whose response includes a customer segment.
6. The customer segment triggers a customer API call, whose response references the original order.
7. Agent calls order API again. It has no memory of step 2.
Without persistent state, every tool result is a fresh input. The agent cannot distinguish between "I already checked this" and "This is new information I should act on." The loop is inevitable.
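The chain above can be reproduced in a toy simulation. This is a minimal sketch, not a real agent: the NEXT_TOOL mapping, the run_agent function, and the max_calls cap are all hypothetical names introduced for illustration, and each "tool" is reduced to the name of the next tool its response points to.

```python
# Hypothetical reference chain taken from the steps above: each tool's
# response points at the next tool the agent feels obliged to call.
NEXT_TOOL = {
    "order": "shipping",
    "shipping": "inventory",
    "inventory": "pricing",
    "pricing": "customer",
    "customer": "order",  # closes the cycle back to the first call
}


def run_agent(start, remember, max_calls=50):
    """Follow tool references and return the ordered list of calls made."""
    calls = []
    seen = set()
    tool = start
    # max_calls is a safety cap so the stateless run terminates at all.
    while tool is not None and len(calls) < max_calls:
        if remember and tool in seen:
            break  # "I already checked this": stop instead of re-calling
        calls.append(tool)
        seen.add(tool)
        tool = NEXT_TOOL.get(tool)
    return calls


stateless = run_agent("order", remember=False)
stateful = run_agent("order", remember=True)
print(len(stateless), len(stateful))  # prints "50 5"
```

The stateless run stops only because of the artificial cap. The run that remembers its own calls stops after five, the moment the customer response points back at the order it has already seen.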
Why Prompting Doesn't Fix It
The standard response is "just prompt it better." Add instructions like "Remember what you've already done" or "Don't call the same API twice." This works in demos and fails in production for three reasons:
- Context windows forget too. Even with perfect instructions, the conversation history is finite. After 20-30 tool calls, earlier actions can fall out of the context window entirely.
- Models hallucinate memories. Ask an LLM to "remember what it did" and it may confidently recall actions that never happened — or forget actions that did. Memory via prompting is fiction.
- Determinism matters. "Don't call the same API twice" is ambiguous. Same endpoint? Same parameters? Same result? Production requires structured event logs, not semantic approximations.
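One way to make "the same call" unambiguous is to define identity as the tool name plus its canonically serialized parameters. The sketch below assumes that definition; call_key is a hypothetical helper, and a real system might also need to normalize types or ignore volatile parameters such as timestamps.

```python
import hashlib
import json


def call_key(tool, params):
    """Return a deterministic hash for a tool call.

    Assumption: two calls are "the same" when the tool name and the
    parameters match, regardless of the order the parameters were given in.
    """
    canonical = json.dumps(
        {"tool": tool, "params": params},
        sort_keys=True,          # parameter order must not change the key
        separators=(",", ":"),   # strip whitespace variation
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = call_key("order", {"order_id": 12345, "fields": ["status"]})
b = call_key("order", {"fields": ["status"], "order_id": 12345})
c = call_key("order", {"order_id": 99999, "fields": ["status"]})
print(a == b, a == c)  # prints "True False"
```

A key like this answers the "same endpoint, same parameters?" question in code rather than leaving it to the model's interpretation of an instruction.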
The Architecture That Actually Works
The fix is not more rules. It's structured memory.
Cortyxia stores every tool execution as a structured event with a unique hash of the inputs. Before the agent executes any new tool call, it queries its memory layer for functionally identical previous calls. If found, it returns the cached result. If not, it executes, stores, and continues.
This is more than conventional caching, which is typically time-based (entries expire on a clock) and brittle. This is semantic deduplication: the agent recognizes that "check order #12345" is the same operation regardless of when or why it was triggered. The loop never starts because the second iteration sees the first in memory and exits immediately.
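The lookup-before-execute flow described above could be sketched roughly as follows. This is an illustrative in-memory version, not Cortyxia's actual API: ToolEvent, ToolMemory, and fetch_order are hypothetical names, and a production memory layer would persist events rather than hold them in a dict.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolEvent:
    """One structured record of a tool execution."""
    key: str
    tool: str
    params: Dict[str, Any]
    result: Any


@dataclass
class ToolMemory:
    """Hypothetical memory layer: check for an identical call before executing."""
    events: Dict[str, ToolEvent] = field(default_factory=dict)
    executed: List[str] = field(default_factory=list)

    def _key(self, tool: str, params: Dict[str, Any]) -> str:
        # Hash of the inputs; sort_keys makes parameter order irrelevant.
        blob = json.dumps([tool, params], sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def call(self, tool: str, params: Dict[str, Any],
             execute: Callable[..., Any]) -> Any:
        key = self._key(tool, params)
        hit = self.events.get(key)
        if hit is not None:
            return hit.result          # identical call already made: reuse it
        result = execute(**params)     # otherwise execute, store, continue
        self.events[key] = ToolEvent(key, tool, params, result)
        self.executed.append(tool)
        return result


def fetch_order(order_id):
    """Stand-in for a real order API call."""
    return {"order_id": order_id, "status": "refund_pending"}


memory = ToolMemory()
first = memory.call("order", {"order_id": 12345}, fetch_order)
second = memory.call("order", {"order_id": 12345}, fetch_order)
print(len(memory.executed))  # prints "1"
```

The second request for order 12345 never reaches fetch_order. It is answered from the stored event, which is the point at which a loop would otherwise begin its second lap.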
In the documented test, adding memory architecture reduced the 847-call loop to 3 calls. Total cost: $0.04. The refund query was answered in 400ms instead of 8 hours.
Key Takeaways
- AI agent loops are not bugs — they're memory architecture failures.
- Without persistent state, agents cannot recognize they're repeating actions.
- Better prompting fails because context windows forget and models hallucinate memories.
- A single unresolved agent loop ran up 847 API calls and a $340 bill overnight.
- Structured memory with semantic deduplication eliminates loops at the source.
The Bottom Line
Building agents without memory is like hiring a consultant with no notepad. They'll ask the same questions, call the same APIs, and charge you for every repetition. The $340 overnight bill was not an edge case — it was the predictable result of architecture that discards state after every turn. Cortyxia gives your agents a memory. The loops stop. The bills drop. And your agent finally starts acting like it knows what it's doing.