A practitioner playbook for engineering an agent's context window: token-aware design, just-in-time retrieval, compaction vs offloading, sub-agents, KV-cache discipline, and the four moves every working agent stack uses.
How I direct the agent loop instead of letting it drive. Six tools, a dozen human gates, and one rule I never break: nothing ships unattended. Requirements, brainstorm, grill, plan, test, implement, review, verify, design, compound.
More agents was supposed to mean more capability. Production says otherwise: every handoff between agents loses information, every parallel decision conflicts, and the system fails where context crosses. The number of agents is rarely the question worth asking.
The agent loop forgot, so we built memory systems. Memory streams, hierarchical paging, self-organizing notes, bi-temporal graphs. Some of it works. Most of it solves a problem context engineering has already fixed.
Every agent paradigm is one answer to the same question. What does the model emit at each turn? Get clear on the action and the harness almost designs itself.
What makes a great MCP server: workflow-shaped design, five to fifteen tools, descriptions treated as system prompt fragments, and responses budgeted in tokens.