Mingqi Hou

The Six Pillars of Coding Agents (and How to Evaluate Any New Product)

Agent loop, tools, context, memory, multi-agent, and harness—one framework I use instead of chasing every new launch.

Every coding agent looks different in the UI, but most products wrestle with the same six problems. I use this map when comparing Cursor, Claude Code, Copilot agents, or internal tools—instead of feature checklists that age in a month.

One request, six systems

Say you ask an agent to refactor formatDate in src/utils.ts to use dayjs instead of moment. Between pressing Enter and “done,” roughly this happens:

  1. Plan the next step — read files, check dependencies, edit, test. That is the agent loop (think → act → observe). Without it, you only have a chatbot.
  2. Touch the repo — read, write, shell. That is the tool system (schemas, permissions, concurrency).
  3. Stay within context limits — remember what changed in nine files without stuffing 200k tokens of noise. That is context engineering.
  4. Remember team conventions across sessions — “we use pnpm.” That is memory (session vs long-term).
  5. Fork exploration — a sub-agent scans the repo and returns a short summary so the parent thread stays clean. That is multi-agent (usually for context isolation, not role-play theater).
  6. Stay safe and operable — confirm rm -rf, retry APIs, handle Ctrl+C, detect infinite loops. That is harness engineering.
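The first step above, the think → act → observe loop, can be sketched in a few lines. This is a minimal illustration, not any product's implementation: `Action`, `run_agent`, `scripted_policy`, and the two toy tools are hypothetical names, and a scripted plan stands in for the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    tool: str            # name of the tool to call, or "done" to stop
    argument: str = ""


def run_agent(
    policy: Callable[[List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_turns: int = 10,
) -> List[str]:
    """Repeat think -> act -> observe until the policy says "done" or turns run out."""
    observations: List[str] = []
    for _ in range(max_turns):
        action = policy(observations)                     # think
        if action.tool == "done":
            break
        result = tools[action.tool](action.argument)      # act
        observations.append(f"{action.tool}: {result}")   # observe
    return observations


def scripted_policy(observations: List[str]) -> Action:
    # Stand-in for a model call: picks the next step from a fixed plan.
    plan = [Action("read", "src/utils.ts"), Action("test"), Action("done")]
    return plan[len(observations)]


toy_tools = {
    "read": lambda path: f"contents of {path}",
    "test": lambda _: "tests passed",
}

history = run_agent(scripted_policy, toy_tools)
print(history)
```

Everything else in the list (tools, context, memory, sub-agents, harness) hangs off this loop, which is why the loop is the first thing worth locating in a new product.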

Six pillars of agents

| Pillar | One line | Analogy |
| --- | --- | --- |
| Agent loop | Repeat think → act → observe | Heartbeat |
| Tool system | Files, shell, APIs | Hands |
| Context engineering | What enters the window this turn | Blood supply |
| Memory | Facts across sessions | Long-term recall |
| Multi-agent | Split work / isolate context | Team lanes |
| Harness | Policy, retries, hooks, lifecycle | Skeleton |

New launches are easier to read through these lenses: what did they change in the loop, context, or harness?

Depth most demos skip

Agent loop — Production loops add truncation recovery, layered retries, roughly seven exit reasons (among them user abort, max turns, hook veto, and context overflow), and streaming tool execution. See who owns the loop.
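To make the exit reasons concrete, here is a small sketch of how a loop might decide whether to stop before the next turn. It covers only the four reasons named above. The `LoopState` fields, the default limits, and the order in which the checks run are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExitReason(Enum):
    USER_ABORT = "user_abort"
    MAX_TURNS = "max_turns"
    HOOK_VETO = "hook_veto"
    CONTEXT_OVERFLOW = "context_overflow"


@dataclass
class LoopState:
    turn: int
    used_tokens: int
    user_aborted: bool = False
    hook_vetoed: bool = False


def exit_reason(
    state: LoopState,
    max_turns: int = 25,          # assumed default, not from any product
    token_limit: int = 200_000,   # assumed window size
) -> Optional[ExitReason]:
    """Return why the loop should stop, or None to keep going."""
    # Assumed priority: explicit human or policy signals win over budget limits.
    if state.user_aborted:
        return ExitReason.USER_ABORT
    if state.hook_vetoed:
        return ExitReason.HOOK_VETO
    if state.used_tokens > token_limit:
        return ExitReason.CONTEXT_OVERFLOW
    if state.turn >= max_turns:
        return ExitReason.MAX_TURNS
    return None


print(exit_reason(LoopState(turn=3, used_tokens=1_000)))
print(exit_reason(LoopState(turn=25, used_tokens=1_000)))
```

Returning a named reason rather than a bare boolean lets the caller react differently to each case, for example compacting context on overflow but simply stopping on a user abort.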

Tool system — Accuracy often drops as tool count grows. Common mitigations: deferred loading of tool definitions, sandbox scripts, and “mask, don’t remove” so the cached prompt prefix stays stable.
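A rough sketch of the “mask don’t remove” idea: every tool definition is serialized the same way each turn, so the prompt prefix stays byte-identical and cacheable, while a separate mask decides what may actually be called. The tool names and the `render_tool_block` and `is_callable` helpers are hypothetical. Here the mask is only enforced when a call is dispatched; a real system might instead constrain the model's output during decoding.

```python
import json
from typing import Dict, List, Set

TOOLS: List[Dict[str, str]] = [
    {"name": "read_file", "description": "Read a file from the repo"},
    {"name": "write_file", "description": "Write a file in the repo"},
    {"name": "run_shell", "description": "Run a shell command"},
]


def render_tool_block(tools: List[Dict[str, str]]) -> str:
    """Serialize all tools in a fixed order so the text is identical every turn."""
    ordered = sorted(tools, key=lambda tool: tool["name"])
    return json.dumps(ordered, sort_keys=True)


def is_callable(name: str, tools: List[Dict[str, str]], masked: Set[str]) -> bool:
    """A tool may be called only if it is defined and not masked this turn."""
    known = {tool["name"] for tool in tools}
    return name in known and name not in masked


# The rendered block does not depend on the mask, so switching modes
# (for example a read-only planning phase) leaves the prompt prefix unchanged.
planning_mask = {"write_file", "run_shell"}
prefix = render_tool_block(TOOLS)
print(is_callable("read_file", TOOLS, planning_mask))
print(is_callable("write_file", TOOLS, planning_mask))
```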

Context engineering — Fifty tool calls × 2k tokens each is about 100k tokens, which fills a window fast. Models also tend to use material buried in the middle of a long context less reliably (“lost in the middle”), so curating beats stuffing. Common tactics: offload to disk, compress/summarize, retrieve (RAG), isolate (sub-agents), cache (prompt/KV).
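The arithmetic, plus the first tactic (offload to disk), can be sketched as follows. The 200k window, the character threshold, and the `window_fraction` and `offload` helpers are assumptions for illustration; output size is measured in characters here as a crude proxy for tokens.

```python
import hashlib
import tempfile
from pathlib import Path


def window_fraction(calls: int, tokens_per_call: int, window: int) -> float:
    """Share of the context window consumed by raw tool output."""
    return calls * tokens_per_call / window


def offload(output: str, directory: Path, max_chars: int = 200) -> str:
    """Keep small outputs inline; write large ones to disk and return a short stub."""
    if len(output) <= max_chars:
        return output
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
    path = directory / f"tool_output_{digest}.txt"
    path.write_text(output, encoding="utf-8")
    return f"[{len(output)} chars offloaded to {path}]"


# 50 calls x 2,000 tokens = 100,000 tokens, half of an assumed 200k window.
print(window_fraction(50, 2_000, 200_000))

with tempfile.TemporaryDirectory() as tmp:
    stub = offload("x" * 1_000, Path(tmp))
    print(stub)
```

The stub keeps a pointer the agent can follow later with a read tool, so nothing is lost; it just stops occupying the window on every subsequent turn.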

Memory — From a plain MEMORY.md file to SQLite + hybrid search—pick by audience size and debuggability, not hype.
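At the simple end of that range, a plain-file memory is little more than an append and a read. The sketch below assumes one fact per bullet line in MEMORY.md; the `remember` and `recall` helpers are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def recall(memory_file: Path) -> List[str]:
    """Return every remembered fact, one per bullet line."""
    if not memory_file.exists():
        return []
    lines = memory_file.read_text(encoding="utf-8").splitlines()
    return [line[2:].strip() for line in lines if line.startswith("- ")]


def remember(memory_file: Path, fact: str) -> None:
    """Append a fact as a bullet unless it is already recorded."""
    if fact in recall(memory_file):
        return
    with memory_file.open("a", encoding="utf-8") as handle:
        handle.write(f"- {fact}\n")


with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "MEMORY.md"
    remember(memory, "we use pnpm")
    remember(memory, "we use pnpm")   # duplicate, ignored
    print(recall(memory))
```

The debuggability argument is visible here: the whole store can be opened in an editor and corrected by hand, which a database with hybrid search gives up in exchange for scale.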

Multi-agent — Sub-agents mainly compress exploration into a small parent message; worktrees add filesystem isolation.
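The compression effect can be shown with a toy example: the sub-agent accumulates a long transcript of its own, and the parent receives only one line. The in-memory `repo` dictionary and the `explore` function are stand-ins for illustration; a real sub-agent would run its own loop with its own tools.

```python
from typing import Dict, List, Tuple


def explore(files: Dict[str, str], needle: str) -> Tuple[List[str], str]:
    """Hypothetical sub-agent: reads every file, returns (private transcript, short summary)."""
    transcript: List[str] = []
    hits: List[str] = []
    for path, text in sorted(files.items()):
        transcript.append(f"read {path}: {text}")   # stays in the sub-agent's context
        if needle in text:
            hits.append(path)
    summary = f"{len(hits)} file(s) reference {needle}: {', '.join(hits)}"
    return transcript, summary


repo = {
    "src/utils.ts": "import moment from 'moment'",
    "src/api.ts": "import moment from 'moment'",
    "src/index.ts": "export * from './utils'",
}

parent_messages = ["user: refactor formatDate to use dayjs instead of moment"]
transcript, summary = explore(repo, "moment")
parent_messages.append(f"sub-agent: {summary}")   # only the summary crosses over

print(len(transcript), "lines stayed in the sub-agent")
print(parent_messages[-1])
```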

Harness — The difference between a demo and something you trust on a client repo.
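Two of the harness duties mentioned earlier, confirming rm -rf and detecting infinite loops, can be sketched as small policy checks. Both are deliberately naive assumptions: `needs_confirmation` only inspects a command whose first token is rm (it ignores sudo, pipes, and other destructive commands), and `is_looping` just counts identical recent tool calls.

```python
import shlex
from collections import Counter
from typing import List, Tuple


def needs_confirmation(command: str) -> bool:
    """Flag rm invocations that are both recursive and forced."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "rm":
        return False
    args = tokens[1:]
    short_flags = "".join(a[1:] for a in args if a.startswith("-") and not a.startswith("--"))
    long_flags = {a for a in args if a.startswith("--")}
    recursive = "r" in short_flags or "R" in short_flags or "--recursive" in long_flags
    force = "f" in short_flags or "--force" in long_flags
    return recursive and force


def is_looping(calls: List[Tuple[str, str]], window: int = 6, threshold: int = 3) -> bool:
    """True if any identical (tool, argument) call repeats `threshold` times in the last `window` calls."""
    recent = calls[-window:]
    return any(count >= threshold for count in Counter(recent).values())


print(needs_confirmation("rm -rf build"))
print(needs_confirmation("rm notes.txt"))
print(is_looping([("read", "src/utils.ts")] * 3))
```

In a real harness these checks would sit between the model's proposed action and its execution, so a flagged command pauses for the user instead of running.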

How this connects to my other writing

Products change weekly; these problems do not. That is what I optimize for when shipping agent features for clients.