Building Mini Claude Code: A Coding Agent from Scratch
How I wired workspace sandboxing, tool schemas, the agent loop, system prompts, and background dev servers—before trusting black-box agent frameworks.
The fastest way I found to understand Cursor or Claude Code is not another diagram—it is implementing a tiny coding agent yourself. I did that as Mini Claude Code: first a ~200-line Python loop to learn the mechanics, then a TypeScript repo with streaming and harness details for day-to-day use.
This post is the Python walkthrough I wish I had when agents still felt like magic. If you want the loop theory first, read Chatbot, Copilot, or Agent, then come back here for tools, safety, and shell control.
What we are building
A CLI assistant that can:
- Read (and later write/edit) files inside a sandbox directory
- Call the LLM with function tools
- Run an agent loop until the model stops requesting tools
- Optionally run shell commands, including long-lived dev servers in the background
No framework—just the API, a tool registry, and discipline around paths and prompts.
Project skeleton
import os
import json
import subprocess
import threading
import time
import signal
from pathlib import Path
from dotenv import load_dotenv
from openai import OpenAI
WORKDIR = Path.cwd() / "workspace"
WORKDIR.mkdir(exist_ok=True)
Everything filesystem-related stays under workspace/. That single rule prevents the model from wandering across your machine.
Tool schemas: the manual the model reads
The model does not know your capabilities until you describe them in the tools array:
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a text file for code review or context.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Path relative to workspace",
                    }
                },
                "required": ["path"],
            },
        },
    }
]
Each new capability (write_file, edit_file, exec) is the same pattern: name, description, JSON schema. The model picks tools; your code executes them.
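As a sketch of that pattern, here is what a write_file capability could look like. The schema wording and the WriteFileTool class are my assumptions for illustration, not code lifted from the repo, and the sandbox root is passed in so the block runs on its own.

```python
from pathlib import Path

# Hypothetical schema for a second tool, following the read_file pattern.
write_file_schema = {
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Create or overwrite a text file inside the workspace.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path relative to workspace"},
                "content": {"type": "string", "description": "Full file content"},
            },
            "required": ["path", "content"],
        },
    },
}


class WriteFileTool:
    """Illustrative executor: like ReadFileTool, it always returns a string."""

    def __init__(self, root: Path):
        self.root = root.resolve()

    def execute(self, path: str, content: str) -> str:
        try:
            target = (self.root / path).resolve()
            # Same sandbox rule as the read tool: never leave the workspace.
            if not target.is_relative_to(self.root):
                return f"Write failed: path escapes workspace: {path}"
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
            return f"Wrote {len(content)} characters to {path}"
        except Exception as e:
            return f"Write failed: {e}"
```

Registering it would mean appending write_file_schema to the tools list and adding an instance to the tool registry under the same name.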
Path guard and the first tool
def check_path(p: str) -> Path:
    # Resolve first so "..", symlinks, and absolute paths are normalized.
    resolved = (WORKDIR / p).resolve()
    if not resolved.is_relative_to(WORKDIR.resolve()):
        raise ValueError(f"Path escapes workspace: {p}")
    return resolved


class ReadFileTool:
    def execute(self, path: str) -> str:
        try:
            file_path = check_path(path)
            if not file_path.exists():
                return f"File not found: {path}"
            return file_path.read_text(encoding="utf-8")
        except Exception as e:
            return f"Read failed: {e}"


file_tools = {"read_file": ReadFileTool()}
Return errors as strings, not uncaught exceptions—the model needs observations to recover.
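To see what the guard accepts and rejects, a small experiment helps. This sketch repeats the check_path logic with the root passed as a parameter (my change, so it can run against a temporary directory); is_allowed is a hypothetical helper added only for the demo.

```python
import tempfile
from pathlib import Path


def check_path(root: Path, p: str) -> Path:
    """Same idea as the guard above, with the root passed in for testing."""
    resolved = (root / p).resolve()
    if not resolved.is_relative_to(root.resolve()):
        raise ValueError(f"Path escapes workspace: {p}")
    return resolved


def is_allowed(root: Path, p: str) -> bool:
    """Hypothetical helper: turn the guard's exception into a boolean."""
    try:
        check_path(root, p)
        return True
    except ValueError:
        return False


root = Path(tempfile.mkdtemp())
candidates = ["notes.txt", "src/../notes.txt", "../secret.txt", "/etc/passwd"]
results = {p: is_allowed(root, p) for p in candidates}
for path, ok in results.items():
    print(f"{path!r}: {'allowed' if ok else 'rejected'}")
```

The first two paths stay inside the root and are allowed; the parent-directory escape and the absolute path are rejected, because joining an absolute path onto the root discards the root entirely.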
Wire up the API
# .env
DEEPSEEK_API_KEY=your_key
load_dotenv()
client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com",
)
Any OpenAI-compatible endpoint works; swap base URL and model name for your provider.
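One way to make that swap painless is to read the provider settings from the environment instead of hard-coding them. A minimal sketch: LLM_BASE_URL and LLM_MODEL are variable names I made up for this example, and the defaults mirror the values used in this post.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    api_key: str
    base_url: str
    model: str


def load_provider_config(env=os.environ) -> ProviderConfig:
    # LLM_BASE_URL and LLM_MODEL are hypothetical names for this sketch.
    return ProviderConfig(
        api_key=env.get("DEEPSEEK_API_KEY", ""),
        base_url=env.get("LLM_BASE_URL", "https://api.deepseek.com"),
        model=env.get("LLM_MODEL", "deepseek-chat"),
    )
```

The client would then be built as OpenAI(api_key=cfg.api_key, base_url=cfg.base_url), with cfg.model passed to each completion call.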
Agent loop: think → act → observe
def agent_loop(messages: list) -> str:
    max_iterations = 100  # hard cap so a confused model cannot loop forever
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = response.choices[0].message
        messages.append(msg)  # keep the assistant turn that requested the tools
        if not msg.tool_calls:
            return msg.content or ""
        for tool_call in msg.tool_calls:
            name = tool_call.function.name
            try:
                # Arguments arrive as a JSON string and may be malformed.
                args = json.loads(tool_call.function.arguments)
                if name in file_tools:
                    result = file_tools[name].execute(**args)
                else:
                    result = f"Unknown tool: {name}"
            except (json.JSONDecodeError, TypeError) as e:
                # Report bad arguments as an observation instead of crashing.
                result = f"Invalid arguments for {name}: {e}"
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": name,
                "content": result,
            })
    return "Stopped: max iterations reached"
This is the entire product shape:
- Send history (system + user + prior tool results).
- If the assistant message has no tool_calls, we are done.
- Otherwise execute each call, append role: tool messages, repeat.
The messages.append(msg) line matters: the assistant turn that requested tools must stay in history, or the API contract breaks.
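To make that contract concrete, here is a small checker I would use while debugging. It assumes the history has been converted to plain dicts (the loop above stores the SDK message object for assistant turns), and history_problems is a hypothetical helper, not part of any SDK.

```python
def history_problems(messages: list[dict]) -> list[str]:
    """Report tool results without a request, and requests without a result."""
    problems = []
    pending = set()  # tool_call ids requested but not yet answered
    for i, m in enumerate(messages):
        role = m.get("role")
        if role == "assistant":
            pending.update(c["id"] for c in (m.get("tool_calls") or []))
        elif role == "tool":
            call_id = m.get("tool_call_id")
            if call_id in pending:
                pending.discard(call_id)
            else:
                problems.append(
                    f"message {i}: tool result {call_id!r} has no matching assistant request"
                )
    problems.extend(f"tool call {c!r} was never answered" for c in sorted(pending))
    return problems


good_history = [
    {"role": "user", "content": "read test.txt"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "read_file", "arguments": '{"path": "test.txt"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "hello"},
    {"role": "assistant", "content": "The file says hello."},
]

# Dropping the assistant turn leaves an orphaned tool result behind.
broken_history = [good_history[0]] + good_history[2:]
```

Running history_problems on good_history returns an empty list, while broken_history yields one complaint about call_1.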
Minimal REPL
def main():
    history = [{
        "role": "system",
        "content": "You are Mini Claude Code. You can read files under the workspace. Be concise.",
    }]
    while True:
        try:
            user_input = input("> ").strip()
        except (EOFError, KeyboardInterrupt):
            break
        if user_input.lower() in ("q", "exit", "quit"):
            break
        if not user_input:
            continue
        turn_start = len(history)  # remember where this turn begins
        history.append({"role": "user", "content": user_input})
        try:
            answer = agent_loop(history)
            if answer:
                print(answer)
        except Exception as e:
            print(f"Error: {e}")
            # Drop the whole failed turn, including any partial assistant or
            # tool messages, so a retry starts from a clean history.
            del history[turn_start:]


if __name__ == "__main__":
    main()
Smoke test: create workspace/test.txt, ask “read test.txt”, watch one read_file call and a grounded answer.
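The same smoke test can run offline with a stub in place of the API. This is a sketch under assumptions: fake_create imitates only the attributes the loop reads from a chat completion response, and run_loop is a trimmed copy of the agent loop that takes the create function and tool map as parameters.

```python
import json
from types import SimpleNamespace


def fake_create(messages):
    """Stand-in for the completion call: one tool call, then a final answer."""
    already_called = any(isinstance(m, dict) and m.get("role") == "tool" for m in messages)
    if already_called:
        msg = SimpleNamespace(content="test.txt says: hello", tool_calls=None)
    else:
        call = SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name="read_file", arguments=json.dumps({"path": "test.txt"})),
        )
        msg = SimpleNamespace(content=None, tool_calls=[call])
    return SimpleNamespace(choices=[SimpleNamespace(message=msg)])


def run_loop(create, messages, tool_map, max_iterations=10):
    """Trimmed agent loop with the model call and tools injected."""
    for _ in range(max_iterations):
        msg = create(messages).choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content or ""
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            handler = tool_map.get(call.function.name)
            result = handler(**args) if handler else f"Unknown tool: {call.function.name}"
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    return "Stopped: max iterations reached"


calls = []


def fake_read_file(path: str) -> str:
    calls.append(path)  # record what the "model" asked for
    return "hello"


history = [{"role": "user", "content": "read test.txt"}]
answer = run_loop(fake_create, history, {"read_file": fake_read_file})
print(answer)
```

The history ends with four entries: the user turn, the assistant tool request, the tool result, and the final assistant answer.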
System prompt as an operations manual
Once multiple tools exist, a one-liner system prompt is not enough. I structure mine around:
- Role: coding assistant, sandboxed paths
- Workflow: read before edit; minimal diffs; ask when unsure
- Tool hints: when to use edit_file vs write_file; paginate large reads
- Safety: no paths outside WORKDIR
- Output: short summary of files touched
Example excerpt:
SYSTEM_PROMPT = f"""You are Mini Claude Code, a coding assistant.
Rules:
- Read files before editing them.
- Prefer edit_file for small changes; write_file for new files or full rewrites.
- All paths must stay under {WORKDIR}/.
- After tool use, briefly state what you learned or changed.
"""
The prompt is product code. In my experience, editing it shifts failure modes more than swapping model size does.
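A hint like "paginate large reads" only helps if the tool supports it. A minimal sketch of line-based paging, assuming hypothetical offset and limit parameters on the read tool; read_lines is an illustrative name, and the footer format is my own.

```python
def read_lines(text: str, offset: int = 0, limit: int = 200) -> str:
    """Return one page of lines plus a footer telling the model how to continue."""
    lines = text.splitlines()
    total = len(lines)
    offset = max(offset, 0)
    limit = max(limit, 1)
    chunk = lines[offset:offset + limit]
    if not chunk:
        return f"[no lines at offset {offset}; file has {total} lines]"
    end = offset + len(chunk)
    if end < total:
        footer = f"[lines {offset + 1}-{end} of {total}; call again with offset={end}]"
    else:
        footer = f"[lines {offset + 1}-{end} of {total}; end of file]"
    return "\n".join(chunk) + "\n" + footer
```

The footer doubles as an instruction: the model sees exactly which offset to request next, so the system prompt only needs to say that large files come back in pages.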
Shell tools: foreground vs background
Coding agents need npm test (short) and pnpm dev (long). Blocking the loop for eight hours on a dev server is unacceptable.
I classify commands with a keyword list (dev, start, serve, vite, uvicorn, …). Matches go to background Popen; everything else uses subprocess.run with a timeout.
_DAEMON_KEYWORDS = [
    "dev", "start", "serve", "watch", "vite", "webpack",
    "nodemon", "uvicorn", "gunicorn", "flask run",
]


def _is_daemon_command(command: str) -> bool:
    # Plain substring match: simple, but it also flags words like "restart".
    cmd = command.lower().strip()
    return any(kw in cmd for kw in _DAEMON_KEYWORDS)
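For the foreground half, a sketch along these lines is enough. run_foreground and its 60-second default are assumptions for illustration; the point is that a timeout becomes an observation string rather than an exception.

```python
import subprocess
import sys
from typing import Optional


def run_foreground(command: str, cwd: Optional[str] = None, timeout: float = 60.0) -> str:
    """Run a short command and return its output as an observation string."""
    try:
        proc = subprocess.run(
            command,
            shell=True,
            cwd=cwd,
            capture_output=True,
            text=True,
            errors="replace",  # never crash on odd bytes in command output
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Command timed out after {timeout:g}s: {command}"
    output = (proc.stdout + proc.stderr).strip()
    if output:
        return f"exit code {proc.returncode}\n{output}"
    return f"exit code {proc.returncode}"
```

One caveat I would keep in mind: with shell=True the timeout kills the shell, and processes the shell started may outlive it, which is part of why long-running commands get separate handling.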
Background processes register in a global map with rolling logs (cap ~500 lines), a short startup wait, and magic management commands the model can call via the same exec tool:
- bg_list: running jobs
- bg_logs <pid>: tail logs
- bg_kill <pid>: stop a server
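On Unix, starting the child in its own session with preexec_fn=os.setsid (or start_new_session=True, which does the same thing without preexec_fn's thread-safety caveats) lets you signal the whole process group, not just the parent shell. On Windows, process groups play the same role.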
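A POSIX-only sketch of that idea, using start_new_session=True so the child leads its own process group. start_in_own_group and kill_tree are hypothetical names, and the three-second grace period is an arbitrary choice.

```python
import os
import signal
import subprocess


def start_in_own_group(command: str) -> subprocess.Popen:
    # start_new_session=True makes the child call setsid(), so its pid is
    # also the id of a fresh process group that its children inherit.
    return subprocess.Popen(
        command,
        shell=True,
        start_new_session=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def kill_tree(proc: subprocess.Popen, grace: float = 3.0) -> int:
    """POSIX only: SIGTERM the whole group, escalate to SIGKILL after a grace period."""
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()  # group already gone
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        return proc.wait()
```

A bg_kill built on this would stop both the shell and whatever server it launched. On Windows the subprocess module offers creationflags=subprocess.CREATE_NEW_PROCESS_GROUP as the starting point; I have not sketched that path here.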
Update the tool description and system prompt whenever you add behavior like this. Otherwise the model treats a background PID as a failed synchronous command and loops forever.
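As an illustration of what that update might say, here is a possible schema for the exec tool. The wording and the command parameter name are assumptions; the important part is that the description states outright that a returned PID means success.

```python
# Hypothetical schema for the exec tool; adjust names to match your registry.
exec_tool_schema = {
    "type": "function",
    "function": {
        "name": "exec",
        "description": (
            "Run a shell command in the workspace. Short commands run in the "
            "foreground and return their output. Long-running commands (dev "
            "servers, watchers) start in the background and return a PID "
            "immediately; that is success, not a failure. Manage them with "
            "'bg_list', 'bg_logs <pid>', and 'bg_kill <pid>'."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "Shell command, or a bg_* management command",
                }
            },
            "required": ["command"],
        },
    },
}
```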
What this taught me
| Piece | Lesson |
|---|---|
| Sandbox | Agents are unsafe without filesystem boundaries |
| Tool errors as text | Observations drive self-correction |
| Loop cap | Always cap the number of turns |
| Prompt | Constraints beat raw model IQ for coding |
| Background exec | Real agents must not block on dev servers |
Production agents add streaming, parallel read-only tools, compression, permissions, and hooks—but none of that replaces this loop.
Next steps
- Code: github.com/houmq/mini-claude-code (TypeScript harness I use for experiments)
- Concept: Who owns the loop
- Memory: Context budget, not chat logs
If you want this level of agent design on a client project—RAG, tools, or an internal coding bot—that is the kind of work I take on; see About for how to reach me.