Building Mini Claude Code: A Coding Agent from Scratch

How I wired workspace sandboxing, tool schemas, the agent loop, system prompts, and background dev servers—before trusting black-box agent frameworks.

Published June 3, 2026

The fastest way I found to understand Cursor or Claude Code is not another diagram—it is implementing a tiny coding agent yourself. I did that as Mini Claude Code: first a ~200-line Python loop to learn the mechanics, then a TypeScript repo with streaming and harness details for day-to-day use.

This post is the Python walkthrough I wish I had when agents still felt like magic. If you only want the loop theory first, read Chatbot, Copilot, or Agent; come back here for tools, safety, and shell control.

What we are building

A CLI assistant that can:

Read (and later write/edit) files inside a sandbox directory
Call the LLM with function tools
Run an agent loop until the model stops requesting tools
Optionally run shell commands, including long-lived dev servers in the background

No framework—just the API, a tool registry, and discipline around paths and prompts.

Project skeleton

import os
import json
import subprocess
import threading
import time
import signal
from pathlib import Path
from dotenv import load_dotenv
from openai import OpenAI

WORKDIR = Path.cwd() / "workspace"
WORKDIR.mkdir(exist_ok=True)

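Everything filesystem-related stays under workspace/. As long as every file tool resolves its paths through the same guard, that single rule keeps the model from wandering across your machine. Shell commands are a separate risk, since a path check does not constrain them.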

Tool schemas: the manual the model reads

The model does not know your capabilities until you describe them in the tools array:

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a text file for code review or context.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Path relative to workspace",
                    }
                },
                "required": ["path"],
            },
        },
    }
]

Each new capability (write_file, edit_file, exec) is the same pattern: name, description, JSON schema. The model picks tools; your code executes them.
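As an illustration of that pattern, here is what a write_file entry might look like. The parameter names (path, content) and the description wording are assumptions made for this sketch, not a fixed contract.

```python
# Hypothetical schema for a write_file tool; parameter names are assumptions.
write_file_schema = {
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Create or overwrite a text file inside the workspace.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path relative to workspace",
                },
                "content": {
                    "type": "string",
                    "description": "Full text to write to the file",
                },
            },
            "required": ["path", "content"],
        },
    },
}

# In the real registry this entry would sit next to the read_file schema.
tools = [write_file_schema]
```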

Path guard and the first tool

def check_path(p: str) -> Path:
    resolved = (WORKDIR / p).resolve()
    if not resolved.is_relative_to(WORKDIR.resolve()):
        raise ValueError(f"Path escapes workspace: {p}")
    return resolved


class ReadFileTool:
    def execute(self, path: str) -> str:
        try:
            file_path = check_path(path)
            if not file_path.exists():
                return f"File not found: {path}"
            return file_path.read_text(encoding="utf-8")
        except Exception as e:
            return f"Read failed: {e}"


file_tools = {"read_file": ReadFileTool()}

Return errors as strings, not uncaught exceptions—the model needs observations to recover.
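To show the guard and the errors-as-text convention working together, here is a minimal sketch of a write tool. WriteFileTool and its return messages are hypothetical, and the workspace lives in a temporary directory so the example can run anywhere.

```python
import tempfile
from pathlib import Path

# Stand-in workspace for this sketch; the post uses Path.cwd() / "workspace".
WORKDIR = Path(tempfile.mkdtemp()) / "workspace"
WORKDIR.mkdir(exist_ok=True)


def check_path(p: str) -> Path:
    # Same guard as above: resolve first, then verify containment.
    resolved = (WORKDIR / p).resolve()
    if not resolved.is_relative_to(WORKDIR.resolve()):
        raise ValueError(f"Path escapes workspace: {p}")
    return resolved


class WriteFileTool:
    """Hypothetical write tool: same guard, same errors-as-text convention."""

    def execute(self, path: str, content: str) -> str:
        try:
            file_path = check_path(path)
            file_path.parent.mkdir(parents=True, exist_ok=True)
            file_path.write_text(content, encoding="utf-8")
            return f"Wrote {len(content)} characters to {path}"
        except Exception as e:
            return f"Write failed: {e}"


writer = WriteFileTool()
ok = writer.execute("notes/todo.txt", "ship it")
blocked = writer.execute("../outside.txt", "nope")
print(ok)       # Wrote 7 characters to notes/todo.txt
print(blocked)  # Write failed: Path escapes workspace: ../outside.txt
```

The second call never touches the disk: the guard raises, the tool turns the exception into text, and the model gets an observation it can act on.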

Wire up the API

# .env
DEEPSEEK_API_KEY=your_key

load_dotenv()
client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com",
)

Any OpenAI-compatible endpoint works; swap base URL and model name for your provider.
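One way to make that swap without touching code is to read the endpoint from the environment. This is only a sketch: the variable names LLM_API_KEY, LLM_BASE_URL, and LLM_MODEL are hypothetical, and the defaults simply mirror the DeepSeek values used above.

```python
import os


def provider_settings() -> dict:
    """Read endpoint settings from the environment (variable names are hypothetical)."""
    return {
        "api_key": os.getenv("LLM_API_KEY", ""),
        "base_url": os.getenv("LLM_BASE_URL", "https://api.deepseek.com"),
        "model": os.getenv("LLM_MODEL", "deepseek-chat"),
    }


settings = provider_settings()
# client = OpenAI(api_key=settings["api_key"], base_url=settings["base_url"])
# Later: client.chat.completions.create(model=settings["model"], ...)
```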

Agent loop: think → act → observe

def agent_loop(messages: list) -> str:
    max_iterations = 100

    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )

        msg = response.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            return msg.content or ""

        for tool_call in msg.tool_calls:
            name = tool_call.function.name
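            try:
                args = json.loads(tool_call.function.arguments)
                if name in file_tools:
                    result = file_tools[name].execute(**args)
                else:
                    result = f"Unknown tool: {name}"
            except (json.JSONDecodeError, TypeError) as e:
                # Malformed or mismatched arguments become an observation too.
                result = f"Invalid arguments for {name}: {e}"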

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": name,
                "content": result,
            })

    return "Stopped: max iterations reached"

This is the entire product shape:

Send history (system + user + prior tool results).
If the assistant message has no tool_calls, we are done.
Otherwise execute each call, append role: tool messages, repeat.

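The messages.append(msg) line matters: the assistant turn that requested tools must stay in history, because each role: tool message has to answer a tool_call_id from a preceding assistant message. Drop that turn and the next request is rejected.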
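To watch the message history grow without spending API credits, the loop can be dry-run against a scripted stand-in for the model. Everything here (ScriptedModel, fake_tool_call, run_tool, dry_run) is hypothetical scaffolding for illustration; only the loop shape comes from agent_loop above.

```python
import json
from types import SimpleNamespace


def fake_tool_call(call_id: str, name: str, args: dict) -> SimpleNamespace:
    # Mimics the shape of one entry in an assistant message's tool_calls list.
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=json.dumps(args)),
    )


class ScriptedModel:
    """Stand-in for the API: first asks for read_file, then answers."""

    def __init__(self) -> None:
        self.turn = 0

    def complete(self, messages: list) -> SimpleNamespace:
        self.turn += 1
        if self.turn == 1:
            call = fake_tool_call("call_1", "read_file", {"path": "test.txt"})
            return SimpleNamespace(role="assistant", content=None, tool_calls=[call])
        observation = messages[-1]["content"]  # the tool result appended last
        return SimpleNamespace(
            role="assistant",
            content=f"The file says: {observation}",
            tool_calls=None,
        )


def run_tool(name: str, args: dict) -> str:
    # Canned result so the sketch needs no filesystem.
    if name == "read_file":
        return "hello from test.txt"
    return f"Unknown tool: {name}"


def dry_run(messages: list, model: ScriptedModel, max_iterations: int = 5) -> str:
    for _ in range(max_iterations):
        msg = model.complete(messages)
        messages.append(msg)  # keep the assistant turn that requested tools
        if not msg.tool_calls:
            return msg.content or ""
        for tool_call in msg.tool_calls:
            args = json.loads(tool_call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": run_tool(tool_call.function.name, args),
            })
    return "Stopped: max iterations reached"


history = [{"role": "user", "content": "read test.txt"}]
answer = dry_run(history, ScriptedModel())
roles = [m["role"] if isinstance(m, dict) else m.role for m in history]
print(roles)   # ['user', 'assistant', 'tool', 'assistant']
print(answer)  # The file says: hello from test.txt
```

The role sequence is the thing to notice: every tool message sits directly after the assistant message whose tool_calls it answers.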

Minimal REPL

def main():
    history = [{
        "role": "system",
        "content": "You are Mini Claude Code. You can read files under the workspace. Be concise.",
    }]

    while True:
        try:
            user_input = input("> ").strip()
        except (EOFError, KeyboardInterrupt):
            break

        if user_input.lower() in ("q", "exit", "quit"):
            break
        if not user_input:
            continue

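        checkpoint = len(history)
        history.append({"role": "user", "content": user_input})

        try:
            answer = agent_loop(history)
            if answer:
                print(answer)
        except Exception as e:
            print(f"Error: {e}")
            # Roll back the whole failed turn, including any partial
            # assistant/tool messages, so a retry starts from clean history.
            del history[checkpoint:]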


if __name__ == "__main__":
    main()

Smoke test: create workspace/test.txt, ask “read test.txt”, watch one read_file call and a grounded answer.

System prompt as an operations manual

Once multiple tools exist, a one-liner system prompt is not enough. I structure mine around:

Role — coding assistant, sandboxed paths
Workflow — read before edit; minimal diffs; ask when unsure
Tool hints — when to use edit_file vs write_file; paginate large reads
Safety — no paths outside WORKDIR
Output — short summary of files touched

Example excerpt:

SYSTEM_PROMPT = f"""You are Mini Claude Code, a coding assistant.

Rules:
- Read files before editing them.
- Prefer edit_file for small changes; write_file for new files or full rewrites.
- All paths must stay under {WORKDIR}/.
- After tool use, briefly state what you learned or changed.
"""

The prompt is product code. In my experience, changing it shifts failure modes more than swapping model size does.
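If the prompt is product code, it helps to keep each section separately editable. The sketch below assembles a prompt from the five headings listed earlier; the section texts and the build_system_prompt helper are illustrative assumptions, not the prompt I actually ship.

```python
from pathlib import Path

# Section texts are illustrative; only the five headings come from the outline.
PROMPT_SECTIONS = {
    "Role": "You are Mini Claude Code, a coding assistant working in a sandboxed workspace.",
    "Workflow": "Read files before editing them. Keep diffs minimal. Ask when unsure.",
    "Tool hints": "Prefer edit_file for small changes; write_file for new files or full rewrites.",
    "Safety": "All paths must stay under {workdir}/.",
    "Output": "Finish with a short summary of the files you touched.",
}


def build_system_prompt(workdir: Path) -> str:
    """Join the sections in order, filling in the workspace path."""
    parts = []
    for title, body in PROMPT_SECTIONS.items():
        parts.append(f"{title}:\n{body.format(workdir=workdir)}")
    return "\n\n".join(parts)


prompt = build_system_prompt(Path("workspace"))
print(prompt)
```

Keeping sections in a dict makes prompt changes show up as small, reviewable diffs, the same way code changes do.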

Shell tools: foreground vs background

Coding agents need npm test (short) and pnpm dev (long). Blocking the loop for eight hours on a dev server is unacceptable.

I classify commands with a keyword list (dev, start, serve, vite, uvicorn, …). Matches go to background Popen; everything else uses subprocess.run with a timeout.

_DAEMON_KEYWORDS = [
    "dev", "start", "serve", "watch", "vite", "webpack",
    "nodemon", "uvicorn", "gunicorn", "flask run",
]

def _is_daemon_command(command: str) -> bool:
    cmd = command.lower().strip()
    return any(kw in cmd for kw in _DAEMON_KEYWORDS)
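One caveat: substring matching is deliberately loose, so "npm install --save-dev" or "cat devices.txt" would also count as daemons. A stricter variant, sketched below, matches keywords only as whole whitespace-separated words, at the cost of missing forms such as "start:prod". The foreground half is a minimal sketch too: is_daemon_command_strict, run_foreground, and the 30-second default timeout are assumptions for illustration.

```python
import subprocess
import sys

DAEMON_KEYWORDS = [
    "dev", "start", "serve", "watch", "vite", "webpack",
    "nodemon", "uvicorn", "gunicorn", "flask run",
]


def is_daemon_command_strict(command: str) -> bool:
    """Match keywords as whole whitespace-separated words, not substrings."""
    padded = " " + " ".join(command.lower().split()) + " "
    return any(f" {kw} " in padded for kw in DAEMON_KEYWORDS)


def run_foreground(command: str, timeout: float = 30.0) -> str:
    """Run a short command; return its output, or the failure, as text."""
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"Timed out after {timeout}s: {command}"
    output = (proc.stdout + proc.stderr).strip()
    return f"exit code {proc.returncode}\n{output}"


print(is_daemon_command_strict("pnpm dev"))                # True
print(is_daemon_command_strict("npm install --save-dev"))  # False
print(run_foreground(f'"{sys.executable}" -c "print(42)"'))
```

Which classifier is right depends on which mistake is cheaper: a test run that lands in the background, or a dev server that blocks the loop until the timeout fires.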

Background processes register in a global map with rolling logs (cap ~500 lines), a short startup wait, and magic management commands the model can call via the same exec tool:

bg_list — running jobs
bg_logs <pid> — tail logs
bg_kill <pid> — stop a server

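On Unix, preexec_fn=os.setsid puts the server in its own session and process group, so you can terminate the whole tree, not just the parent shell. The subprocess docs recommend start_new_session=True as the replacement, because preexec_fn is not safe in threaded programs. Windows has its own process-group mechanism.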
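The pieces above (global map, rolling logs, startup wait, group termination) can be sketched as follows. This is a POSIX-only illustration under stated assumptions: the names BG_JOBS, bg_start, and _pump_logs, the one-second startup wait, and the five-second grace period are all hypothetical, and it uses start_new_session=True rather than preexec_fn.

```python
import os
import signal
import subprocess
import threading
import time
from collections import deque

BG_JOBS = {}  # pid -> {"proc": Popen, "command": str, "logs": deque}


def _pump_logs(proc: subprocess.Popen, logs: deque) -> None:
    # Reader thread: copy each output line into the rolling buffer.
    for line in proc.stdout:
        logs.append(line.rstrip("\n"))


def bg_start(command: str, startup_wait: float = 1.0) -> str:
    proc = subprocess.Popen(
        command, shell=True, text=True,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        start_new_session=True,  # POSIX: the job leads its own process group
    )
    logs = deque(maxlen=500)  # rolling log cap
    threading.Thread(target=_pump_logs, args=(proc, logs), daemon=True).start()
    BG_JOBS[proc.pid] = {"proc": proc, "command": command, "logs": logs}
    time.sleep(startup_wait)  # let the server print its first lines
    return f"Started pid {proc.pid}\n" + "\n".join(logs)


def bg_list() -> str:
    rows = [
        f"{pid}  {job['command']}"
        for pid, job in BG_JOBS.items()
        if job["proc"].poll() is None  # still running
    ]
    return "\n".join(rows) or "No background jobs"


def bg_logs(pid: int, tail: int = 20) -> str:
    job = BG_JOBS.get(pid)
    if job is None:
        return f"No such job: {pid}"
    return "\n".join(list(job["logs"])[-tail:])


def bg_kill(pid: int) -> str:
    job = BG_JOBS.pop(pid, None)
    if job is None:
        return f"No such job: {pid}"
    try:
        pgid = os.getpgid(pid)
        os.killpg(pgid, signal.SIGTERM)  # signal the whole group, not just the shell
        job["proc"].wait(timeout=5)
    except ProcessLookupError:
        return f"Job {pid} had already exited"
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)  # escalate if SIGTERM was ignored
    return f"Stopped {pid}"
```

Every function returns text, so the exec tool can pass the result straight back to the model as an observation.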

Update the tool description and system prompt whenever you add behavior like this. Otherwise the model treats a returned background PID as a failed synchronous command and keeps retrying until it hits the iteration cap.
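As a sketch of what such an updated description could say, here is a hypothetical exec schema. The wording and the single command parameter are assumptions; the point is that the background behavior and the bg_* commands are spelled out where the model will read them.

```python
# Hypothetical exec schema; the description wording is an assumption.
exec_schema = {
    "type": "function",
    "function": {
        "name": "exec",
        "description": (
            "Run a shell command in the workspace. Short commands return their "
            "output. Long-running commands (dev servers, watchers) start in the "
            "background and return a PID immediately; that is success, not a "
            "failure. Manage them with the special commands bg_list, "
            "bg_logs <pid>, and bg_kill <pid>."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "Shell command, or one of the bg_* commands",
                },
            },
            "required": ["command"],
        },
    },
}
```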

What this taught me

Piece	Lesson
Sandbox	Agents are unsafe without filesystem boundaries
Tool errors as text	Observations drive self-correction
Loop cap	Always cap the number of turns
Prompt	Constraints beat raw model IQ for coding
Background exec	Real agents must not block on `dev`

Production agents add streaming, parallel read-only tools, compression, permissions, and hooks—but none of that replaces this loop.

Next steps

Code: github.com/houmq/mini-claude-code (TypeScript harness I use for experiments)
Concept: Who owns the loop
Memory: Context budget, not chat logs

If you want this level of agent design on a client project—RAG, tools, or an internal coding bot—that is the kind of work I take on; see About for how to reach me.