Design

The wrong way to use a language model

English is expressive. It is also expensive, slow, and non-deterministic when used as a control flow mechanism. The temptation in AI-assisted tooling is to route everything through the model — ask it to check whether a file exists, ask it to run a command, ask it to verify its own output. This produces a system that is fragile, wasteful, and hard to debug.

Clodsite does the opposite.

Lane assignment

Every step in the workflow is labeled with its execution type:

| Label | What it runs | Why |
|---|---|---|
| [SCRIPT] | Deterministic bash | Free, fast, reliable: same result every time |
| [LLM] | Claude inference | Where reasoning and generation actually earn their cost |
| [HYBRID] | Script validates structure; LLM handles semantics | Best of both |

Scripts handle: checking that wrangler is installed, validating JSON against a schema, copying files, running Eleventy, and calling the Cloudflare API. These have known inputs and known outputs, so there is no reason to pay for inference on them.
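As a minimal sketch of what a [SCRIPT] step can look like, assuming a POSIX shell: the preflight check below confirms that required tools are on PATH before any inference is spent. The function name require_tools and the error wording are illustrative and not taken from Clodsite itself.

```shell
#!/bin/sh
# Hypothetical preflight check for a [SCRIPT] step (names are illustrative).
# Reports every missing tool on stderr, then returns non-zero if any is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        # command -v is the POSIX way to ask whether a command is available.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A workflow script could then open with something like `require_tools wrangler || exit 1`. The check costs nothing, takes no model call, and gives the same answer every time it runs on the same machine.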

Claude handles: collecting answers through conversation, synthesizing a JSON spec from natural language, drafting page copy, generating Nunjucks templates, interpreting deploy errors. These require judgment, generation, or interpretation of ambiguous input.

Where each type runs in Clodsite

The natural evolution

v1 uses Model A: Claude orchestrates the workflow, invoking scripts via tool calls. The user interacts with Claude directly in the chat.

Model B, the next step, inverts the driver. A shell script calls claude -p (the CLI's non-interactive print mode) at each LLM step, passing structured prompts and capturing structured output. Claude becomes a pure inference function, and the workflow becomes a pipeline. The two models are compatible at the data layer: the spec and build plan are the handoff between them.
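A minimal sketch of a single Model B step, assuming a POSIX shell. Only the claude -p call comes from the description above; the LLM_CMD override, the run_llm_step function, and the validator argument are hypothetical names introduced for illustration. The script owns control flow: it makes one inference call, captures stdout, and lets a deterministic check decide whether the output is accepted.

```shell
#!/bin/sh
# Hypothetical Model B step: the script drives, the model is a function.
# LLM_CMD defaults to the claude CLI; override it to stub the model in tests.
LLM_CMD="${LLM_CMD:-claude}"

# run_llm_step PROMPT OUTFILE VALIDATOR
#   PROMPT     text passed to the model via -p
#   OUTFILE    file that receives the captured output
#   VALIDATOR  command that is given OUTFILE and exits 0 if the output is usable
# Returns 0 on success, 1 if the model call fails, 2 if validation fails.
run_llm_step() {
    prompt=$1
    outfile=$2
    validator=$3

    # LLM lane: one inference call, stdout captured to a file.
    if ! "$LLM_CMD" -p "$prompt" >"$outfile"; then
        echo "LLM call failed" >&2
        return 1
    fi

    # SCRIPT lane: a deterministic check accepts or rejects the output.
    if ! "$validator" "$outfile"; then
        echo "LLM output rejected by $validator" >&2
        return 2
    fi

    return 0
}
```

Distinct return codes let the calling pipeline tell a failed model call from a rejected output and decide for itself whether to retry, stop, or fall back. Making the validator an argument keeps the structural check swappable per step, in line with the [HYBRID] split between script and model.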