System Prompt Linter
Analyze a system prompt for common issues: vague instructions, conflicting rules, missing examples, oversized context. Heuristic, opinionated, fast.
What is this for?
Most system prompts in production are full of dead phrases. "Be helpful." "Always be accurate." "Never make things up." These are wishes, not instructions — the model can't act on them because they don't tell it what to do differently. This tool runs a set of short heuristic checks over your prompt and surfaces the patterns that almost always indicate the prompt is doing less work than the author thinks. It is opinionated and heuristic, not authoritative — but the gaps it flags are the same ones reviewers flag, and the same ones that cause subtle production drift.
When to use it
- Before shipping a new system prompt. Five-second sanity check.
- When iterating after eval regressions. The prompt that "feels fine" often has 3 absolutes contradicting each other.
- Reviewing a teammate's prompt. Surfaces things you can comment on without having to be a prompt-engineering expert.
- Auditing a long-lived prompt that's grown by accretion. Old prompts collect cruft; the linter highlights the kind that costs you the most.
What it checks
- Specific role assignment — does it say what the model actually does, or just "be helpful"?
- Examples — at least one worked example beats any amount of prose. Two examples beat one.
- Output format — does it specify JSON / prose / table / markdown? Missing this is the #1 cause of fragile downstream parsers.
- Refusal behavior — what does the model do when the user goes out of scope?
- Hallucination guards — does it tell the model to verify, cite, or admit ignorance?
- Vague absolutes — too many "always" / "never" make them all ignorable (see the sketch after this list).
- Conflicting directives — "be concise" + "be thorough", or "always X" + "never X".
- Persona drift — multiple "You are…" sentences invite the model to switch personas mid-response.
- Token size — beyond ~2k tokens, middle-of-prompt instructions get lost.
- Smart quotes — copy-pasted from a Word doc, breaks downstream literal-string matching.
- Form of address — "You will" vs "The assistant should". Models prefer the former.
- Reasoning cue — for multi-step tasks, an explicit "think before answering" line.
- Meta-commentary leakage — phrases like "as an AI…" in the system prompt tend to leak into responses.
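To make the checks concrete, here is a minimal sketch of how a few of them could be implemented, in the same plain JavaScript the page runs. Everything in it is illustrative: the function names, conflict pairs, and thresholds are assumptions for this sketch, not the linter's actual rules.

```js
// Vague absolutes: count bare "always" / "never" directives.
// A couple is fine; a prompt with ten is probably ignoring most of them.
function countAbsolutes(prompt) {
  return (prompt.match(/\b(always|never)\b/gi) || []).length;
}

// Conflicting directives: scan for known opposing phrase pairs.
function findConflicts(prompt) {
  const pairs = [
    ["be concise", "be thorough"],
    ["be brief", "be detailed"],
  ];
  const text = prompt.toLowerCase();
  return pairs.filter(([a, b]) => text.includes(a) && text.includes(b));
}

// Persona drift: more than one "You are..." sentence.
function countPersonas(prompt) {
  return (prompt.match(/\byou are\b/gi) || []).length;
}

// Smart quotes: curly quotes that break literal-string matching downstream.
function hasSmartQuotes(prompt) {
  return /[\u2018\u2019\u201C\u201D]/.test(prompt);
}

// Token size: a rough estimate, assuming ~4 characters per token.
function estimateTokens(prompt) {
  return Math.ceil(prompt.length / 4);
}
```

Each check reduces to a regex or substring scan over the raw prompt text, which is why a full pass takes milliseconds and never has to leave the browser.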
Limits
- It's pattern-matching, not reading. It can't tell whether your examples are good or your role is meaningful. It just notices whether the surface patterns are present.
- False positives happen. A short, focused prompt might look "incomplete" against this rubric — sometimes incomplete is correct.
- It's not a substitute for evals. Passing every check doesn't mean your prompt is good; it means it's not obviously broken.
- English bias. The heuristics look for English keywords ("always", "you are", "respond in"). Non-English prompts will get noisy results.
- Privacy. Nothing leaves the page. All checks run in JavaScript in your browser.