Regex LLM Output Extractor
Extract structured data (JSON, code blocks, key-value pairs) from messy LLM responses. Pick from common templates or write your own regex. Live preview.
What is this for?
You asked the LLM for JSON. It gave you JSON wrapped in ```json fences with a "Sure, here you go!" preamble. Or it gave you "the answer is 42" when you wanted just 42. Or it gave you a numbered list when you asked for a comma-separated string. Welcome to the parsing problem nobody ships a real solution for. This tool keeps a small library of common extraction patterns plus a free-form regex editor, so you can iterate on the pattern with your actual response text right in the browser.
When to use it
- Designing an output parser for an agent pipeline. Throw a few real responses in, build a regex that handles all of them, paste it into your code.
- Debugging a broken parser. Your prod pipeline started failing because the model added "Here's the JSON:" before the fence. See exactly where your regex stops matching.
- Quick one-off extraction. You have ten LLM responses in a doc and need to extract the structured bit from each. Paste, match, copy, move on.
The templates
- JSON in ```json code fence: the most common case. Captures group 1 = the body. Use the `g` flag if there might be multiple fences.
- JSON in any code fence: same idea, but the language tag is optional and we only capture if the body looks like `{ … }` or `[ … ]`.
- YAML in code fence: same as JSON-fence but matches ```yaml / ```yml.
- Any fenced code block: captures language tag + body. Use when you don't know what's inside.
- Plain JSON object: greedy match from first `{` to last `}`. Brittle but works for "just JSON, no chrome" responses.
- Numbered list items: `^\s*\d+[.)]\s*(.+)$` with `gm`. Captures the text of each list item, no numbering.
- key: value pairs: line-by-line. Captures key (group 1) and value (group 2). Stops at the first colon.
- Single classification label: useful for sentiment / safety classifiers that should reply with one word.
- Custom regex: clear the pattern and write your own.
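To make two of these templates concrete, here is a minimal sketch of how the JSON-fence and numbered-list patterns might look once pasted into your own code. The numbered-list regex is the one quoted above; the fence regex is an approximation written for this example (the tool's exact pattern may differ), and the names `FENCE`, `jsonFence`, `numberedItem` and `response` are hypothetical.

```javascript
// Build the three-backtick fence marker at runtime so this sample can
// itself live inside a fenced block without confusing a markdown renderer.
const FENCE = "`".repeat(3);

// Approximation of the "JSON in json code fence" template (assumed, not the
// tool's exact pattern). Group 1 is the fence body; "g" finds every fence.
const jsonFence = new RegExp(FENCE + "json\\s*\\n([\\s\\S]*?)\\n?" + FENCE, "g");

// The numbered-list template as quoted in the list above, with the gm flags.
const numberedItem = /^\s*\d+[.)]\s*(.+)$/gm;

// A made-up response with a chatty preamble, a fence, and a numbered list.
const response = [
  "Sure, here you go!",
  FENCE + "json",
  '{"answer": 42}',
  FENCE,
  "Next steps:",
  "1. Validate the payload",
  "2) Store it",
].join("\n");

// matchAll requires the "g" flag; map each match to its first capture group.
const fenceBodies = [...response.matchAll(jsonFence)].map((m) => m[1]);
const listItems = [...response.matchAll(numberedItem)].map((m) => m[1]);

console.log(fenceBodies); // the body of each json fence
console.log(listItems);   // list item text without the numbering
```

Taking group 1 rather than the full match is what drops the fence markers and the numbering from the result.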
The "parsed JSON" line
If the first capture group (or, failing that, the full match) parses as JSON, the tool prints the parsed result below the groups. That tells you not just "did the regex match" but "did it match the part you wanted, in a JSON-decodable way." If parsing fails, that line stays blank.
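If you want the same check in your own parser, the behaviour described above can be sketched roughly as follows. This is not the tool's source: `parseMatchAsJson` is a hypothetical helper, and it assumes "failing that" means trying capture group 1 first and then the full match.

```javascript
// Hypothetical helper: try group 1, then the full match, as JSON.
// Returns { ok, value } so that a parsed null is not mistaken for failure.
function parseMatchAsJson(match) {
  if (!match) return { ok: false, value: undefined }; // regex did not match
  // match[1] is undefined when the pattern has no capture group.
  const candidates = [match[1], match[0]].filter((s) => typeof s === "string");
  for (const text of candidates) {
    try {
      return { ok: true, value: JSON.parse(text) };
    } catch {
      // Not valid JSON; fall through to the next candidate.
    }
  }
  return { ok: false, value: undefined };
}

// No capture group: the full match is a valid JSON object.
const good = parseMatchAsJson(/\{[\s\S]*\}/.exec('Result: {"label": "positive"} Done.'));

// Single-quoted pseudo-JSON: the regex matches, JSON.parse rejects it.
const bad = parseMatchAsJson(/\{[\s\S]*\}/.exec("Result: {'label': 'positive'} Done."));

// Capture group holding a bare number, which is valid JSON on its own.
const num = parseMatchAsJson(/the answer is (\d+)/.exec("the answer is 42"));

console.log(good.ok, bad.ok, num.value);
```

Returning an `ok` flag alongside the value keeps "matched but not JSON" distinct from "matched and parsed to `null`".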
Common gotchas
- JavaScript regex, not PCRE. No `\K`, no recursive patterns. Lookbehind requires a modern browser: Chrome has had it since 2017, but Firefox only since 2020 and Safari since 2023. Fine for this tool, but be aware if you copy the pattern elsewhere.
- The model wraps its JSON in commentary. Don't try to find `{` at start of string; find the fence or use a tolerant non-greedy capture.
- Trailing commas in "JSON" output. Some models slip trailing commas in despite training. The regex match will work; `JSON.parse` will fail. Strip trailing commas before parsing.
- Single-quoted "JSON". Same story: looks like JSON, isn't valid JSON, regex doesn't care, `JSON.parse` does.
- Nested fences. If the model puts a markdown code-fence example inside its response, you can get a false-positive match on the inner fence. Test with realistic data.
- Don't use regex for serious nested-bracket extraction. If the model returns an object containing an object containing an array of objects, write a proper JSON-aware extractor, or just ask for the fence form and parse the fence body.
- Privacy. The text and pattern stay in the page. No upload, no API call.
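For the trailing-comma gotcha in the list above, one possible repair pass is sketched below. `parseLenient` is a hypothetical helper, not part of the tool. The replacement is deliberately naive: it would also rewrite a comma that sits directly before `}` or `]` inside a string value, so it only runs after a strict parse has already failed.

```javascript
// Hypothetical helper: strict parse first, then one naive repair attempt.
function parseLenient(text) {
  try {
    return JSON.parse(text); // valid JSON is returned untouched
  } catch {
    // Fall through to the repair pass below.
  }
  // Drop any comma that is followed only by whitespace and a closing
  // brace or bracket. Caveat: this does not know about string literals.
  const repaired = text.replace(/,\s*([}\]])/g, "$1");
  return JSON.parse(repaired); // still throws if the text is otherwise invalid
}

const parsed = parseLenient('{"items": [1, 2, 3,], "done": true,}');
console.log(parsed);
```

Single-quoted output is not handled here; swapping quote characters with a regex is riskier because apostrophes also appear inside ordinary text.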