RAG Text Chunker

Split text into token-sized chunks for RAG / embeddings prep. Multiple strategies, including recursive character, sentence-aware, and semantic boundary splitting. Configurable overlap. All in-browser.

What is this for?

Retrieval-Augmented Generation (RAG) and embedding-based search both depend on splitting a corpus into chunks: small pieces that are embedded individually and stored in a vector database. The split happens before any of the AI machinery runs, yet retrieval quality depends on it more than most people realise. Too-small chunks lose context; too-large chunks dilute relevance; chunks split mid-sentence retrieve poorly because the embedding lands in a muddled semantic spot. This tool gives you a fast in-browser playground to experiment with chunk size, overlap, and strategy before committing your pipeline to a choice.
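The size/overlap trade-off above can be made concrete with a minimal fixed-size splitter. This is a hedged sketch with hypothetical names (`chunkFixed` is not this tool's API), not the tool's actual implementation:

```typescript
// Split text into fixed-size character windows, each sharing `overlap`
// characters with the previous window. Hypothetical helper for illustration.
function chunkFixed(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

For example, `chunkFixed("abcdefghij", 4, 2)` yields `["abcd", "cdef", "efgh", "ghij"]` — note how each chunk repeats the tail of the previous one, so content near a boundary is never seen by only half a window.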

The four strategies
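Of the strategies named above, the recursive character approach is a common default: try coarse separators first (paragraphs, then lines, then sentences, then words) and fall back to finer ones only for pieces that are still too large. A sketch of the idea, assuming a typical separator hierarchy (this is illustrative, not the tool's actual code):

```typescript
// Separator hierarchy, coarsest first. An assumption for illustration.
const SEPARATORS = ["\n\n", "\n", ". ", " "];

function recursiveSplit(text: string, maxLen: number, sepIndex = 0): string[] {
  if (text.length <= maxLen) return text.length ? [text] : [];
  if (sepIndex >= SEPARATORS.length) {
    // No separator left: hard-cut as a last resort.
    const out: string[] = [];
    for (let i = 0; i < text.length; i += maxLen) out.push(text.slice(i, i + maxLen));
    return out;
  }
  const sep = SEPARATORS[sepIndex];
  const out: string[] = [];
  let buffer = "";
  for (const part of text.split(sep)) {
    const candidate = buffer ? buffer + sep + part : part;
    if (candidate.length <= maxLen) {
      buffer = candidate; // keep packing small pieces into one chunk
    } else {
      if (buffer) out.push(buffer);
      // This piece alone may still be too big: recurse with a finer separator.
      out.push(...recursiveSplit(part, maxLen, sepIndex + 1));
      buffer = "";
    }
  }
  if (buffer) out.push(buffer);
  return out;
}
```

The appeal is that chunk boundaries land on natural breaks whenever possible, and only degrade to word or character cuts when a single paragraph or sentence exceeds the budget.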

Overlap — why and how much
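Overlap repeats the tail of each chunk at the start of the next one, so a fact that falls near a boundary is embedded in both neighbours instead of being split across them; something like 10–20% of the chunk size is a common starting point, though the right amount depends on your corpus. A minimal sketch (hypothetical helper, not this tool's API):

```typescript
// Prefix each chunk (except the first) with the last `overlapChars`
// characters of its predecessor. Hypothetical helper for illustration.
function withOverlap(chunks: string[], overlapChars: number): string[] {
  return chunks.map((chunk, i) =>
    i === 0 ? chunk : chunks[i - 1].slice(-overlapChars) + chunk
  );
}
```

For example, `withOverlap(["abcdef", "ghijkl"], 2)` yields `["abcdef", "efghijkl"]`. The cost is storage and duplicate retrieval hits, which is why overlap is usually kept to a modest fraction of the chunk size.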

Token estimation
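Exact token counts require the embedding model's own tokenizer; a lightweight in-browser shortcut is the rough rule of thumb of about 4 characters per token, which holds for typical English prose under BPE tokenizers but undercounts for code, CJK text, and dense punctuation. A sketch under that assumption (the ratio is a heuristic, not a guarantee):

```typescript
// Rough token estimate: ~4 characters per token for English prose.
// This is a heuristic; use the model's real tokenizer for exact budgets.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

Because the estimate can be off by a wide margin on non-prose input, it is safest to leave headroom — target, say, 80–90% of the model's true limit rather than the full budget.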

Common gotchas