Semantic Chunker — Split Long Text for LLMs Locally, Without Cutting Mid-Thought
Split a long document or codebase into LLM-sized chunks — without ever cutting mid-paragraph or mid-function.
How it works
Paste a long document or codebase and set a maximum chunk size. The chunker splits at paragraph breaks and never inside a fenced code block. It falls back to sentence boundaries only when a single block exceeds the chunk size. Each chunk is numbered so you can paste the chunks into an LLM in order, preserving context.
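The splitting order described above can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the tool's source. The names `splitBlocks`, `splitOversized` and `chunkText` are hypothetical. A "fence" is assumed to be any line starting with three backticks, and a "sentence" is assumed to be text ending in `.`, `!` or `?`. An oversized code block is assumed to be kept whole rather than cut.

```typescript
// Minimal semantic chunker sketch. Function names are hypothetical; this is
// not the tool's actual source.

// Three backticks, built at runtime so this example can itself sit in a fence.
const FENCE = "`".repeat(3);

// Split text into blocks: paragraphs separated by blank lines, except that a
// fenced code block stays in one block even if it contains blank lines.
function splitBlocks(text: string): string[] {
  const blocks: string[] = [];
  let current: string[] = [];
  let inFence = false;
  const flush = (): void => {
    if (current.length > 0) blocks.push(current.join("\n"));
    current = [];
  };
  for (const line of text.split(/\r?\n/)) {
    if (line.trimStart().startsWith(FENCE)) {
      inFence = !inFence; // opening or closing fence line
      current.push(line);
    } else if (!inFence && line.trim() === "") {
      flush(); // a blank line outside a fence ends the paragraph
    } else {
      current.push(line);
    }
  }
  flush();
  return blocks;
}

// Fallback for an oversized prose block: split it into sentences (text ending
// in . ! or ?), re-join them with single spaces up to maxChars, and hard-cut
// any single sentence that is still longer than maxChars.
function splitOversized(block: string, maxChars: number): string[] {
  const matches: string[] = block.match(/[^.!?]*[.!?]+|[^.!?]+$/g) ?? [];
  const sentences = matches.map((s) => s.trim()).filter((s) => s.length > 0);
  const pieces: string[] = [];
  let buf = "";
  for (const s of sentences) {
    if (s.length > maxChars) {
      if (buf !== "") pieces.push(buf);
      buf = "";
      for (let i = 0; i < s.length; i += maxChars) pieces.push(s.slice(i, i + maxChars));
    } else if (buf === "") {
      buf = s;
    } else if (buf.length + 1 + s.length <= maxChars) {
      buf += " " + s;
    } else {
      pieces.push(buf);
      buf = s;
    }
  }
  if (buf !== "") pieces.push(buf);
  return pieces;
}

// Greedily pack blocks into chunks of at most maxChars. Assumption: an
// oversized fenced code block is kept whole as its own (over-limit) chunk.
function chunkText(text: string, maxChars: number): string[] {
  const SEP = "\n\n";
  const chunks: string[] = [];
  let current = "";
  const add = (piece: string): void => {
    if (current === "") current = piece;
    else if (current.length + SEP.length + piece.length <= maxChars) current += SEP + piece;
    else {
      chunks.push(current);
      current = piece;
    }
  };
  for (const block of splitBlocks(text)) {
    const isCode = block.trimStart().startsWith(FENCE);
    if (block.length <= maxChars || isCode) add(block);
    else splitOversized(block, maxChars).forEach(add);
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

For example, `chunkText(doc, 10_000)` returns an array of strings. Each string is at most 10,000 characters unless it is a single oversized code block. Packing paragraphs greedily keeps the number of chunks low while still breaking only between blocks.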
Why chunk text for LLMs?
A prompt longer than a model's context window has to be split, but a naive character cut lands in the middle of a sentence, a JSON object or a function, and the model loses the thread. Semantic chunking breaks at paragraph boundaries wherever possible and keeps fenced code blocks intact, so each numbered chunk is self-contained. Paste the chunks in order and the model can follow the document as if it had never been split.
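The difference between a naive cut and a paragraph-aware split can be shown with a small TypeScript sketch. The helpers `naiveSplit`, `paragraphSplit` and `labelChunks` are hypothetical. `paragraphSplit` is deliberately simplified and ignores code fences. The `[Part i/n]` header is an assumed format for numbering chunks, not necessarily what the tool emits.

```typescript
// Naive fixed-width cut: slices every maxChars characters, ignoring structure.
function naiveSplit(text: string, maxChars: number): string[] {
  const parts: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) parts.push(text.slice(i, i + maxChars));
  return parts;
}

// Simplified paragraph-only split: packs blank-line-separated paragraphs into
// chunks of at most maxChars and never breaks inside a paragraph.
function paragraphSplit(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n\s*\n/)) {
    if (current === "") current = para;
    else if (current.length + 2 + para.length <= maxChars) current += "\n\n" + para;
    else {
      chunks.push(current);
      current = para;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Hypothetical numbering: prefix each chunk so the parts can be pasted in order.
function labelChunks(chunks: string[]): string[] {
  const n = chunks.length;
  return chunks.map((chunk, i) => `[Part ${i + 1}/${n}]\n${chunk}`);
}

const json = '{"name": "chunker", "ok": true}';
const doc = json + "\n\nSecond paragraph here.";
console.log(naiveSplit(doc, 20)[0]); // cuts the JSON object in half
console.log(labelChunks(paragraphSplit(doc, 40)).join("\n\n"));
```

With a 20-character limit, the naive cut ends at `{"name": "chunker", `, which is no longer valid JSON. The paragraph split keeps the object whole in its own chunk.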
FAQ
- Is my text uploaded?
- No. Chunking runs entirely in your browser in JavaScript — your document never leaves your device. The page only sends an anonymous usage counter (the tool name and the input size), never the content.
- How does it keep chunks coherent?
- It splits on double line breaks (paragraphs) and keeps fenced code blocks whole. Sentence and hard splits are used only as a last resort for blocks larger than the chunk size.
- Is there a size limit?
- Only your device's memory. Because nothing is sent to a server, you can chunk documents of many megabytes, and large inputs are processed without freezing the page.
- What chunk size should I use?
- Set it below your model's context window, leaving room for the reply: for example, 8,000 to 12,000 characters per chunk for a typical chat model.
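Context windows are measured in tokens, while the chunk size here is in characters, so a rough conversion helps when picking a value. A minimal sketch follows. It assumes roughly 4 characters per token, a common rule of thumb for English prose (code and non-English text often tokenize less efficiently). It also assumes a token reserve you choose for your instructions and the model's reply. `maxChunkChars` and its parameters are hypothetical and are not settings of the tool.

```typescript
// Rough character budget per chunk (hypothetical helper).
// Assumptions: charsPerToken is an estimate (about 4 for English prose), and
// replyTokens / promptTokens are reserves you pick for the answer and instructions.
function maxChunkChars(
  contextTokens: number,
  replyTokens: number,
  promptTokens: number,
  charsPerToken = 4,
): number {
  const available = contextTokens - replyTokens - promptTokens;
  if (available <= 0) throw new Error("no room left for chunk text");
  return Math.floor(available * charsPerToken);
}

console.log(maxChunkChars(8192, 2048, 512));
console.log(maxChunkChars(8192, 2048, 512, 3));
```

Take an 8,192-token window with 2,048 tokens reserved for the reply and 512 for instructions. In that case `maxChunkChars(8192, 2048, 512)` returns 22,528, and a more conservative `charsPerToken` of 3 gives 16,896. Lowering the ratio is the safer choice for code-heavy input.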