How LLMs work

An interactive walkthrough — click through each stage

Text in, numbers out

An LLM never sees letters. It chops your text into tokens — common words, word fragments, or punctuation — each mapped to a number. Try typing below to see tokenization in action.

Each token is a chunk the model recognizes. Common words like "the" get their own token. Rare words get split into pieces.
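The splitting described above can be sketched as a greedy longest-match against a token table. This is a toy illustration in plain Python; the vocabulary and ID numbers below are invented (real tokenizers like BPE work on bytes and learn their vocabulary from data):

```python
# Toy tokenizer: greedy longest-match against a small, made-up vocabulary.
VOCAB = {
    "the": 262, " cat": 3797, " sat": 3332,
    " un": 555, "break": 9032, "able": 540,
}

def tokenize(text, vocab):
    """Greedily match the longest known token at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append((piece, vocab[piece]))
                i = j
                break
        else:
            # Unknown character: fall back to a single char.
            # (Real tokenizers use a byte-level fallback instead.)
            tokens.append((text[i], None))
            i += 1
    return tokens

# Common words survive whole; a rare word splits into pieces:
print(tokenize("the cat sat", VOCAB))
print(tokenize(" unbreakable", VOCAB))
```

Note how " unbreakable" comes out as three pieces the model has seen before, exactly the "rare words get split" behavior above.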

Tokens become meaning-vectors

Each token ID gets looked up in a giant table and converted into a vector — a list of thousands of numbers encoding meaning. Similar words end up with similar vectors. Click a token to see its (simplified) embedding.

Click a token above to see a simplified embedding vector.
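The lookup table and "similar words end up with similar vectors" claim can be sketched in a few lines. The vectors here are made up and only 4-dimensional (real models use thousands of dimensions); cosine similarity is the standard way to measure how close two meaning-vectors are:

```python
import math

# Toy embedding table: each token maps to a short vector.
# These numbers are invented to show related words sitting close together.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.3, 0.0],
    "dog": [0.8, 0.2, 0.4, 0.1],
    "car": [0.0, 0.9, 0.1, 0.7],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 means 'pointing the same way'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"]))  # high: related meanings
print(cosine(EMBEDDINGS["cat"], EMBEDDINGS["car"]))  # low: unrelated
```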

Every token looks at every other token

This is the big idea: self-attention. Each token asks "which other tokens are relevant to me?" and computes a weight for every pair. Click any cell in the grid to explore.

The brighter the cell, the more the row's token pays attention to the column's token. Rows = "who is asking" (the queries); columns = "who is being looked at" (the keys).
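The grid of weights is computed with scaled dot-product attention: each token's query is compared against every token's key, and a softmax turns the scores into weights that sum to 1 per row. A minimal sketch with invented 2-dimensional query/key/value vectors for three tokens:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention. Row i of the returned weights
    answers: 'how much does token i attend to each other token?'"""
    d = len(keys[0])
    weights, outputs = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)  # each row sums to 1
        weights.append(w)
        # Output = weighted mix of the value vectors.
        out = [sum(wi * v[j] for wi, v in zip(w, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return weights, outputs

# Three tokens with made-up vectors (real models learn these):
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

weights, _ = attention(Q, K, V)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row is one row of the grid above: a token's attention budget, split across all tokens.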

Stack it deep: layer after layer

One round of attention isn't enough. The model stacks dozens of layers — each refining the representation. Early layers notice syntax; deep layers capture meaning, facts, and reasoning. Click a layer to see what it focuses on.
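The wiring of the stack can be sketched schematically: each layer reads the current representation and adds a refinement to it (a residual update), first via attention, then via a per-token feed-forward step. The two sublayers below are placeholders standing in for the real learned components (and details like multiple heads and layer normalization are omitted):

```python
# Schematic layer stack: each layer adds a refinement to the running
# representation. The sublayers are placeholders, not real learned weights.

def attention_sublayer(x):
    # Placeholder: real models mix information across token positions here.
    return [[0.1 * v for v in vec] for vec in x]

def mlp_sublayer(x):
    # Placeholder: real models transform each position independently here.
    return [[0.1 * v for v in vec] for vec in x]

def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def transformer_stack(x, num_layers):
    for _ in range(num_layers):
        x = add(x, attention_sublayer(x))  # residual connection
        x = add(x, mlp_sublayer(x))        # residual connection
    return x

hidden = [[1.0, 0.5], [0.2, 0.8]]  # 2 tokens, toy 2-d vectors
refined = transformer_stack(hidden, num_layers=4)
```

The key design point is that every layer has the same shape in and out, so the model can stack as many as its budget allows, and each one nudges the representation rather than replacing it.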

Predicting the next token

After all layers process the input, the model outputs a probability distribution over every possible next token. It samples from these probabilities to pick the next token — then feeds it back in and repeats.

Example prompt: "The cat sat on the" — the demo shows candidate next tokens with their probabilities (one candidate shown at 0.3).
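The sample-then-feed-back loop can be sketched in a few lines. The candidate tokens and their scores ("logits") below are invented for illustration; `random.choices` does the weighted sampling:

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a few candidate next tokens after
# "The cat sat on the" (numbers invented for illustration):
candidates = ["mat", "floor", "chair", "moon"]
logits = [2.0, 1.2, 0.9, -1.0]

probs = softmax(logits)
next_token = random.choices(candidates, weights=probs)[0]

# Feed the choice back in, then repeat the whole process:
prompt = "The cat sat on the " + next_token
```

Because the pick is a weighted random draw rather than always-take-the-top, the same prompt can produce different continuations on different runs.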

Temperature controls randomness. Low = predictable (picks the most likely token). High = creative (samples more broadly).
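Mechanically, temperature divides the scores before the softmax: a low temperature sharpens the distribution toward the top token, a high one flattens it. A minimal sketch with invented scores:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before softmax:
    T < 1 sharpens the distribution, T > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidates

low = softmax_with_temperature(logits, 0.2)   # nearly all mass on the top token
high = softmax_with_temperature(logits, 2.0)  # much flatter: more variety
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```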