An interactive walkthrough — click through each stage
Text in, numbers out
An LLM never sees letters. It chops your text into tokens — common words, word fragments, or punctuation — each mapped to a number. Try typing below to see tokenization in action.
Each token is a chunk the model recognizes. Common words like "the" get their own token. Rare words get split into pieces.
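The splitting step can be sketched as a greedy longest-match lookup. This is a simplified toy, not a real tokenizer: the vocabulary and IDs below are made up, and real models learn vocabularies of 50k–200k entries with an algorithm like byte-pair encoding.

```python
# Toy greedy longest-match tokenizer. VOCAB is hypothetical; real models
# learn their vocabularies (and much larger ones) during training.
VOCAB = {"the": 1, " ": 2, "cat": 3, "un": 4, "break": 5, "able": 6}

def tokenize(text, vocab=VOCAB):
    pieces = []
    i = 0
    while i < len(text):
        # take the longest vocabulary entry that matches at position i
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            pieces.append(text[i])  # unknown character: fall back to one char
            i += 1
    return pieces

print(tokenize("the cat"))      # ['the', ' ', 'cat'] -- common words stay whole
print(tokenize("unbreakable"))  # ['un', 'break', 'able'] -- rare word splits
ids = [VOCAB[p] for p in tokenize("the cat")]  # [1, 2, 3]
```

Note how "the" survives as one piece while "unbreakable" shatters into fragments the vocabulary does know.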
Tokens become meaning-vectors
Each token ID gets looked up in a giant table and converted into a vector — a list of thousands of numbers encoding meaning. Similar words end up with similar vectors. Click a token to see its (simplified) embedding.
Click a token above to see a simplified embedding vector.
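"Similar words end up with similar vectors" can be made concrete with cosine similarity. The embeddings below are made-up 4-dimensional toys (real models use thousands of dimensions and learn the values during training):

```python
import math

# Hypothetical 4-dimensional embeddings; real ones are learned, not hand-set.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.3, 0.0],
    "dog": [0.8, 0.2, 0.3, 0.1],
    "car": [0.0, 0.9, 0.1, 0.7],
}

def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction, near 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# "cat" and "dog" point in nearly the same direction; "car" does not.
print(cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"]))  # high
print(cosine(EMBEDDINGS["cat"], EMBEDDINGS["car"]))  # low
```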
Every token looks at every other token
This is the big idea: self-attention. Each token asks "which other tokens are relevant to me?" and computes a weight for every pair. Click any cell in the grid to explore.
The brighter the cell, the more the row's token attends to the column's token. Rows = "who is asking", columns = "who is being looked at".
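The grid of weights can be computed in a few lines. This is a simplified sketch: real attention first projects each vector into separate query and key vectors, while here the raw vectors play both roles.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_weights(vectors):
    """weights[i][j] = how much token i attends to token j.
    Simplified: raw vectors stand in for learned query/key projections."""
    scale = math.sqrt(len(vectors[0]))
    rows = []
    for q in vectors:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in vectors]
        rows.append(softmax(scores))  # each row sums to 1
    return rows

# Three toy token vectors: tokens 0 and 2 are identical, token 1 differs,
# so row 0 puts more weight on columns 0 and 2 than on column 1.
W = attention_weights([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

Each row is a probability distribution, which is why the brightness along any row of the grid always "adds up" to the same total.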
Stack it deep: layer after layer
One round of attention isn't enough. The model stacks dozens of layers — each refining the representation. Early layers notice syntax; deep layers capture meaning, facts, and reasoning. Click a layer to see what it focuses on.
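The stacking itself is just a loop. The sublayers below are crude stand-ins (an averaging step for attention, a tanh squash for the feed-forward network), not real learned layers; the point is the shape-preserving refine-and-repeat structure.

```python
import math

def attend(x):
    """Stand-in for attention: nudge every token toward the sequence mean."""
    n = len(x)
    mean = [sum(col) / n for col in zip(*x)]
    return [[v + 0.5 * (m - v) for v, m in zip(tok, mean)] for tok in x]

def feed_forward(x):
    """Stand-in for the per-token MLP: a small nonlinear update."""
    return [[v + 0.1 * math.tanh(v) for v in tok] for tok in x]

def layer(x):
    return feed_forward(attend(x))

h = [[1.0, 0.0], [0.0, 1.0]]  # two toy token vectors
for _ in range(12):           # the model stacks dozens of such layers
    h = layer(h)

# Still 2 tokens x 2 dimensions after every layer -- only the values change,
# progressively mixed across tokens and transformed within each token.
```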
Predicting the next token
After all the layers process the input, the model outputs a probability distribution over every possible next token. It samples from these probabilities to pick the next token — then feeds it back in and repeats.
Temperature controls randomness. Low = predictable (probability piles onto the most likely token). High = creative (the distribution flattens, so unlikely tokens get sampled more often).
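The sampling step above can be sketched directly: divide the model's raw scores (logits) by the temperature, softmax them into probabilities, then draw one index. The logits here are made up for illustration.

```python
import math
import random

def next_token_probs(logits, temperature=1.0):
    """Softmax over logits after dividing by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng=random):
    """Draw one token index according to its probability."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]               # hypothetical scores for 3 tokens
cold = next_token_probs(logits, 0.1)   # sharp: token 0 dominates
hot = next_token_probs(logits, 100.0)  # flat: close to uniform
```

At temperature 0.1 the first token's probability exceeds 99%, while at 100 all three hover near one third — the same mechanism behind the "predictable vs. creative" slider.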