Large Language Models Explained

What LLMs are, how they work, and why they behave the way they do — from tokens to transformers.

What is a Large Language Model?

A Large Language Model (LLM) is a type of AI system trained to predict and generate text. Given a sequence of words, it learns to predict what comes next — and by doing this billions of times across enormous amounts of text, it develops a rich internal representation of language, facts, and reasoning patterns.

The “large” in LLM refers to the number of parameters — internal numerical weights that determine the model’s behavior. Modern models range from a few billion to hundreds of billions of parameters.

Tokens, Not Words

LLMs don’t operate on words directly. They operate on tokens — chunks of text that may be whole words, parts of words, or punctuation. The word “unbelievable” might become two or three tokens. This tokenization step is how the model converts raw text into numbers it can process.
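As a rough illustration, here is a toy greedy longest-match tokenizer. The vocabulary below is made up for the example; real tokenizers (such as BPE) learn their vocabularies from data and split text differently:

```python
# Hypothetical vocabulary -- real tokenizers learn tens of thousands
# of subword pieces from their training corpus.
VOCAB = ["un", "believ", "able", "believe", "the", "cat", " "]

def tokenize(text, vocab=VOCAB):
    """Split text into subword tokens by greedy longest-match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        match = None
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece, i):
                match = piece
                break
        if match is None:
            match = text[i]  # fall back to a single character
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("unbelievable"))  # -> ['un', 'believ', 'able']
```

Note how "unbelievable" becomes three tokens rather than one word: the model only ever sees these pieces, mapped to integer IDs.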

Token limits matter in practice: every model has a context window — the maximum number of tokens it can consider at once. Text outside the context window is simply not visible to the model.
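The mechanics of a context window can be sketched in a couple of lines. The window size here is tiny for illustration; real models handle thousands to millions of tokens:

```python
def fit_to_context(tokens, window=8):
    """Keep only the most recent `window` tokens.

    Anything earlier is simply invisible to the model -- one common
    (and lossy) strategy is to drop the oldest tokens first.
    """
    return tokens[-window:]

history = list(range(12))          # pretend these are 12 token IDs
print(fit_to_context(history))     # the first 4 tokens are gone
```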

The Transformer Architecture

Almost all modern LLMs are built on the transformer architecture, introduced in 2017. Transformers process all tokens in parallel rather than sequentially, which makes training on massive datasets tractable.

The key mechanism is attention: for every token, the model learns which other tokens in the sequence are relevant. This allows it to link a pronoun to its noun across many sentences, or understand that a word’s meaning shifts based on context.
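A minimal sketch of scaled dot-product attention, the core of this mechanism (dimensions here are tiny and the inputs random; real models use learned projections for queries, keys, and values, plus many attention heads):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query position receives a
    weighted mixture of all value vectors, with weights derived from
    query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) relevance scores
    # Softmax over each row: weights for one token sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 4, 8
x = rng.normal(size=(seq_len, d))   # stand-in token embeddings
out, w = attention(x, x, x)         # self-attention over the sequence
print(out.shape)                    # (4, 8): one output vector per token
```

Because every token attends to every other token, the weight matrix `w` is `seq_len × seq_len`: this all-pairs comparison is what lets a pronoun attend to its noun many tokens away.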

Training: Pre-training and Fine-tuning

LLMs are developed in stages:

Pre-training is the expensive phase. The model processes trillions of tokens of text — books, web pages, code, articles — adjusting its weights to improve next-token prediction. This is where general language understanding is built.
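The training signal itself is strikingly simple. For a sequence of tokens, inputs and targets are the same sequence shifted by one position (a toy word-level example; real training uses subword token IDs and cross-entropy loss over huge batches):

```python
# Next-token prediction: given tokens 0..i, predict token i+1.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
inputs = tokens[:-1]   # what the model sees
targets = tokens[1:]   # what it must predict at each position

for ctx, nxt in zip(inputs, targets):
    print(f"given ...{ctx!r} -> predict {nxt!r}")
```

Every position in every training document provides one such prediction, which is how trillions of tokens translate into trillions of learning signals.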

Fine-tuning shapes behavior. A pre-trained model is further trained on curated examples to make it more helpful, follow instructions, or specialize in a domain. Techniques like RLHF (Reinforcement Learning from Human Feedback) are commonly used to align model outputs with human preferences.

Why LLMs Hallucinate

LLMs generate text by sampling from probability distributions — they produce what is statistically likely given the context, not what is factually verified. When asked about something outside their training data or a topic where patterns are ambiguous, they may produce confident-sounding but incorrect text.

This is called hallucination. It is a fundamental property of the architecture, not a bug to be patched with a simple fix. Mitigation strategies include retrieval-augmented generation (RAG), grounding outputs in cited sources, and structured output constraints.

Context and Memory

LLMs have no persistent memory across conversations by default. Each interaction starts fresh. Within a single context window, the model can reference anything earlier in the conversation, but once information falls outside that window it is effectively gone.

Systems that appear to “remember” across sessions do so by explicitly injecting prior context or summaries back into the prompt.

Temperature and Sampling

When generating text, the model doesn’t always pick the single most likely next token. Temperature is a parameter that controls randomness:

  • Low temperature (e.g. 0.1): outputs are more deterministic and predictable
  • High temperature (e.g. 1.0+): outputs are more varied and creative, but may be less coherent

Other sampling parameters like top-p and top-k further shape which tokens the model considers at each step.
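Temperature and top-k can be sketched together in a single sampling function (logit values here are arbitrary; real models produce one logit per vocabulary entry, often 50,000+):

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Pick the next token ID from raw logits.

    temperature (> 0) rescales the distribution: low values approach
    greedy decoding, high values flatten it toward uniform.
    top_k, if set, discards all but the k most likely tokens.
    """
    items = sorted(enumerate(logits), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]
    ids, scores = zip(*items)
    # Softmax with temperature.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(ids, weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1, -1.0]
# Low temperature: nearly always picks token 0, the highest-logit token.
print(sample_next(logits, temperature=0.1))
```

Top-p (nucleus) sampling works similarly but keeps the smallest set of tokens whose cumulative probability exceeds p, rather than a fixed count.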

Capabilities and Limits

LLMs are remarkably capable at tasks that benefit from pattern recognition across large amounts of text: summarization, translation, code generation, question answering, and reasoning over provided context.

They are less reliable for: precise arithmetic, real-time information (without retrieval tools), tasks requiring persistent state, and anything requiring guaranteed factual accuracy.

Understanding these boundaries is essential for building reliable systems with LLMs.