Lesson 1
What are Large Language Models?
Large Language Models (LLMs) are neural networks trained on vast corpora of text to predict the next token in a sequence. Despite this simple objective, emergent capabilities arise at scale.
Key Concepts
Autoregressive Generation: LLMs generate text token by token, where each token is sampled from a probability distribution conditioned on all previous tokens.
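The token-by-token loop can be sketched with a toy stand-in for the model. The bigram table below conditions only on the last token (a real LLM conditions on the entire preceding sequence); the table entries and function names are illustrative, not any real library's API.

```python
import random

# Toy "language model": maps the most recent token to a next-token
# probability distribution. A real LLM conditions on ALL previous
# tokens; this bigram table is a deliberately tiny stand-in.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"</s>": 1.0},
}

def generate(max_tokens=10, seed=0):
    """Sample tokens one at a time, each drawn from a distribution
    conditioned on the context generated so far."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = TOY_MODEL[tokens[-1]]
        # Sample the next token from the conditional distribution.
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "</s>":  # end-of-sequence token stops generation
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start token
```

Running `generate()` produces a short sequence like `["the", "cat", "sat"]`; the key point is that each step samples from a distribution conditioned on what came before.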
Tokenization: Text is split into subword tokens using algorithms like Byte Pair Encoding (BPE). "Hello world" might become ["Hello", " world"] or ["Hel", "lo", " world"] depending on the tokenizer.
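The merge step at the heart of BPE can be illustrated in a few lines. This is a simplified sketch of the *apply* phase only (the merge list would be learned from corpus statistics at training time); the merge rules below are invented for the example.

```python
def bpe_tokenize(word, merges):
    """Apply learned BPE merges to a word, in training order.

    Starts from individual characters and greedily fuses adjacent
    pairs that appear in the (ordered) merge list.
    """
    tokens = list(word)
    for a, b in merges:  # merges are applied in the order they were learned
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i:i + 2] = [a + b]  # fuse the pair into one token
            else:
                i += 1
    return tokens

# Hypothetical merge list; a real tokenizer learns tens of thousands.
MERGES = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]
```

With these merges, `bpe_tokenize("hello", MERGES)` collapses all the way to `["hello"]`, while a word the merges don't fully cover stays split into subwords, which is how BPE handles rare or unseen words.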
Temperature Sampling: Controls randomness in generation by scaling the logits before the softmax. Higher temperature flattens the distribution (more random output); lower temperature sharpens it (more deterministic output).
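Temperature scaling is just a division of the logits before the softmax; a minimal implementation:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities, dividing by temperature first.

    T > 1 flattens the distribution (more random sampling);
    T < 1 sharpens it (more deterministic sampling).
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with logits `[2.0, 1.0, 0.1]`, lowering the temperature pushes more probability mass onto the top token, while raising it spreads the mass out; in the limits, T → 0 approaches greedy (argmax) decoding and T → ∞ approaches uniform sampling.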
The Scale Hypothesis
Larger models trained on more data exhibit "emergent" capabilities:
- Few-shot learning (in-context learning without gradient updates)
- Chain-of-thought reasoning
- Code generation
- Instruction following
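Few-shot (in-context) learning works by showing the model labeled examples inside the prompt itself, so it infers the task from the pattern rather than from gradient updates. A sketch of how such a prompt might be assembled (the `Input:`/`Output:` format is one common convention, not a fixed standard):

```python
def build_few_shot_prompt(examples, query):
    """Format labeled examples plus a new query into one prompt string.

    The model sees the pattern in the examples and continues it for
    the query; no weights are updated.
    """
    blocks = [f"Input: {text}\nOutput: {label}" for text, label in examples]
    blocks.append(f"Input: {query}\nOutput:")  # model completes this line
    return "\n\n".join(blocks)

prompt = build_few_shot_prompt(
    [("2 + 2", "4"), ("3 + 5", "8")],  # demonstrations of the task
    "1 + 6",                            # the query to be answered
)
```

A capable LLM given `prompt` would typically continue with `7`, having inferred "answer the arithmetic problem" purely from the two demonstrations.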