Reading path: I want to understand AI / LLMs
Status: 🟩 COMPLETE Last updated: 2026-06-19 Plain-English tagline: Zero to conversational fluency on large language models — what they are, how they work, the vocabulary, and the techniques (tool use, agents, RAG, MCP) that turn raw LLMs into useful systems.
What this is
A curated trail through the encyclopedia’s AI / LLM section. By the end you’ll be able to:
- Explain what an LLM actually is (and isn’t) to anyone
- Read AI-related news / docs / papers and follow what’s being claimed
- Use Claude (or any LLM) more effectively because you understand the mechanics
- Recognize when “AI” is being oversold and when it’s the right fit
- Understand the techniques that make LLMs do things rather than just chat
There are 20 stops. A reasonable pace: 1–3 entries per sitting. You can read this path entirely on a couch — no coding required.
How to use this path
Each stop has:
- 🎯 Why you’re here — the question this stop answers
- 📖 Read — the entries
- 🧠 Anchor — the single thing to remember
Stage 1 — The basics (3 stops)
1. What is an LLM, actually?
🎯 Why you’re here: Strip away the hype. Get a clean mental model of what these things really are.
📖 Read:
🧠 Anchor: An LLM is a statistical engine that predicts the next token (chunk of text) given the previous ones. Trained on enormous amounts of text. By doing this prediction very well, it ends up able to write, code, summarize, explain — but it’s not “thinking” in the human sense.
2. The unit LLMs actually see: tokens
🎯 Why you’re here: Models don’t see “words.” They see tokens. Every limit, every cost, every “context window” is measured in tokens. Understanding tokens unlocks intuitions about why prompts behave the way they do.
📖 Read:
🧠 Anchor: Roughly 1 token ≈ 4 English characters ≈ ¾ of a word. The context window is the model’s “working memory” — Claude Opus has ~200K tokens. Past that, things fall out of view.
3. Models, sampling, and randomness
🎯 Why you’re here: Why does the same prompt give different answers twice? What’s “temperature”? How is “the model” different from “the AI”?
📖 Read:
- Temperature & sampling 🟥
- Claude models 🟦 (current Claude lineup)
🧠 Anchor: An LLM doesn’t pick THE next token — it picks one from a probability distribution. “Temperature” controls how willing the model is to pick non-top choices. Low temperature → predictable. High temperature → creative / unpredictable. Each model in the Claude lineup (Opus / Sonnet / Haiku / Fable) has different size, speed, and cost trade-offs.
Stage 2 — How they actually work (3 stops)
This stage is optional if you’re impatient — but pays off massively in intuition.
4. The transformer — the architecture behind everything
🎯 Why you’re here: Every modern LLM is a transformer. Understanding the basic idea (attention) demystifies a lot.
📖 Read:
🧠 Anchor: The transformer’s key trick is attention — when computing each next token, the model can “look at” every prior position in the input, weighted by relevance. Before transformers, models processed text sequentially and struggled with long-range dependencies. After transformers, you could scale to billions of parameters and the architecture kept working.
5. Training vs inference — two different things
🎯 Why you’re here: “Training” and “running” are completely different operations. Mixing them up causes a lot of confusion.
📖 Read:
🧠 Anchor: Training is the (very expensive, weeks-to-months) process of teaching the model from a huge dataset. Inference is each individual time the model answers a question (cheap, milliseconds-to-seconds). For application work you almost always just use a trained model — fine-tuning is a niche choice.
6. Multimodal — beyond just text
🎯 Why you’re here: Modern Claude can see images. GPT can hear voice. Multimodal is the present, not the future.
📖 Read:
🧠 Anchor: “Multimodal” = the model can process more than just text. Same architecture (transformers), different tokenization for each modality. Claude can read images you paste; that’s not magic, it’s the model treating the image as tokens.
Stage 3 — Using LLMs effectively (3 stops)
7. Prompt engineering
🎯 Why you’re here: The single highest-leverage skill in working with LLMs. Same model + better prompt = wildly better output.
📖 Read:
🧠 Anchor: Be specific, give examples (few-shot), use clear structure (XML tags or markdown headings), tell the model who it is and what good looks like. Most “the AI is bad at X” complaints disappear with a better prompt.
8. The Claude API — calling LLMs from your code
🎯 Why you’re here: When you build with LLMs, you call them via API. Knowing the basic shape of an API call lets you read tutorials and understand what’s happening.
📖 Read:
🧠 Anchor: You POST a JSON body with your messages and config; you get back a response. Roughly: { model: "claude-opus-4-7", messages: [{role: "user", content: "..."}], max_tokens: 1024 } → { content: "...", usage: {...} }. Streaming responses chunk the reply as it generates.
9. Caching — making it cheaper and faster
🎯 Why you’re here: “Prompt caching” lets the same long context (instructions, retrieved docs) be reused across many requests without paying full token cost. Huge production lever.
📖 Read:
- Tokens & context windows (the caching section) 🟩
🧠 Anchor: Mark the static parts of your prompt as cacheable. Subsequent requests reuse those cached parts at ~10% the cost. Critical for production agents.
Stage 4 — Making LLMs do things (5 stops)
This is where chat assistants become agents.
10. Tool use — letting LLMs call functions
🎯 Why you’re here: A chat-only LLM can only talk. An LLM with tool use can read files, search the web, run code, send messages. This single feature is what turns LLMs from interesting to useful.
📖 Read:
- Tool use 🟥
🧠 Anchor: You give the model a list of tools (each with a name, description, and input schema). The model decides when to call one. The tool runs (in your code, not the model’s). You feed the result back to the model. Loop. That’s it.
11. Agents — LLMs that loop and act
🎯 Why you’re here: “Agent” is the most-hyped, least-defined word in AI. Strip away the marketing and you get a clear picture.
📖 Read:
- Agents 🟥
🧠 Anchor: An agent is an LLM with tool use, running in a loop, with the freedom to call tools repeatedly until the task is done. Claude Code is an agent. So is any “AI assistant” that can take actions on your behalf. The behavior emerges from the loop — there’s no special “agent” technology, just LLM + tools + loop.
12. MCP — Model Context Protocol
🎯 Why you’re here: MCP is Anthropic’s standard for connecting any LLM to any tool. Like USB-C for AI. Worth understanding because it’s becoming universal.
📖 Read:
- MCP 🟥
🧠 Anchor: MCP is a defined protocol — an MCP server exposes tools, an MCP client (Claude, ChatGPT desktop, Cursor, others) consumes them. Build one MCP server for your service; every major AI tool can use it. Decouples integration work from each agent.
13. RAG — Retrieval Augmented Generation
🎯 Why you’re here: The technique behind “AI that knows your documents.” Demystify it.
📖 Read:
- RAG 🟥
- Embeddings 🟥
🧠 Anchor: RAG = before answering, look stuff up. Three steps: (1) embed your documents into vectors and store them, (2) when a question comes in, embed it too and find the closest documents, (3) put those documents in the prompt as context. The LLM grounds its answer in retrieved content rather than its training data alone.
14. Building agents in practice
🎯 Why you’re here: See how all of the above (tool use + MCP + RAG + caching) come together in real systems.
📖 Read:
🧠 Anchor: Real agents = good system prompt + carefully chosen tools + memory/state + a loop. Claude Code is one of the cleanest examples in the wild. Studying how Claude Code is structured teaches you how to build your own (or work with one well).
Stage 5 — Critical thinking (3 stops)
15. What LLMs are bad at
🎯 Why you’re here: Calibrate expectations. Knowing failure modes is as valuable as knowing capabilities.
📖 Read:
- (Will be written as part of a “limits of LLMs” entry — currently scattered across other entries)
🧠 Anchor: LLMs hallucinate (make up plausible-sounding falsehoods). They are bad at exact arithmetic. They are bad at planning over long horizons without scaffolding. They are biased toward verbose, hedge-y answers. They don’t know what they don’t know. Tools and structured prompting mitigate these — but not entirely.
16. The safety landscape
🎯 Why you’re here: Constitutional AI, alignment, jailbreaks, harm reduction — these are the words you’ll see in any serious AI discussion.
📖 Read:
- (Currently distributed across other entries; planned standalone entry)
🧠 Anchor: Modern models are trained with extensive safety fine-tuning (RLHF / Constitutional AI / RLAIF). Jailbreaks are inputs designed to circumvent that training. The arms race is real and ongoing. For applications, you usually don’t need to think about this — you trust the model provider’s defaults.
17. Economic and societal context
🎯 Why you’re here: AI isn’t just technical — it’s economic, legal, social. The shape of the field matters for the products you build.
📖 Read:
- (Linked from various entries; planned standalone)
🧠 Anchor: The frontier models (Claude, GPT-4, Gemini) cost billions to train. A handful of companies have the resources to do this. Open-weight models (Llama, Mistral) are competitive at smaller scales. This shapes everything downstream — pricing, availability, regulation.
Stage 6 — Going deeper (2 stops)
18. The current frontier
🎯 Why you’re here: Where the research is moving in 2026 — reasoning models, agent advances, multimodal, world models.
📖 Read:
- (Living entry, planned)
🧠 Anchor: Reasoning models (Claude Opus with extended thinking, GPT-o-series) trade more compute at inference time for better answers on hard problems. Agents are becoming reliable enough for real workflows. Multimodal is mainstream. World models (predicting future state) are an active research area.
19. Comparisons and trade-offs
🎯 Why you’re here: Claude vs GPT vs Gemini vs Llama — when does each shine?
📖 Read:
- Claude models 🟦
- (Cross-model comparison entry planned)
🧠 Anchor: Treat the major models as comparable for most tasks. Claude is preferred for long context, careful reasoning, code; GPT for ecosystem and breadth; Gemini for multimodal and Google integration; Llama for self-hosting. Pricing and feature parity change monthly — don’t lock in opinions.
20. The application stack
🎯 Why you’re here: Tie it together — how an actual AI product is built.
📖 Read:
- All of 11. AI-assisted development
- Reading path: Master Claude Code 🟥 (next!)
🧠 Anchor: A real AI product = model API + good prompt + tool use + RAG (if needed) + memory + UI. Many are wrappers around Claude/GPT and that’s fine — the value is in the prompt, the tools, the data, and the UX. Not just “having an LLM.”
When you finish this path
You’ll be able to:
- ✅ Explain LLMs to a smart non-technical friend without resorting to “magic”
- ✅ Read AI news critically — separating hype from substance
- ✅ Write better prompts (one of the highest-leverage skills in 2026)
- ✅ Choose between models for a given task
- ✅ Understand what an “agent” is and isn’t
- ✅ Decide when RAG / tool use / fine-tuning is appropriate
- ✅ Talk to engineers building with AI without feeling lost
Where to go after this
- Reading path: Master Claude Code 🟩 — apply the LLM knowledge to the specific tool you use daily
- Reading path: I want to build my first webapp 🟩 — see how AI fits into a real build
- Section deep dive: 11. AI-assisted development 🟩 — the practical applied layer (fully complete)
- Section deep dive: 10. AI & LLMs 🟩 — the conceptual layer (fully complete)
- Try things — the fastest way to consolidate this knowledge is to push real prompts at the model and see what happens