Glossary — AI – No Big Deal

AI texts are full of terms that sound like someone randomly pulled them from a tech dictionary. Here's what matters, in plain language — from the basics down to the deep sea. You don't need to know all of it. Pick out whatever's confusing you right now.

Basics

AI (Artificial Intelligence) Computer systems that take on tasks that used to need a human — writing text, recognizing images, translating speech. No consciousness, no thinking: clever pattern-matching on huge amounts of data. More here.

Prompt The input you give an AI system. How you ask something massively influences what you get back. More here.

Hallucinating AI invents information that doesn't exist — quotes, sources, facts — and presents it as true. Not malicious, structurally unavoidable. Always verify. More here.

LLM (Large Language Model) The technical foundation behind ChatGPT, Claude, Gemini & Co. It has processed billions of text fragments and learned statistical patterns from them: which word fragment is likely to come next. No meaning, no intent.

Token How AI processes text — not letter by letter, but in word fragments. "Privacy" might be 2–3 tokens. Relevant for costs and limits in API usage.

Tokenizer The little program that splits your text into tokens before the model even sees it. Different models count differently — which is why sometimes more, sometimes less fits into the same limit.
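To make the splitting concrete, here is a toy sketch that always grabs the longest known fragment. The vocabulary is made up for this example; real tokenizers learn theirs from data and use more refined methods, so treat this as an illustration, not as how any particular model does it.

```python
# Toy vocabulary, invented for this example (not taken from any real model).
TOY_VOCAB = {"priv", "acy", "token", "s", "un", "believ", "able"}

def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into the longest known fragments, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible fragment first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

print(toy_tokenize("privacy"))
print(toy_tokenize("unbelievable"))
print(toy_tokenize("tokens"))
```

Swap in a different vocabulary and the same word splits into a different number of pieces, which is the reason two models can count the same text differently.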

GPT (Generative Pre-trained Transformer) The model type behind ChatGPT: "generative" (produces text), "pre-trained" (trained in advance on huge amounts of text), "Transformer" (the underlying architecture). ChatGPT is the app, GPT-4 & Co. are the models underneath.

Prompt Engineering The art of phrasing prompts so the AI answers usefully. Sounds fancy, often isn't: be precise, give examples, provide context.

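Open Source / Open Weights Open weights means the trained model can be downloaded and run by anyone. Open source goes further and also publishes the code and, ideally, details of the training. Ollama uses such models. Advantage: no cloud required, full data control.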

Knowledge Cutoff The date up to which a model was trained on data. After that it knows nothing — ask about yesterday's news and it guesses or stays quiet. Unless it's allowed to search the web live.

Local AI AI that runs on your own machine instead of a provider's cloud. Nothing leaves your device — maximum privacy, but you'll need decent hardware. Example: Ollama.

Multimodal A model that handles more than text, e.g. images, audio, or PDFs. GPT-4o and Claude are multimodal.

Neural Network The mathematical scaffolding behind modern AI: many artificial "neurons" in layers, passing weighted signals along. Inspired by the brain, but please don't take that too literally — it's statistics, not understanding.

API (Interface) A technical access point through which programs talk directly to an AI service — no chat window, no clicking. That's how developers build AI into their own apps. Usually billed by usage (tokens).

Reasoning Newer models "think" in intermediate steps before answering, instead of blurting out a reply. This helps with math, logic, and tricky tasks — but costs more time and compute.

Intermediate

RAG (Retrieval-Augmented Generation) Before answering, the system looks things up in a database or document collection and works what it finds into its response. Perplexity does this with the web.

Context Window How much text an AI model can hold "in mind" at once. Older models: a few thousand tokens. Newer: hundreds of thousands. Everything outside the window is forgotten.
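One simple way an app might deal with the limit is to keep only the newest messages that still fit. The sketch below assumes exactly that strategy and counts words as a crude stand-in for tokens; the function name and the counting rule are invented for illustration.

```python
def fit_to_window(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages whose combined size fits the budget."""
    kept = []
    used = 0
    # Walk backwards from the newest message.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # everything older than this falls out of the window
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = ["my name is Ada", "I like tea", "what is my name again"]
print(fit_to_window(history, max_tokens=8))
```

With a budget of 8 the first message no longer fits, so the model would never see the name it is being asked about.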

Fine-Tuning A base model is trained further with additional data for a specific task. Result: the model behaves more competently or appropriately in that area.

RLHF (Reinforcement Learning from Human Feedback) Humans rate AI answers as good or bad, and the model learns from it. The reason ChatGPT replies politely instead of just spitting out the most likely internet garbage.

System Prompt Hidden instructions that pre-program an AI model's behavior. Before you type anything, the model has often already been fed rules.

Temperature A parameter controlling how "creative" or "random" responses are. High temperature = more surprising, sometimes incoherent. Low = more consistent, sometimes boring.
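Under the hood, temperature is usually applied by dividing the model's raw scores before turning them into probabilities. A minimal sketch, with made-up scores for three candidate tokens:

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; temperature must be > 0."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate next tokens.
scores = [2.0, 1.0, 0.1]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature nearly all probability lands on the top candidate; at high temperature the three options move closer together, so unlikely picks happen more often.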

Few-Shot Prompting You give the AI a few examples in the prompt of what the answer should look like. "Like this, this, and this — now do the same." Surprisingly effective.
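In practice a few-shot prompt is just text with the examples stacked on top. A small sketch that builds one; the "Text:" and "Sentiment:" labels are an arbitrary choice for this example, not a required format.

```python
def build_few_shot_prompt(examples, new_input):
    """Stack worked examples, then leave the last answer open for the model."""
    lines = []
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Text: {new_input}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("The battery lasts forever.", "positive"),
    ("It broke after two days.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Shipping was fast and the box was intact.")
print(prompt)
```

The prompt ends on an open "Sentiment:" line, so the most natural continuation is a label in the same style as the examples.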

Chain-of-Thought (CoT) You ask the AI to think step by step instead of answering directly. Much more reliable for logic and math tasks. The simple version of reasoning.

Vector Database Stores texts not as words but as number lists (embeddings) that capture meaning. That's how a RAG system finds things that are similar in content, even when not a single word matches.

Embeddings Number lists that capture the meaning of a text. Similar content ends up close together — the basis for semantic search and RAG.
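"Close together" is commonly measured with cosine similarity. The sketch below uses hand-made three-number vectors to show the idea; real embeddings come from an embedding model and have hundreds or thousands of numbers, so the values here are pure illustration.

```python
import math

def cosine_similarity(a, b):
    """1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented vectors standing in for real embeddings.
documents = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover account access": [0.8, 0.2, 0.1],
    "Best pizza toppings": [0.0, 0.1, 0.9],
}
# Pretend this is the embedding of the query "I forgot my login".
query_vector = [0.85, 0.15, 0.05]

def search(query_vec, docs, top_k=2):
    """Return the top_k document texts most similar to the query vector."""
    ranked = sorted(
        docs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

print(search(query_vector, documents))
```

The query shares no keyword with either account-related document, yet both rank above the pizza entry because their vectors point in a similar direction.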

AI Agent An AI that doesn't just answer but carries out steps on its own: using tools, searching the web, editing files. Useful and occasionally alarmingly headstrong.

Inference The moment a fully trained model actually works, i.e. answers your question. Training is expensive and happens once per model version; inference happens with every request.

Prompt Injection An attack where someone smuggles in hidden instructions to hijack the AI — e.g. "Ignore all previous rules." The AI equivalent of a con trick.

Instruction Tuning A training step that teaches a model to follow instructions instead of just continuing text. The difference between "completes sentences" and "does what you want."

Guardrails (Safety Filters) Protective rules meant to stop the AI from outputting dangerous, illegal, or embarrassing content. Sometimes sensible, sometimes overcautious, never perfect.

Parameters The internal dials that training adjusts inside a model. "70B" means 70 billion of them. More parameters = often smarter, but also hungrier for memory and power.

GPU / VRAM The graphics card (GPU) does the AI number-crunching; its special memory (VRAM) decides how big a model you can run locally. Too little VRAM = the model won't fit.
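A back-of-the-envelope calculation shows whether a model has a chance of fitting. The sketch counts only the weights and assumes 1 GB = 1 billion bytes; a running model needs extra memory on top, so read the numbers as a lower bound.

```python
def weight_memory_gb(num_parameters, bits_per_parameter):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = num_parameters * bits_per_parameter / 8
    return total_bytes / 1e9

for bits in (16, 8, 4):
    size = weight_memory_gb(70e9, bits)
    print(f"70B model at {bits}-bit: about {size:.0f} GB")
```

At 16 bits per parameter a 70B model needs about 140 GB for its weights, at 4 bits about 35 GB.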

Quantization A model gets "compressed" by storing its numbers more coarsely. Saves memory and makes big models runnable on normal hardware — at the cost of a little accuracy.
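What "more coarsely" means can be shown with one of the simplest schemes: squeeze every weight into a whole number between -127 and 127 plus one shared scale factor. This is a sketch of that basic idea with invented weights; real quantization methods are more elaborate.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale.
    Assumes at least one weight is non-zero."""
    scale = max(abs(w) for w in weights) / 127
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Turn the stored integers back into approximate floats."""
    return [q * scale for q in quantized]

weights = [0.42, -1.30, 0.07, 0.88, -0.55]  # made-up example values
stored, scale = quantize_int8(weights)
restored = dequantize(stored, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(stored)
print([round(r, 3) for r in restored])
print("largest rounding error:", max_error)
```

Each restored value is off by at most half a step of the scale, which is the "little accuracy" paid for the smaller storage.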

MCP (Model Context Protocol) An open standard that lets AI models access tools, data, and services in a uniform way — a kind of USB port for AI. More here.

SynthID Google's invisible watermark that marks AI-generated images, text, and audio. Meant to help tell the real from the generated. More here.

C2PA (Content Credentials) An open standard that records, like a digital label, where an image came from and how it was edited. More here.

Expert

Transformer / Self-Attention The architecture behind almost all modern language models. "Self-attention" lets the model weigh, for each word, which other words in the sentence matter right now. The 2017 breakthrough everything builds on.
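The weighing itself is surprisingly little math. Below is a bare-bones sketch of scaled dot-product attention in plain Python. It assumes tiny hand-made vectors and reuses the same vectors as queries, keys, and values; a real Transformer derives those three from learned projections and runs many attention heads in parallel.

```python
import math

def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention for one sequence, no masking."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # How well does this token's query match every token's key?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to those weights.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs

# Three "tokens" as invented 2-number vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(tokens, tokens, tokens)
for row in result:
    print([round(x, 3) for x in row])
```

Every output row is a weighted blend of all three input vectors, with the largest weights going to the tokens whose keys best match that row's query.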

Foundation Model A large, generally trained base model that serves as the foundation for many applications — adapted via fine-tuning or prompts. GPT, Claude, and Llama are such foundations.

LoRA (Low-Rank Adaptation) A thrifty kind of fine-tuning: instead of retraining the whole model, you learn only small add-on matrices. Fast, cheap, and you can swap several "attachments."
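Rough arithmetic shows why this is thrifty. The sketch assumes a single weight matrix of 4096 by 4096 and a rank of 8, both picked purely for illustration: instead of updating the full matrix, LoRA trains two thin matrices whose product has the same shape.

```python
def full_matrix_params(rows, cols):
    """Trainable numbers when the whole weight matrix is updated."""
    return rows * cols

def lora_params(rows, cols, rank):
    """Trainable numbers for the two thin add-on matrices:
    one of shape rows x rank, one of shape rank x cols."""
    return rank * (rows + cols)

rows, cols, rank = 4096, 4096, 8  # illustrative sizes, not from a specific model
full = full_matrix_params(rows, cols)
lora = lora_params(rows, cols, rank)

print("full fine-tuning:", full)
print("LoRA add-on:     ", lora)
print(f"share trained:    {lora / full:.2%}")
```

For these sizes the add-on holds 65,536 numbers instead of 16,777,216, which is under half a percent of the original matrix.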

PEFT (Parameter-Efficient Fine-Tuning) The umbrella term for thrifty fine-tuning methods (LoRA is one): only a tiny part of the model is touched, instead of relearning billions of parameters.

Embedding Model A specialized model that doesn't generate answers but turns text into embeddings (meaning-as-number-lists). The workhorse behind vector databases and RAG.

Reranker In RAG systems, the second stage: it re-sorts the hits from the first, rough search by actual relevance. Search broadly first, then sort finely.

Function Calling An LLM's ability to call "tools" in a structured way — e.g. a weather API or a calculator — instead of inventing the answer. The basis of many AI agents.
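From the application's side, a tool call is structured data that your code has to parse and execute. The sketch below assumes the model's output is a JSON object with "name" and "arguments" fields; that shape and both tools are hypothetical, and real providers each define their own format.

```python
import json

def get_weather(city):
    # Stand-in for a real weather API; returns canned data.
    return {"city": city, "forecast": "sunny", "temp_c": 21}

def add(a, b):
    return a + b

# Registry of tools the model is allowed to call.
TOOLS = {"get_weather": get_weather, "add": add}

def run_tool_call(model_output):
    """Parse a JSON tool call and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Pretend the model produced this instead of guessing the sum itself.
model_output = '{"name": "add", "arguments": {"a": 19, "b": 23}}'
print(run_tool_call(model_output))
```

The result would normally be handed back to the model, which then phrases the final answer. Checking the name against a fixed registry keeps the model from triggering anything you did not explicitly offer.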

Constitutional AI Anthropic's method of teaching a model to correct itself based on written principles (a "constitution") — instead of needing human feedback for every rule.

Alignment The research field of getting AI systems to do what humans actually want — and not merely what we literally said. Harder than it sounds.

Scaling Laws The observation that models get predictably better as you increase data, parameters, and compute. The reason models kept getting bigger.

Catastrophic Forgetting When a model, while being retrained on something new, suddenly unlearns the old. The reason fine-tuning is dosed carefully.

Knowledge Distillation A big, smart model teaches a small one the essentials. Result: a compact model that performs almost as well but runs far more cheaply.

Quantization Method (GPTQ) A common method to quantize (compress) models after the fact, without retraining them. Popular for making large models runnable locally.

DPO (Direct Preference Optimization) A leaner alternative to RLHF: the model learns directly from "answer A is better than B," without the complicated reward-model detour.
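For one pair of answers, the DPO loss fits in a few lines. The sketch takes log-probabilities as plain numbers; in real training they come from the model being trained and a frozen reference copy. The input values and the beta of 0.1 are made up for illustration.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single preference pair.
    All four inputs are log-probabilities of a full answer."""
    # How much more the trained model favors the chosen answer
    # than the reference model does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # Negative log-sigmoid of the scaled margin.
    return math.log(1 + math.exp(-beta * margin))

# Invented log-probabilities for illustration.
undecided = dpo_loss(-12.0, -12.0, -12.0, -12.0)
prefers_a = dpo_loss(-10.0, -14.0, -12.0, -12.0)
print(undecided, prefers_a)
```

The loss shrinks as the model shifts probability toward the preferred answer relative to the reference, which is exactly the "A is better than B" signal, with no reward model in between.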

KV-Cache (Key-Value Cache) A buffer that keeps the attention keys and values already computed for earlier tokens, so the model doesn't have to recompute everything for each new token. What makes long answers affordable.

MMLU A well-known test with thousands of questions across 57 subjects, used to compare models' general knowledge. Handy for benchmark marketing, to be taken with a grain of salt.

Mechanistic Interpretability The attempt to look inside a model and understand how it arrives at its answers — AI neuroscience, basically. Still very early days.

EU AI Act The EU's AI law: it sorts AI applications into risk classes and ties obligations to them. The world's first comprehensive AI regulation of its kind.

Deep Sea (for pros)

This is where it gets technical. One sentence per term — if you want to go deeper, the quiz has a Pro level.

Mixture of Experts (MoE) — Instead of one big network, many small "experts," only a few of which activate per token: more knowledge at less compute.

Flash Attention — A trick that makes the attention computation more memory-efficient, enabling longer contexts and faster training.

Speculative Decoding — A small model guesses the next tokens ahead, a large one just checks them — noticeably speeding up text output.

Sparse Attention — The model no longer looks at every word pair, only the relevant ones — saving compute on long texts.

Model Merging / SLERP — Several trained models are fused into one; SLERP is the method for cleanly interpolating their weights along the surface of a sphere.

Continued Pre-Training — A finished model is further pre-trained on large amounts of new raw data — unlike fine-tuning, which only sharpens one specific task.

Prompt Compression — Long prompts are automatically boiled down so they fit the context window and cost fewer tokens.

Superposition — The phenomenon that a network packs in more concepts than it has neurons, so a single neuron takes part in several unrelated concepts at once; one reason models are so hard to see through.

Dense Passage Retrieval (DPR) — Search via embeddings (meaning) instead of exact keywords — the "dense" variant of finding hits in RAG.

Agentic Workflow — A process in which an AI plans, acts, and corrects itself across several steps, rather than answering just once.

GRPO vs. PPO — Two methods for optimizing models via reward; GRPO drops the separate value model and is thus leaner than the older PPO.

Long-Context Model — A model whose context window holds hundreds of thousands to millions of tokens — whole books at once, with all sorts of technical tricks behind it.

Missing a term? It might already be in an article or the quiz. Otherwise: we keep expanding the glossary.