What if your AI queries went nowhere — except your own hard drive? No server overseas, no company reading along, no terms of service you have to squint past. That's exactly what Ollama promises.
What is Ollama?
Ollama is a tool that runs large language models locally on your computer. You install it, download a model, and the entire AI runs on your own hardware. Fully offline. Fully under your control.
No account. No subscription. No "your data helps us improve the service."
What do you actually need?
Here's the most important clarification: RAM is the bottleneck, not the GPU.
AI models get loaded entirely into memory. If the model doesn't fit in RAM, it either won't run at all or will be painfully slow. As a rough guide:
- 8 GB RAM: Smaller models like Phi-3 Mini or Gemma 2B run fine
- 16 GB RAM: Llama 3.2 (3B or 8B) runs comfortably
- 32 GB RAM: Llama 3 70B and similar become possible
A dedicated GPU is a bonus — it speeds things up considerably. But Ollama also runs on CPU alone. It's slower, sure, but for occasional use it's often perfectly usable.
Which models run on normal hardware?
Ollama has its own model library at ollama.com/library. Some recommendations for everyday hardware:
- Llama 3.2 3B — Meta's model, fast, surprisingly capable for its size
- Gemma 3 4B — Google's compact model, solid for everyday tasks
- Phi-3 Mini — Microsoft's small powerhouse, surprisingly good at reasoning
- Mistral 7B — A bit larger, but comfortable on 16 GB systems
- Qwen 2.5 Coder — Specifically optimized for code tasks
For comparison: GPT-4 likely has around a trillion parameters. Llama 3.2 3B has — surprise — 3 billion. The quality difference is real, but for a lot of tasks the smaller model gets the job done just fine.
Installation in three minutes
- Go to ollama.com and download the installer for your OS (Windows, Mac, Linux — all covered)
- Install and launch
- In the terminal:
ollama run llama3.2— the model downloads and starts
Then you can chat directly in the terminal. Or you install a frontend.
OpenWebUI: Because it should look decent too
The terminal interface works, but it's not exactly welcoming. OpenWebUI is a browser-based frontend that connects to Ollama and looks like a proper chat interface — similar to ChatGPT, but running locally.
I run both myself: Ollama in the background, OpenWebUI as the interface. It makes daily use much more pleasant.
Installation with Docker is a one-liner — if you have Docker, it takes two minutes. For everyone else there are other ways, but that starts to get into tinkerer territory.
What's the catch?
It would be unfair not to mention it:
Slower: Local models on normal hardware are noticeably slower than cloud AI. Not unbearable, but you're watching it think.
Smaller: The quality of locally running models doesn't reach GPT-4o or Claude Sonnet levels. For simple tasks it doesn't matter; for complex analysis you'll feel the difference.
More effort: This isn't "create account and go." You need a bit of technical curiosity.
Who is Ollama for?
Anyone who cares about privacy and is willing to trade a bit of convenience for it. Tinkerers who like understanding what's happening under the hood. Developers who want to test models locally.
And anyone who's just curious: what does it actually feel like when the AI runs on your own machine?
Next week: An honest meta-post — how this blog itself was built with Claude Code.
