Open AI models: free to use, not free to run

"Open source" sounds like freedom and a price tag of zero. With AI models that's only half true. Open models are a huge deal, often surprisingly good, and yet you might still end up paying. Sounds contradictory? Let's untangle it.

What does "open" even mean?

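With an open AI model, the weights are made publicly available for download: the billions of numbers that make up the model, basically its "brain". Anyone can grab them, use them, adapt them. Often they come under a permissive license like Apache 2.0 or MIT that even permits commercial use, royalty-free. Some vendors, Meta with Llama for example, attach their own license terms, so it's worth a quick look before building a product on one.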

The heavyweights of 2026: Meta's Llama 4, DeepSeek V4, Alibaba's Qwen 3.5, Google's Gemma 4, Mistral out of France. Many of them now play in the same league as the closed models from OpenAI or Anthropic.

One important catch: "open weights" is not the same as "open source" in the classic sense. You get the finished model, but rarely the full training data or the recipe behind it. You're allowed to use it — you just can't reproduce it from scratch.

How good are they really?

Surprisingly good. On neutral leaderboards the best open model (currently Kimi K2 from Moonshot) ranks in the overall top five, right behind the closed leaders. On coding, DeepSeek V4 even draws level.

Honest take: the very top spots are still held by the closed models from OpenAI, Google and Anthropic. But the gap is small — and shrinking. For the vast majority of everyday tasks you won't notice a difference. (As of June 2026.)

Why won't it run on my laptop?

Here's the reality check. A modern top-tier model has hundreds of billions of parameters (the adjustable values inside the model). DeepSeek V4 sits around the one-trillion mark. All those numbers have to fit into memory — ideally the fast video memory (VRAM) of a beefy graphics card.

Rule of thumb: stored at 8 bits per parameter, a model needs roughly as many gigabytes of memory as it has billions of parameters. Compressed to 4 bits (this is called quantization), it gets by on about half. So a 70-billion model wants a comfortable 40 to 70 GB. Your laptop has maybe 16 GB of RAM and a graphics card with 8. It doesn't fit, not even close.
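If you want to play with the rule of thumb yourself, here is a minimal sketch in Python. The function names are made up for this article, and the estimate covers the weights only. A running model needs some extra memory on top for the conversation it is working on, which is one reason the low end above is 40 rather than 35 GB.

```python
def estimate_weights_gb(params_billion: float, bits_per_param: int = 8) -> float:
    """Rough size of the model weights in GB (weights only, no runtime overhead)."""
    bytes_per_param = bits_per_param / 8
    # One billion parameters at one byte each is roughly one gigabyte.
    return params_billion * bytes_per_param


def fits(params_billion: float, bits_per_param: int, available_gb: float) -> bool:
    """True if the weights alone fit into the given amount of memory."""
    return estimate_weights_gb(params_billion, bits_per_param) <= available_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = estimate_weights_gb(70, bits)
        print(f"70B model at {bits}-bit: about {size:.0f} GB")

    # The laptop from the text: a graphics card with 8 GB.
    print("70B at 4-bit on 8 GB:", fits(70, 4, 8))
    print("4B at 4-bit on 8 GB:", fits(4, 4, 8))
```

For the 70-billion model this gives about 140 GB at 16 bits, 70 GB at 8 bits and 35 GB at 4 bits. A 4-billion model at 4 bits comes out at around 2 GB, which is why the small ones do fit on ordinary hardware.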

What does run: small models. Gemma in its 2-to-4-billion variant, a compact Qwen, a Phi. They're handy for plenty of everyday tasks — just not the brain that makes the headlines. How to start the small ones locally is covered in the Ollama article.

So why does the API cost money?

This is exactly where the misconception cracks. The model is free. Running it is not.

When you use an open model through an API (an interface your app or device uses to talk to a server), say at providers like DeepInfra, Together or Fireworks, the model runs on their servers. And those servers are packed with graphics cards like the Nvidia H100, which rent for roughly 2 to 4 dollars an hour. Per card. A large model often needs four of them at once.
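To get a feeling for what that adds up to, here is a small back-of-the-envelope sketch. It uses only the figures from the paragraph above (four cards, 2 to 4 dollars per card and hour). The 30-day, around-the-clock scenario is an assumption for illustration, not a quote from any provider.

```python
def hourly_gpu_cost(num_gpus: int, price_per_gpu_hour: float) -> float:
    """Rental cost per hour for a group of graphics cards, in dollars."""
    return num_gpus * price_per_gpu_hour


if __name__ == "__main__":
    low = hourly_gpu_cost(4, 2.0)   # four cards at the cheap end
    high = hourly_gpu_cost(4, 4.0)  # four cards at the expensive end
    print(f"Four H100s: {low:.0f} to {high:.0f} dollars per hour")

    # Assumed scenario: the model stays online 24 hours a day for 30 days.
    hours = 24 * 30
    print(f"Per month: {low * hours:,.0f} to {high * hours:,.0f} dollars")
```

That is 8 to 16 dollars an hour for the graphics cards alone, or several thousand dollars a month if the model is kept running nonstop.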

On top of that: electricity, cooling, maintenance, and the people keeping it all alive. Someone pays that bill — and in the end that's you, per processed token (roughly: per word fragment).

The good news: it's cheap anyway. Small models start around 0.06 dollars per million tokens, large ones hover around 2 dollars. For normal use you're talking cents, not dollars. (All prices as of June 2026; the market moves fast.)
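Here is how such a bill works out, as a rough sketch. The two prices are the ones quoted above. The usage figure of 50,000 tokens is a made-up example, and real providers often charge different rates for the text you send and the text you get back, which this simple version ignores.

```python
def token_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in dollars for a number of tokens at a flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    tokens_used = 50_000  # hypothetical usage, purely for illustration

    small = token_cost_usd(tokens_used, 0.06)  # small model
    large = token_cost_usd(tokens_used, 2.00)  # large model
    print(f"Small model: {small:.4f} dollars")
    print(f"Large model: {large:.2f} dollars")
```

At these prices, 50,000 tokens cost a fraction of a cent on a small model and about ten cents on a large one.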

What does this mean for me?

Run it yourself for free? Doable — but only with small models, and you need enough RAM. Great for privacy, limited on power.
Use a big open model? Via API, for money. Often much cheaper than a ChatGPT subscription, but you have to sort out the access.
Who is each route for? Local: tinkerers and privacy fans. API: anyone who wants a powerful model without their own hardware. Services like OpenRouter make that possible with no coding skills required.

Open models are fantastic. They're just not free in the sense of "never costs anything". They're free in the sense of "anyone may use them". Two different things, and that exact difference decides whether you end up paying an electricity bill or an API bill.