Local vs Cloud AI — When to Run AI on Your Own Computer

Status: 🟩 COMPLETE 🟦 LIVING Section: decision-frameworks Tags: local-ai, cloud-ai, privacy, open-weights, decision, ollama, hardware

The short answer

For most users, cloud AI is the right choice. The free tiers of Claude, ChatGPT, and Gemini give you frontier model quality without any infrastructure to manage. Don’t run AI locally unless you have a specific reason.

The specific reasons that make local AI worthwhile:

Maximum privacy (genuinely sensitive data; air-gapped requirements)
Unlimited use (you’d otherwise pay heavy API costs)
Offline use (no internet available)
Customisation (fine-tuning models for specific domains)
Learning (you’re studying how AI works)

For everyday users, usually none of these apply. In specific professional contexts, several or all of them might.

What “local AI” means

When you run AI locally, the model files sit on your computer (or your server) and all processing happens there. No data goes to OpenAI, Anthropic, Google, or anyone else. The AI runs on your CPU and/or GPU.

This is enabled by open-weights models — AI models that are released publicly and can be downloaded. See open-weights-vs-closed for the full distinction.

Major open-weights models worth considering:

Llama (Meta) — most popular; many sizes
Gemma (Google) — efficient; good quality
Mistral / Mixtral (Mistral, France) — strong quality + efficiency
Phi (Microsoft) — tiny but capable; runs on phones
Qwen, DeepSeek ⛔ — capable but Chinese; the encyclopedia recommends Western alternatives

When local AI is the right choice

1. Maximum privacy requirements

Strong fit: You’re processing data so sensitive that it cannot leave your infrastructure under any circumstances.

Examples:

Medical research with patient data
Legal work with privileged information
Government work in classified or sensitive contexts
Trade secrets during product development
Personal therapy journals that shouldn’t be on third-party servers
Financial analysis of confidential data

For these uses, local AI gives you guaranteed data control — no cloud provider sees anything.

2. Genuine cost concerns at scale

Strong fit: You’re using AI heavily enough that API costs are meaningful.

Math:

API costs: ~$3-15 per million tokens for frontier models (output tokens usually cost more than input tokens)
A heavy daily user: 100K tokens/day ≈ $0.30-1.50/day ≈ $9-45/month
For a team: multiply by team members
For a product: multiply by user count

If your AI use is heavy enough to run up $50-500/month in API charges, running open-weights models locally on your own hardware may save money, though hardware and electricity costs also count.
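The arithmetic in the list above can be sketched in a few lines of Python. It uses a single flat price per million tokens taken from the rough ~$3-15 range quoted here (not any provider's actual rate card) and assumes a 30-day month; the team size is hypothetical.

```python
def monthly_api_cost(tokens_per_day: float, usd_per_million: float, days: int = 30) -> float:
    """Estimated API spend in USD over `days` days at a flat per-token price."""
    return tokens_per_day / 1_000_000 * usd_per_million * days


# A heavy user at 100K tokens/day, priced at both ends of the ~$3-15 range.
low = monthly_api_cost(100_000, 3.0)
high = monthly_api_cost(100_000, 15.0)
print(f"One user: ${low:.0f}-{high:.0f}/month")

# A team or product multiplies the single-user figure (team size is hypothetical).
team_size = 10
print(f"Team of {team_size}: ${low * team_size:.0f}-{high * team_size:.0f}/month")
```

A single blended price is a simplification: real bills charge input and output tokens at different rates, so plug in the rates for the model you actually use.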

3. Offline / air-gapped environments

Strong fit: You don’t have reliable internet, or you’re working in environments where internet isn’t allowed.

Examples:

Remote and rural Australia with patchy connectivity
Government secure facilities
Defence applications
Industrial sites without internet
Travel to areas without connectivity

4. Customisation and fine-tuning

Strong fit: You want to train AI on your specific data or domain.

Domain-specific terminology (legal jargon, medical terminology, technical specs)
Brand voice for content generation
Specific style for creative writing
Specialised reasoning for narrow domains

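Cloud providers offer fine-tuning only on selected models and on their own terms, and the resulting weights stay on their servers. Open-weights models can be fine-tuned locally however you like.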

5. Learning how AI works

Strong fit: You’re a developer, researcher, or student wanting to understand AI deeply.

Running models locally teaches you about:

Model architecture and parameters
Tokenization
Inference performance
Memory and compute requirements
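Measuring inference performance is a good first exercise. Ollama's /api/generate endpoint returns timing fields with each non-streamed reply, including eval_count (tokens generated) and eval_duration (nanoseconds spent generating them), and its API documentation derives tokens per second from these two fields. The sketch below assumes the response has already been parsed into a dict; the sample values are made up for illustration, not a benchmark.

```python
def tokens_per_second(stats: dict) -> float:
    """Generation speed from a parsed Ollama /api/generate response.

    eval_count is the number of tokens generated; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    return stats["eval_count"] / stats["eval_duration"] * 1e9


# Made-up sample values: 250 tokens generated in 5 seconds.
sample = {"eval_count": 250, "eval_duration": 5_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

Running the same prompt on CPU and GPU, or on two model sizes, turns the speed differences in the hardware table into numbers you measured yourself.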

When cloud AI is the right choice

1. You just want to use AI

For most users, cloud AI:

Just works with no setup
Has the best models (GPT-5, Claude Opus 4, Gemini 2.5 Pro)
Is cheap or free for normal use
Handles all the infrastructure complexity

Don’t make AI harder than it needs to be.

2. You don’t have powerful hardware

Local AI needs:

Modern CPU
≥16GB RAM (comfortable for small models, up to ~8B parameters)
≥32GB RAM (for medium models, up to ~32B parameters)
Modern GPU or Apple Silicon (for usable speed)
Storage for model files (from about 2GB to 100GB+ per model)

If you don’t have this, cloud AI is much better.

3. You need cutting-edge capability

The best models (Claude Opus 4, GPT-5, Gemini 2.5 Pro) are closed and cloud-only. Open-weights models are good and improving, but they still trail the frontier.

For maximum capability: cloud.

4. Your data isn’t that sensitive

The privacy concerns are real but specific. For:

Asking general questions
Drafting routine, non-confidential emails
Brainstorming ideas
Casual research

Cloud AI privacy is fine.

5. You don’t want infrastructure responsibility

Running local AI means:

Installing software
Managing updates
Troubleshooting
Backing up
Potentially server management for production

If that’s not your interest, cloud AI removes all of this.

What you need to run local AI

Hardware requirements by model size

Model size	RAM	GPU	Performance
1-2B (Llama 3.2 1B, Gemma 2 2B)	4GB+	Optional	Fast on CPU
3-4B (Llama 3.2 3B, Phi-3 mini 3.8B)	8GB+	Optional	Reasonable on CPU
7-8B (Llama 3 8B, Mistral 7B)	16GB+	Helpful	Slow on CPU; fast on GPU
13-14B (Phi-4 14B)	24GB+	Recommended	Slow without GPU
27-32B (Gemma 2 27B)	32GB+	Required	Need real GPU
70B+ (Llama 3.3 70B)	64GB+	Multiple GPUs typical	Serious hardware needed
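A rough way to sanity-check figures like these: a model's weights alone take about (parameters × bits per weight ÷ 8) bytes. The sketch below applies that rule; the flat 20% allowance for context cache and runtime buffers is an assumption for illustration, not a measured figure, and real usage varies with context length and software.

```python
def estimated_memory_gb(params_billions: float, bits_per_weight: float,
                        overhead: float = 0.2) -> float:
    """Rough memory (in GB, 10**9 bytes) needed to load a model.

    Weights take params * bits / 8 bytes; `overhead` is an assumed
    allowance for the context cache and runtime buffers.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9


for size in (3, 8, 14, 27, 70):
    q4 = estimated_memory_gb(size, 4)     # 4-bit quantised weights
    fp16 = estimated_memory_gb(size, 16)  # unquantised 16-bit weights
    print(f"{size}B: ~{q4:.0f} GB at 4-bit, ~{fp16:.0f} GB at 16-bit")
```

By this estimate a 30B model at 4-bit needs about 18 GB. The operating system and other apps need memory too, so installed RAM should comfortably exceed the estimate.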

Apple Silicon advantage: Macs with M1/M2/M3/M4 chips and unified memory are surprisingly good for local AI. A MacBook with 32GB unified memory can run 13B-30B models reasonably well — better than most PCs without dedicated GPUs.

Software for local AI

Easiest tools:

Ollama (https://ollama.com) — by far the easiest. Install; pull a model with one command; run.
LM Studio (https://lmstudio.ai) — GUI app for non-technical users. Download; pick a model; chat.
GPT4All — desktop app; private; simple

Power user tools:

llama.cpp — efficient inference; cross-platform
vLLM — production-grade server; high throughput
Text Generation Web UI — feature-rich UI for power users

Getting started in 10 minutes

1. Download Ollama from https://ollama.com
2. Install it (works on Mac, Windows, Linux)
3. Open a terminal and run: ollama pull llama3.2:3b (small model; works on most machines)
4. Run: ollama run llama3.2:3b
5. Chat: type a question and get a response

For a larger model, try ollama pull llama3.1:8b if you have 16GB+ RAM.
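Ollama also serves a local HTTP API (by default at http://localhost:11434), so scripts can use a pulled model without the interactive prompt. The following is a minimal sketch using only Python's standard library. It assumes the Ollama server is running and llama3.2:3b has been pulled; build_request and ask_local_model are illustrative names, not part of Ollama.

```python
import json
import urllib.request

# Ollama's default local address for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for one JSON object instead of a stream of chunks.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "llama3.2:3b", timeout: float = 120.0) -> str:
    """Send one prompt to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]  # the generated text
```

With Ollama running, ask_local_model("Explain unified memory in one sentence.") returns the model's reply as a string. Nothing leaves your machine; the request goes to localhost.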

Quality comparison

How do local models compare to frontier cloud models?

Capability	Cloud frontier	Local 8B model	Local 70B model
General knowledge	Excellent	Good	Very good
Complex reasoning	Excellent	Limited	Good
Coding	Excellent	Decent	Good
Writing quality	Excellent	Good	Very good
Long documents	Excellent	Limited	Good
Speed	Fast (cloud GPU)	Depends on hardware	Slow on consumer hardware
Capability ceiling	High	Capable for narrow tasks	Approaching frontier

Rule of thumb: Local 8B = good for everyday tasks; local 70B = approaching frontier; cloud = best for hardest tasks.

Cost comparison

For a heavy user (e.g., 500K tokens/day):

Cloud (paid API)

~$45-225/month depending on model (15M tokens/month at ~$3-15 per million)
No upfront cost
Scales with use

Cloud (subscription)

$20-200/month flat
Usage is capped, and the cap depends on the plan tier (e.g. Plus vs Pro)
Most predictable cost

Local (your hardware)

No subscription fees after setup
Hardware cost: $0 (existing machine) to $3,000+ (dedicated rig)
Electricity: ~$5-30/month for GPU use
Time: real cost of setup and maintenance

For occasional use: cloud free tier wins. For heavy use: local can be cost-effective. For everything in between: cloud subscription typically wins.
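The trade-off above can be framed as a break-even calculation: how many months of cloud spending a hardware purchase replaces. This sketch uses hypothetical figures within the ranges this section quotes, and it deliberately ignores setup and maintenance time, which the section rightly counts as a real cost.

```python
import math
from typing import Optional


def break_even_months(hardware_usd: float, local_monthly_usd: float,
                      cloud_monthly_usd: float) -> Optional[int]:
    """Whole months until buying hardware costs less than staying on cloud.

    Returns None when local running costs (e.g. electricity) are not lower
    than the cloud bill, because the hardware then never pays for itself.
    """
    monthly_saving = cloud_monthly_usd - local_monthly_usd
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_usd / monthly_saving)


# Hypothetical scenarios, not price quotes:
print(break_even_months(3000, 30, 150))  # dedicated rig vs a heavy API bill
print(break_even_months(3000, 30, 20))   # dedicated rig vs a $20 subscription
print(break_even_months(0, 5, 20))       # existing machine vs a $20 subscription
```

The second scenario never breaks even, because $30/month of electricity alone exceeds a $20 plan. The first pays off in about two years, and an existing machine pays off immediately if its running costs stay below the subscription.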

Privacy comparison

Approach	Data privacy
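Free cloud (consumer)	Provider sees your data and may use it for training
Paid cloud (consumer)	Provider sees your data; training use depends on the provider and your settings
Cloud API	Provider processes your data and may retain it briefly; usually not used for training (check the terms)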
Cloud Enterprise with DPA	Provider sees; strong contractual protection
Local AI	Nobody sees but you

For Australian businesses handling personal information (APP 8 considerations): local AI running on hardware located in Australia avoids the cross-border disclosure that APP 8 regulates.

The realistic recommendation for most Australians

Most people should use cloud AI because:

Free tiers are genuinely excellent
Frontier models matter for quality
No infrastructure complexity
Mobile and multi-device access

Consider local AI when:

You have specific privacy requirements
You’re a developer learning AI
You’d benefit from unlimited use
Your hardware is capable
You’d enjoy the project

For specific Australian scenarios:

Medical practitioner with patient data: Strong case for local AI for confidential clinical analysis
Legal practitioner with privileged work: Worth investigating local options
Sole trader doing routine work: Cloud is fine
Government department handling sensitive information: Often needs local or sovereign options
Researcher with confidential data: Local AI for analysis; cloud for everything else

Combining cloud and local

Many sophisticated users use both:

Cloud for daily tasks (writing, research, conversation)
Local for specific sensitive work (confidential analysis, custom workflows)
Cloud for cutting-edge capability when needed
Local for unlimited iteration during development

This hybrid approach maximises capability while preserving control where it matters.
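The split above amounts to a small routing policy. The sketch below encodes one reasonable reading of it. The Task fields, the choose_backend name and the precedence order (sensitivity first) are assumptions for illustration; in practice a sensitive task might instead go to an enterprise cloud plan with a DPA.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    sensitive: bool = False       # e.g. patient records, privileged legal material
    needs_frontier: bool = False  # hardest tasks, where capability matters most
    bulk_iteration: bool = False  # many repeated runs during development


def choose_backend(task: Task) -> str:
    """Return 'local' or 'cloud' for one task (hypothetical routing policy)."""
    if task.sensitive:
        return "local"  # data control takes precedence over capability
    if task.needs_frontier:
        return "cloud"  # closed frontier models are cloud-only
    if task.bulk_iteration:
        return "local"  # no per-token charges when iterating heavily
    return "cloud"      # default for everyday writing, research and conversation


for t in [
    Task("Draft a newsletter"),
    Task("Analyse clinical notes", sensitive=True),
    Task("Work through a hard reasoning problem", needs_frontier=True),
    Task("Test 5,000 prompt variants", bulk_iteration=True),
]:
    print(f"{t.description}: {choose_backend(t)}")
```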

Sources

Personal experience running local AI on Mac and PC hardware (2023-2026)
Ollama documentation: ollama.com
LM Studio documentation: lmstudio.ai
Independent benchmarks of open-weights models (2024-2026)
Hardware reviews relevant to AI inference (AnandTech, Tom’s Hardware)

Tech & AI, Explained
