Local vs Cloud AI — When to Run AI on Your Own Computer
Status: 🟩 COMPLETE 🟦 LIVING
Section: decision-frameworks
Tags: local-ai, cloud-ai, privacy, open-weights, decision, ollama, hardware
The short answer
For most users, cloud AI is the right choice. The free tiers of Claude, ChatGPT, and Gemini give you frontier model quality without infrastructure complexity. Don’t run AI locally unless you have a specific reason.
The specific reasons that make local AI worthwhile:
- Maximum privacy (genuinely sensitive data; air-gapped requirements)
- Unlimited use (you’d otherwise pay heavy API costs)
- Offline use (no internet available)
- Customisation (fine-tuning models for specific domains)
- Learning (you’re studying how AI works)
For everyday users, none of these usually apply. For specific professional contexts, all of them might.
What “local AI” means
When you run AI locally, the model files sit on your computer (or your server) and all processing happens there. No data goes to OpenAI, Anthropic, Google, or anyone else. The AI runs on your CPU and/or GPU.
This is enabled by open-weights models — AI models that are released publicly and can be downloaded. See open-weights-vs-closed for the full distinction.
Major open-weights models worth considering:
- Llama (Meta) — most popular; many sizes
- Gemma (Google) — efficient; good quality
- Mistral / Mixtral (Mistral, France) — strong quality + efficiency
- Phi (Microsoft) — tiny but capable; runs on phones
- Qwen, DeepSeek ⛔ — capable but Chinese; the encyclopedia recommends Western alternatives
When local AI is the right choice
1. Maximum privacy requirements
Strong fit: You’re processing data so sensitive that it cannot leave your infrastructure under any circumstances.
Examples:
- Medical research with patient data
- Legal work with privileged information
- Government work in classified or sensitive contexts
- Trade secrets during product development
- Personal therapy journals that shouldn’t be on third-party servers
- Financial analysis of confidential data
For these uses, local AI gives you guaranteed data control — no cloud provider sees anything.
2. Genuine cost concerns at scale
Strong fit: You’re using AI heavily enough that API costs are meaningful.
Math:
- API costs: ~$3-15 per million tokens for frontier models
- A heavy daily user: 100K tokens/day ≈ 3M tokens/month = ~$9-45/month
- For a team: multiply by team members
- For a product: multiply by user count
If your AI use is large enough to make $50-500/month in API charges, running open-weights locally on your own hardware may save money — though hardware costs and electricity also matter.
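The arithmetic above can be sketched as a small helper. This is a minimal illustration, assuming a single flat price per million tokens (real pricing usually differs for input and output tokens) and a 30-day month; the function name and the 1M tokens/day workload are hypothetical.

```python
def monthly_api_cost(tokens_per_day: int, usd_per_million_tokens: float, days: int = 30) -> float:
    """Estimated monthly API spend in USD, assuming one flat per-token price."""
    tokens_per_month = tokens_per_day * days
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Hypothetical workload: 1 million tokens per day across a small team.
    low = monthly_api_cost(1_000_000, 3.0)    # cheaper end of the price range
    high = monthly_api_cost(1_000_000, 15.0)  # pricier end of the price range
    print(f"Estimated API spend: ${low:.0f}-${high:.0f} per month")
```

At that volume the estimate lands at roughly $90-450/month, which is the range where comparing against local hardware starts to make sense.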
3. Offline / air-gapped environments
Strong fit: You don’t have reliable internet, or you’re working in environments where internet isn’t allowed.
Examples:
- Remote and rural Australia with patchy connectivity
- Government secure facilities
- Defence applications
- Industrial sites without internet
- Travel to areas without connectivity
4. Customisation and fine-tuning
Strong fit: You want to train AI on your specific data or domain.
- Domain-specific terminology (legal jargon, medical terminology, technical specs)
- Brand voice for content generation
- Specific style for creative writing
- Specialised reasoning for narrow domains
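Cloud providers offer limited fine-tuning, if any, and only on the models and terms they choose. With local open-weights models you control the training data, the method, and the resulting weights.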
5. Learning how AI works
Strong fit: You’re a developer, researcher, or student wanting to understand AI deeply.
Running models locally teaches you about:
- Model architecture and parameters
- Tokenization
- Inference performance
- Memory and compute requirements
When cloud AI is the right choice
1. You just want to use AI
For most users, cloud AI:
- Just works with no setup
- Has the best models (GPT-5, Claude Opus 4, Gemini 2.5 Pro)
- Is cheap or free for normal use
- Handles all the infrastructure complexity
Don’t make AI harder than it needs to be.
2. You don’t have powerful hardware
Local AI needs:
- Modern CPU
- ≥16GB RAM (for small models)
- ≥32GB RAM (for medium models)
- Modern GPU or Apple Silicon (for usable speed)
- Storage for model files (5GB-100GB+)
If you don’t have this, cloud AI is much better.
3. You need cutting-edge capability
The best models (Claude Opus 4, GPT-5, Gemini 2.5 Pro) are closed and cloud-only. Open-weights models are good and improving, but still trail the frontier.
For maximum capability: cloud.
4. Your data isn’t that sensitive
The privacy concerns are real but specific. For:
- Asking general questions
- Drafting routine emails that contain nothing confidential
- Brainstorming ideas
- Casual research
Cloud AI privacy is fine.
5. You don’t want infrastructure responsibility
Running local AI means:
- Installing software
- Managing updates
- Troubleshooting
- Backing up
- Potentially server management for production
If that’s not your interest, cloud AI removes all of this.
What you need to run local AI
Hardware requirements by model size
| Model size | RAM | GPU | Performance |
|---|---|---|---|
| 1-2B parameters (Llama 3.2 1B, Gemma 2 2B) | 4GB+ | Optional | Fast on CPU |
| 3-4B (Llama 3.2 3B, Phi-3 mini) | 8GB+ | Optional | Reasonable on CPU |
| 7-8B (Llama 3 8B, Mistral 7B) | 16GB+ | Helpful | Slow on CPU; fast on GPU |
| 13-14B (Phi-4 14B) | 24GB+ | Recommended | Slow without GPU |
| 27-32B (Gemma 2 27B) | 32GB+ | Required | Need real GPU |
| 70B+ (Llama 3.3 70B) | 64GB+ | Multiple GPUs typical | Serious hardware needed |
Apple Silicon advantage: Macs with M1/M2/M3/M4 chips and unified memory are surprisingly good for local AI. A MacBook with 32GB unified memory can run 13B-30B models reasonably well — better than most PCs without dedicated GPUs.
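A rough way to sanity-check the table is to estimate a model's own memory footprint from its parameter count. The sketch below is a back-of-envelope estimate, not a measurement: it assumes weights dominate memory, that the weights are quantized to a given number of bits (4-bit is a common choice for local tools, treated here as an assumption), and that a flat overhead factor covers the context cache and runtime buffers. The function name and the 1.2 factor are hypothetical.

```python
def estimate_model_memory_gb(params_billions: float, bits_per_weight: int = 4, overhead: float = 1.2) -> float:
    """Rough memory needed to load a model, in GB.

    Assumptions: weights dominate memory use, and a flat overhead factor
    (hypothetical default of 1.2) covers the context cache and buffers.
    """
    bytes_per_weight = bits_per_weight / 8
    weights_gb = params_billions * bytes_per_weight  # 1B params at 1 byte each is about 1 GB
    return weights_gb * overhead


if __name__ == "__main__":
    for size in (3, 8, 14, 27, 70):
        print(f"{size}B model at 4-bit: about {estimate_model_memory_gb(size):.1f} GB")
```

The RAM figures in the table are higher than these estimates because they are whole-system recommendations: the operating system and other applications need room too, and unquantized 16-bit weights take about four times the space of 4-bit ones.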
Software for local AI
Easiest tools:
- Ollama (https://ollama.com) — by far the easiest. Install; pull a model with one command; run.
- LM Studio (https://lmstudio.ai) — GUI app for non-technical users. Download; pick a model; chat.
- GPT4All — desktop app; private; simple
Power user tools:
- llama.cpp — efficient inference; cross-platform
- vLLM — production-grade server; high throughput
- Text Generation Web UI — feature-rich UI for power users
Getting started in 10 minutes
- Download Ollama from https://ollama.com
- Install (works on Mac, Windows, Linux)
- Open a terminal and run: ollama pull llama3.2:3b (small model; works on most machines)
- Run: ollama run llama3.2:3b
- Chat: type a question; get a response
For a larger model, try ollama pull llama3.1:8b if you have 16GB+ RAM.
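Once a model is pulled, Ollama also serves a local HTTP API, so scripts can use the model without the chat prompt. The sketch below assumes Ollama is running on its default port (11434) and that llama3.2:3b has already been pulled; the helper names build_request and ask are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "llama3.2:3b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3.2:3b") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]  # with stream=False, the full reply arrives in one JSON object
```

With the server running, ask("Summarise this paragraph: ...") returns the model's reply as a string. The request goes to localhost only, so the prompt never leaves the machine.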
Quality comparison
How do local models compare to frontier cloud models?
| Capability | Cloud frontier | Local 8B model | Local 70B model |
|---|---|---|---|
| General knowledge | Excellent | Good | Very good |
| Complex reasoning | Excellent | Limited | Good |
| Coding | Excellent | Decent | Good |
| Writing quality | Excellent | Good | Very good |
| Long documents | Excellent | Limited | Good |
| Speed | Fast (cloud GPU) | Depends on hardware | Slow on consumer hardware |
| Capability ceiling | High | Capable for narrow tasks | Approaching frontier |
Rule of thumb: Local 8B = good for everyday tasks; local 70B = approaching frontier; cloud = best for hardest tasks.
Cost comparison
For a heavy user (e.g., 500K tokens/day):
Cloud (paid API)
- ~$45-225/month at $3-15 per million tokens (500K tokens/day is about 15M tokens/month)
- No upfront cost
- Scales with use
Cloud (subscription)
- $20-200/month flat
- Capped use (limits depend on the plan tier)
- Most predictable cost
Local (your hardware)
- $0/month after setup
- Hardware cost: $3,000+ (dedicated rig)
- Electricity: ~$5-30/month for GPU use
- Time: real cost of setup and maintenance
For occasional use: cloud free tier wins. For heavy use: local can be cost-effective. For everything in between: cloud subscription typically wins.
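To see where "heavy use" starts paying for hardware, a simple break-even estimate helps. The sketch below is a simplification under stated assumptions: it ignores setup time, maintenance, and resale value, and the function name and example figures are hypothetical values picked from the ranges above.

```python
from typing import Optional


def break_even_months(hardware_cost: float, cloud_monthly: float, electricity_monthly: float) -> Optional[float]:
    """Months until local hardware pays for itself, or None if it never does."""
    monthly_saving = cloud_monthly - electricity_monthly
    if monthly_saving <= 0:
        return None  # cloud is already cheaper than the power bill alone
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical heavy user: $3,000 rig, $150/month cloud spend, $30/month electricity.
    print(break_even_months(3000, 150, 30))
    # Hypothetical light user: a $20/month subscription never justifies the rig.
    print(break_even_months(3000, 20, 30))
```

For the heavy user the rig takes about two years to pay off; for the light user it never does, which matches the guidance that subscriptions win for everything in between.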
Privacy comparison
| Approach | Data privacy |
|---|---|
| Free cloud (consumer) | Provider sees and may use your data |
| Paid cloud consumer | Provider sees; doesn’t train on it |
| Cloud API | Provider sees briefly; no training |
| Cloud Enterprise with DPA | Provider sees; strong contractual protection |
| Local AI | Nobody sees but you |
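For Australian businesses handling personal information (APP 8 considerations): when all processing stays on hardware you control in Australia, nothing is disclosed to an overseas recipient, so local AI sidesteps the cross-border disclosure question for that work.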
The realistic recommendation for most Australians
Most people should use cloud AI because:
- Free tiers are genuinely excellent
- Frontier models matter for quality
- No infrastructure complexity
- Mobile and multi-device access
Consider local AI when:
- You have specific privacy requirements
- You’re a developer learning AI
- You’d benefit from unlimited use
- Your hardware is capable
- You’d enjoy the project
For specific Australian scenarios:
- Medical practitioner with patient data: Strong case for local AI for confidential clinical analysis
- Legal practitioner with privileged work: Worth investigating local options
- Sole trader doing routine work: Cloud is fine
- Government department with sensitivity: Often needs local/sovereign options
- Researcher with confidential data: Local AI for analysis; cloud for everything else
Combining cloud and local
Many sophisticated users use both:
- Cloud for daily tasks (writing, research, conversation)
- Local for specific sensitive work (confidential analysis, custom workflows)
- Cloud for cutting-edge capability when needed
- Local for unlimited iteration during development
This hybrid approach maximises capability while preserving control where it matters.
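The hybrid split above can be written down as a simple routing rule. This is an illustrative sketch only: the Task fields and the choose_backend function are hypothetical names, and the priority order (privacy and connectivity first, then volume, then capability) is one reasonable reading of the list above rather than a fixed policy.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    sensitive: bool = False       # data must not leave your own hardware
    online: bool = True           # is an internet connection available?
    high_volume: bool = False     # many iterations, e.g. during development
    needs_frontier: bool = False  # hardest reasoning or coding work


def choose_backend(task: Task) -> str:
    """Return "local" or "cloud" for a task, with privacy taking priority."""
    if task.sensitive or not task.online:
        return "local"  # confidentiality and connectivity are hard constraints
    if task.high_volume and not task.needs_frontier:
        return "local"  # unlimited iteration without per-token charges
    return "cloud"      # default for everyday and cutting-edge work


if __name__ == "__main__":
    print(choose_backend(Task("Draft a blog post")))
    print(choose_backend(Task("Analyse patient notes", sensitive=True)))
    print(choose_backend(Task("Run 5,000 test prompts", high_volume=True)))
```

Note that a sensitive task stays local even when it would benefit from a frontier model; that trade-off is the point of the hybrid approach.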
See also
- open-weights-vs-closed — the underlying model distinction
- ai-hardware-overview — what hardware to consider
- llama — most popular open-weights models
- gemma — Google’s open-weights
- mistral-company — European open-weights leader
- australian-privacy-considerations
- claude-vs-chatgpt-vs-gemini
Sources
- Personal experience running local AI on Mac and PC hardware (2023-2026)
- Ollama documentation: ollama.com
- LM Studio documentation: lmstudio.ai
- Independent benchmarks of open-weights models (2024-2026)
- Hardware reviews relevant to AI inference (AnandTech, Tom’s Hardware)