Local vs Cloud AI — When to Run AI on Your Own Computer

Status: 🟩 COMPLETE 🟦 LIVING
Section: decision-frameworks
Tags: local-ai, cloud-ai, privacy, open-weights, decision, ollama, hardware


The short answer

For most users, cloud AI is the right choice. The free tiers of Claude, ChatGPT, and Gemini give you frontier-model quality with no infrastructure to manage. Don’t run AI locally unless you have a specific reason.

The specific reasons that make local AI worthwhile:

  • Maximum privacy (genuinely sensitive data; air-gapped requirements)
  • Unlimited use (you’d otherwise pay heavy API costs)
  • Offline use (no internet available)
  • Customisation (fine-tuning models for specific domains)
  • Learning (you’re studying how AI works)

For everyday users, none of these usually apply. For specific professional contexts, all of them might.


What “local AI” means

When you run AI locally, the model files sit on your computer (or your server) and all processing happens there. No data goes to OpenAI, Anthropic, Google, or anyone else. The AI runs on your CPU and/or GPU.

This is enabled by open-weights models — AI models that are released publicly and can be downloaded. See open-weights-vs-closed for the full distinction.

Major open-weights models worth considering:

  • Llama (Meta) — most popular; many sizes
  • Gemma (Google) — efficient; good quality
  • Mistral / Mixtral (Mistral, France) — strong quality + efficiency
  • Phi (Microsoft) — tiny but capable; runs on phones
  • Qwen, DeepSeek ⛔ — capable but Chinese; the encyclopedia recommends Western alternatives

When local AI is the right choice

1. Maximum privacy requirements

Strong fit: You’re processing data so sensitive that it cannot leave your infrastructure under any circumstances.

Examples:

  • Medical research with patient data
  • Legal work with privileged information
  • Government work in classified or sensitive contexts
  • Trade secrets during product development
  • Personal therapy journals that shouldn’t be on third-party servers
  • Financial analysis of confidential data

For these uses, local AI gives you guaranteed data control — no cloud provider sees anything.

2. Genuine cost concerns at scale

Strong fit: You’re using AI heavily enough that API costs are meaningful.

Math:

  • API costs: ~$3-15 per million tokens for frontier models
  • A heavy daily user: 100K tokens/day ≈ 3M tokens/month = ~$9-45/month
  • For a team: multiply by team members
  • For a product: multiply by user count

If your usage is heavy enough to incur $50-500/month in API charges, running open-weights models on your own hardware may save money, though hardware and electricity costs also count.
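The arithmetic above can be sketched in a few lines of Python. This is a rough estimate only: it assumes a single blended price per million tokens (real APIs usually price input and output tokens differently) and a 30-day month. The function name `monthly_api_cost` is illustrative and not part of any library.

```python
def monthly_api_cost(tokens_per_day: int, usd_per_million_tokens: float, days: int = 30) -> float:
    """Estimate monthly API spend in USD for a steady daily token volume."""
    monthly_tokens = tokens_per_day * days
    return monthly_tokens / 1_000_000 * usd_per_million_tokens


# One user at 100K tokens/day, at the low and high ends of the price range above.
low = monthly_api_cost(100_000, 3.0)
high = monthly_api_cost(100_000, 15.0)
print(f"Single user: ${low:.2f} to ${high:.2f} per month")

# For a team, multiply by headcount (assumed here: 10 people with similar usage).
print(f"Team of 10: ${low * 10:.2f} to ${high * 10:.2f} per month")
```

Plugging in your own daily volume and the published price of the model you actually use gives a quick check on whether you are anywhere near the range where local hardware starts to pay off.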

3. Offline / air-gapped environments

Strong fit: You don’t have reliable internet, or you’re working in environments where internet isn’t allowed.

Examples:

  • Remote and rural Australia with patchy connectivity
  • Government secure facilities
  • Defence applications
  • Industrial sites without internet
  • Travel to areas without connectivity

4. Customisation and fine-tuning

Strong fit: You want to train AI on your specific data or domain.

  • Domain-specific terminology (legal jargon, medical terminology, technical specs)
  • Brand voice for content generation
  • Specific style for creative writing
  • Specialised reasoning for narrow domains

Cloud providers offer fine-tuning only for selected models and on their own terms. With local open-weights models you control the whole process.

5. Learning how AI works

Strong fit: You’re a developer, researcher, or student wanting to understand AI deeply.

Running models locally teaches you about:

  • Model architecture and parameters
  • Tokenization
  • Inference performance
  • Memory and compute requirements

When cloud AI is the right choice

1. You just want to use AI

For most users, cloud AI:

  • Just works with no setup
  • Has the best models (GPT-5, Claude Opus 4, Gemini 2.5 Pro)
  • Is cheap or free for normal use
  • Handles all the infrastructure complexity

Don’t make AI harder than it needs to be.

2. You don’t have powerful hardware

Local AI needs:

  • Modern CPU
  • ≥16GB RAM (for 7-8B models)
  • ≥32GB RAM (for 27-32B models)
  • Modern GPU or Apple Silicon (for usable speed)
  • Storage for model files (5GB-100GB+)

If you don’t have this, cloud AI is much better.

3. You need cutting-edge capability

The best models (GPT-5, Claude Opus 4, Gemini 2.5 Pro) are closed and cloud-only. Open-weights models are good and improving, but still trail the frontier.

For maximum capability: cloud.

4. Your data isn’t that sensitive

The privacy concerns are real but specific. For:

  • Asking general questions
  • Drafting emails to public people
  • Brainstorming ideas
  • Casual research

Cloud AI privacy is fine.

5. You don’t want infrastructure responsibility

Running local AI means:

  • Installing software
  • Managing updates
  • Troubleshooting
  • Backing up
  • Potentially server management for production

If that’s not your interest, cloud AI removes all of this.


What you need to run local AI

Hardware requirements by model size

| Model size | RAM | GPU | Performance |
|---|---|---|---|
| 1B (Llama 3.2 1B) | 4GB+ | Optional | Fast on CPU |
| 2-4B (Gemma 2 2B, Llama 3.2 3B, Phi-3 mini) | 8GB+ | Optional | Reasonable on CPU |
| 7-8B (Llama 3 8B, Mistral 7B) | 16GB+ | Helpful | Slow on CPU; fast on GPU |
| 13-14B (Phi-4 14B) | 24GB+ | Recommended | Slow without GPU |
| 27-32B (Gemma 2 27B) | 32GB+ | Required | Need real GPU |
| 70B+ (Llama 3.3 70B) | 64GB+ | Multiple GPUs typical | Serious hardware needed |

Apple Silicon advantage: Macs with M1/M2/M3/M4 chips and unified memory are surprisingly good for local AI. A MacBook with 32GB unified memory can run 13B-30B models reasonably well — better than most PCs without dedicated GPUs.
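A rough way to sanity-check whether a model will fit in memory is to multiply its parameter count by the storage size of each weight. The sketch below is a back-of-envelope estimate under stated assumptions, not a measurement: it assumes 4-bit quantised weights by default (common for local use), and the 20% overhead for context cache and runtime buffers is an assumed figure that varies with context length and software. Both function names are illustrative.

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters x bytes per parameter = gigabytes
    return params_billions * bytes_per_weight


def estimate_ram_gb(params_billions: float, bits_per_weight: int = 4, overhead: float = 1.2) -> float:
    """Weights plus an assumed 20% allowance for context cache and buffers."""
    return estimate_weights_gb(params_billions, bits_per_weight) * overhead


for size in (3, 8, 14, 27, 70):
    print(f"{size}B model: ~{estimate_ram_gb(size):.1f} GB at 4-bit, "
          f"~{estimate_ram_gb(size, bits_per_weight=16):.1f} GB at 16-bit")
```

These estimates cover the model alone. Total system RAM also has to hold the operating system and your other applications, so the machine needs noticeably more memory than the estimate suggests.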

Software for local AI

Easiest tools:

  • Ollama (https://ollama.com) — by far the easiest. Install; pull a model with one command; run.
  • LM Studio (https://lmstudio.ai) — GUI app for non-technical users. Download; pick a model; chat.
  • GPT4All — desktop app; private; simple

Power user tools:

  • llama.cpp — efficient inference; cross-platform
  • vLLM — production-grade server; high throughput
  • Text Generation Web UI — feature-rich UI for power users

Getting started in 10 minutes

  1. Download Ollama from https://ollama.com
  2. Install (works on Mac, Windows, Linux)
  3. Open terminal and run: ollama pull llama3.2:3b (small model; works on most machines)
  4. Run: ollama run llama3.2:3b
  5. Chat: Type a question; get a response

For a larger model, try ollama pull llama3.1:8b if you have 16GB+ RAM.
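Once a model is pulled, Ollama also serves a local HTTP API, by default on port 11434, so scripts can use the model without the chat prompt. The following is a minimal sketch using only the Python standard library. It assumes Ollama is running on the same machine with default settings and that the model named in the call has already been pulled. The helper names `build_request` and `ask` are illustrative.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]


# Example (requires a running Ollama server):
# print(ask("llama3.2:3b", "Explain unified memory in one sentence."))
```

The request goes to localhost only, so the prompt and the answer never leave the machine.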


Quality comparison

How do local models compare to frontier cloud models?

| Capability | Cloud frontier | Local 8B model | Local 70B model |
|---|---|---|---|
| General knowledge | Excellent | Good | Very good |
| Complex reasoning | Excellent | Limited | Good |
| Coding | Excellent | Decent | Good |
| Writing quality | Excellent | Good | Very good |
| Long documents | Excellent | Limited | Good |
| Speed | Fast (cloud GPU) | Depends on hardware | Slow on consumer hardware |
| Capability ceiling | High | Capable for narrow tasks | Approaching frontier |

Rule of thumb: Local 8B = good for everyday tasks; local 70B = approaching frontier; cloud = best for hardest tasks.


Cost comparison

For a heavy user (e.g., 500K tokens/day):

Cloud (paid API)

  • ~$45-225/month depending on model (15M tokens/month at $3-15 per million)
  • No upfront cost
  • Scales with use

Cloud (subscription)

  • $20-200/month flat
  • Capped use (limits depend on the plan tier)
  • Most predictable cost

Local (your hardware)

  • No subscription or per-token fees after setup
  • Hardware cost: $3,000+ (dedicated rig)
  • Electricity: ~$5-30/month for GPU use
  • Time: real cost of setup and maintenance

For occasional use: cloud free tier wins. For heavy use: local can be cost-effective. For everything in between: cloud subscription typically wins.
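A simple break-even calculation makes the trade-off concrete. The sketch below is illustrative only: it ignores setup time, maintenance, and hardware resale value, and it assumes cloud spend and electricity use stay constant from month to month. The function name `break_even_months` is hypothetical.

```python
import math
from typing import Optional


def break_even_months(hardware_usd: float,
                      cloud_usd_per_month: float,
                      electricity_usd_per_month: float) -> Optional[int]:
    """Months until local hardware has paid for itself, or None if it never does."""
    monthly_saving = cloud_usd_per_month - electricity_usd_per_month
    if monthly_saving <= 0:
        # Running locally costs at least as much per month as the cloud bill.
        return None
    return math.ceil(hardware_usd / monthly_saving)


# A $3,000 rig replacing $150/month of API spend, with $30/month of electricity.
print(break_even_months(3000, 150, 30))

# The same rig replacing a $20/month subscription never pays for itself.
print(break_even_months(3000, 20, 30))
```

With the figures used in this section, a dedicated rig only pays off within a couple of years when it replaces cloud spend at the heavy end of the range.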


Privacy comparison

| Approach | Data privacy |
|---|---|
| Free cloud (consumer) | Provider sees and may use your data |
| Paid cloud (consumer) | Provider sees your data; training use varies by provider and settings |
| Cloud API | Provider sees briefly; no training |
| Cloud Enterprise with DPA | Provider sees; strong contractual protection |
| Local AI | Nobody sees but you |

For Australian businesses handling personal information: local AI keeps the data on your own infrastructure, so the cross-border disclosure obligations in Australian Privacy Principle 8 (APP 8) do not arise for that processing.


The realistic recommendation for most Australians

Most people should use cloud AI because:

  • Free tiers are genuinely excellent
  • Frontier models matter for quality
  • No infrastructure complexity
  • Mobile and multi-device access

Consider local AI when:

  • You have specific privacy requirements
  • You’re a developer learning AI
  • You’d benefit from unlimited use
  • Your hardware is capable
  • You’d enjoy the project

For specific Australian scenarios:

  • Medical practitioner with patient data: Strong case for local AI for confidential clinical analysis
  • Legal practitioner with privileged work: Worth investigating local options
  • Sole trader doing routine work: Cloud is fine
  • Government department with sensitivity: Often needs local/sovereign options
  • Researcher with confidential data: Local AI for analysis; cloud for everything else

Combining cloud and local

Many sophisticated users use both:

  • Cloud for daily tasks (writing, research, conversation)
  • Local for specific sensitive work (confidential analysis, custom workflows)
  • Cloud for cutting-edge capability when needed
  • Local for unlimited iteration during development

This hybrid approach maximises capability while preserving control where it matters.
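The hybrid split can be expressed as a small routing rule. This is a sketch of one possible policy, not a recommended implementation: the `Task` fields and the `choose_backend` function are hypothetical names, and a real workflow would need its own definition of what counts as sensitive.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a piece of AI work to be routed."""
    description: str
    sensitive: bool = False    # data must not leave your own infrastructure
    offline: bool = False      # no internet connection available
    high_volume: bool = False  # many repeated calls, e.g. iteration during development


def choose_backend(task: Task) -> str:
    """Return "local" or "cloud" following the hybrid split described above."""
    # Privacy and connectivity are hard constraints, so they are checked first.
    if task.sensitive or task.offline:
        return "local"
    # Heavy iteration avoids per-token charges when run locally.
    if task.high_volume:
        return "local"
    # Everything else goes to the cloud for convenience and frontier capability.
    return "cloud"


print(choose_backend(Task("Draft a newsletter")))
print(choose_backend(Task("Summarise patient notes", sensitive=True)))
```

Checking the hard constraints first means a sensitive task is never sent to the cloud just because it would also benefit from a stronger model.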


Sources

  • Personal experience running local AI on Mac and PC hardware (2023-2026)
  • Ollama documentation: ollama.com
  • LM Studio documentation: lmstudio.ai
  • Independent benchmarks of open-weights models (2024-2026)
  • Hardware reviews relevant to AI inference (AnandTech, Tom’s Hardware)