How to Set Up Ollama — Run AI Models Locally on Your Computer
Status: 🟩 COMPLETE 🟦 LIVING Section: how-to Tags: ollama, local-ai, privacy, open-weights, llama, setup, walkthrough
What you’re doing
Ollama is the easiest way to run AI models on your own computer — no cloud, no API keys, no data leaving your machine. Once set up, you can run Llama, Gemma, Mistral, Phi, and many other open-weights models locally for free.
This is genuinely useful when you want:
- Maximum privacy (data never leaves your computer)
- Unlimited use (no per-token costs)
- Offline AI
- To learn how AI actually works
Time: 15-30 minutes including downloading a model.
What you need
- A modern computer (Mac, Windows, or Linux)
- Mac: 8GB+ RAM (16GB+ recommended); Apple Silicon (M1/M2/M3/M4) strongly preferred
- PC: 16GB+ RAM (32GB+ for larger models); GPU helpful but not required
- About 10GB free disk space (more for multiple models)
- Internet connection for initial download
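The disk-space requirement above can be checked with a short script before you download anything. This is a minimal sketch using only the Python standard library; it assumes models are stored on the same drive as your home directory, and the helper names are illustrative.

```python
import shutil
from pathlib import Path

GB = 1024 ** 3  # bytes per GB (binary)


def free_disk_gb(path=None):
    """Return free space, in GB, on the drive that holds `path` (default: home)."""
    target = Path(path) if path else Path.home()
    return shutil.disk_usage(target).free / GB


def has_room_for_models(free_gb, required_gb=10):
    """True if free space meets the roughly 10GB suggested in the list above."""
    return free_gb >= required_gb


free = free_disk_gb()
print(f"Free disk space: {free:.1f} GB")
print("Enough for a first model:", has_room_for_models(free))
```

Raise `required_gb` if you plan to keep several models installed at once.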
Step-by-step
Step 1 — Download Ollama
Go to https://ollama.com.
Click Download and choose your operating system:
- macOS
- Windows
- Linux
Install like any other application.
Step 2 — Open Terminal
You’ll interact with Ollama through the terminal/command line. Don’t panic — it’s just a few commands.
Mac: Open the Terminal app (Spotlight: type “Terminal”)
Windows: Open PowerShell or Command Prompt (Win+R, type “powershell” or “cmd”)
Linux: Open your terminal emulator (on many desktops, Ctrl+Alt+T)
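If you want to confirm that the install from Step 1 worked, one option is to check that the `ollama` command is visible on your PATH. A minimal sketch, assuming Python is installed; `find_ollama` is an illustrative helper name.

```python
import shutil


def find_ollama():
    """Return the full path of the ollama executable, or None if it is not on PATH."""
    return shutil.which("ollama")


path = find_ollama()
if path:
    print(f"Ollama found at {path}")
else:
    print("Ollama not found on PATH; try reopening the terminal or reinstalling")
```

If the command is not found right after installing, opening a fresh terminal window is usually the first thing to try.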
Step 3 — Pull your first model
In the terminal, type:
ollama pull llama3.2:3b
Press Enter. Ollama downloads the model (about 2GB). Wait for it to finish.
Why this model?
- 3 billion parameters (small enough to run on most computers)
- Good quality for everyday tasks
- Fast on most hardware
- Llama 3.2 from Meta
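The same download can also be started from code through Ollama's local HTTP API. The sketch below assumes the server is running at its default address and that the `/api/pull` endpoint streams one JSON status object per line, with `total` and `completed` byte counts while a layer downloads. The helper names and the sample event are illustrative, not captured from a real download.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address (assumed unchanged)


def progress_percent(event):
    """Return download progress (0-100) for one status event, or None if unknown."""
    total = event.get("total")
    completed = event.get("completed")
    if not total or completed is None:
        return None
    return round(100 * completed / total, 1)


def pull_model(model):
    """Ask the local Ollama server to download `model`, printing each status line."""
    body = json.dumps({"model": model}).encode("utf-8")
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        for raw_line in reply:  # one JSON object per line
            if not raw_line.strip():
                continue
            event = json.loads(raw_line)
            percent = progress_percent(event)
            suffix = f" ({percent}%)" if percent is not None else ""
            print(f"{event.get('status', '')}{suffix}")


# Made-up event for illustration only
sample = {"status": "pulling layer", "total": 2000, "completed": 500}
print(progress_percent(sample))  # 25.0
```

Calling `pull_model("llama3.2:3b")` with Ollama running should behave much like the terminal command, just with plainer progress output.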
Step 4 — Chat with the model
In terminal:
ollama run llama3.2:3b
You’ll see a prompt. Type a question:
>>> What's the capital of Australia?
Press Enter. The model thinks for a moment and responds.
Type /bye to exit.
Congratulations — you just ran an AI model on your own computer with zero data leaving your machine.
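The interactive chat is not the only way to ask a question. Passing the prompt as an extra argument to `ollama run` answers once and exits, which makes it easy to script. A minimal sketch, with illustrative helper names; it assumes `ollama` is on your PATH and the model is already pulled.

```python
import subprocess


def build_run_command(model, prompt):
    """Build the argument list for a one-shot, non-interactive `ollama run`."""
    return ["ollama", "run", model, prompt]


def ask(model, prompt, timeout=300):
    """Run one prompt through the ollama CLI and return the reply as text."""
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError if ollama exits with an error
        timeout=timeout,  # give up after this many seconds
    )
    return result.stdout.strip()


print(build_run_command("llama3.2:3b", "What's the capital of Australia?"))
```

With Ollama installed, `print(ask("llama3.2:3b", "What's the capital of Australia?"))` would print the model's answer.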
Models worth trying
After Llama 3.2:3b, try others:
Small (works on most computers)
ollama pull phi3:mini # Microsoft's Phi-3 Mini
ollama pull gemma2:2b # Google's Gemma 2 2B
ollama pull tinyllama # Very small; testing
Medium (16GB+ RAM recommended)
ollama pull llama3.1:8b # Llama 3.1 8B
ollama pull mistral # Mistral 7B
ollama pull gemma2:9b # Gemma 2 9B
ollama pull phi3:medium # Phi-3 14B
Large (32GB+ RAM recommended)
ollama pull llama3.3:70b # Llama 3.3 70B (very capable; needs 64GB+ RAM)
ollama pull gemma2:27b # Gemma 2 27B
ollama pull mixtral # Mixtral 8x7B
Specialised
ollama pull codellama # Code-focused
ollama pull llava # Vision-capable (sees images)
ollama pull nomic-embed-text # Embeddings for search
Run any model
Once pulled, run with:
ollama run <model-name>
For example:
ollama run mistral
ollama run gemma2:9b
Useful commands
# List models you have
ollama list
# Show model info
ollama show llama3.2:3b
# Remove a model (frees disk space)
ollama rm modelname
# Update Ollama itself
# (Mac: app auto-updates; Linux: run install script again)
# Stop a running model
# (Just exit the chat with /bye or Ctrl+C)
Use Ollama from your own code
Ollama runs a local API server. You can call it from Python, JavaScript, or any language:
Python
import requests

# 'stream': False asks for one complete JSON reply instead of a stream of chunks
response = requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'llama3.2:3b', 'prompt': 'Hello!', 'stream': False},
)
print(response.json()['response'])
Using the OpenAI Python SDK (Ollama exposes an OpenAI-compatible API)
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',  # required but unused
)
response = client.chat.completions.create(
    model='llama3.2:3b',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.choices[0].message.content)
This means you can develop AI applications using local models for free, which is useful for testing and privacy-sensitive work.
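For longer answers you may prefer to show text as it is generated rather than waiting for the full reply. The sketch below assumes that `/api/generate` with `stream` set to true returns one JSON object per line, each carrying a `response` fragment and a final object with `done` set to true. It uses only the standard library; the helper names and sample lines are illustrative.

```python
import json
import urllib.request


def iter_pieces(lines):
    """Yield the text fragment from each streamed JSON line until `done` is true."""
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break


def stream_generate(model, prompt, url="http://localhost:11434/api/generate"):
    """Print a reply fragment by fragment as the local server produces it."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        for piece in iter_pieces(reply):
            print(piece, end="", flush=True)
    print()


# Hand-written lines in the assumed format, so the parser can be tried offline
sample_lines = [
    '{"response": "Hel", "done": false}',
    '{"response": "lo!", "done": false}',
    '{"response": "", "done": true}',
]
print("".join(iter_pieces(sample_lines)))  # Hello!
```

With Ollama running, `stream_generate("llama3.2:3b", "Hello!")` prints the answer incrementally.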
GUI option: LM Studio or Open WebUI
If you don’t love the terminal, try GUI tools:
LM Studio
- Download: https://lmstudio.ai
- Visual app for downloading and chatting with models
- Easier for non-technical users
- Works alongside or instead of Ollama
Open WebUI
- Browser interface for Ollama
- Install: use the docker run command from their docs
- Looks like ChatGPT
- Great for ongoing use
Jan.ai
- Another GUI alternative
- https://jan.ai
- Cross-platform
Apple Silicon advantage
If you have a Mac with M1/M2/M3/M4 chip:
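Why it matters: Apple Silicon uses “unified memory”, meaning RAM is shared between the CPU and GPU. A Mac with 32GB of unified memory can therefore run 30B-parameter models (in the quantized form Ollama typically ships), whereas a PC needs a GPU with 24GB+ of VRAM (which costs $1000+) to hold a model that size on the GPU.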
Rough guide (approximate; varies with model and quantization):
| Mac | Model sizes that run well |
|---|---|
| M1 8GB | 3B–7B models OK |
| M2/M3 16GB | 8B–13B models good |
| M3 Pro 32GB | 30B models reasonable |
| M3 Max 64GB+ | 70B models possible |
| Mac Studio (high-memory configurations) | Very large open-weights models possible |
For local AI on a budget, modern Macs are surprisingly capable.
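The memory figures in this section follow from simple arithmetic: the weights take roughly (parameters × bits per weight ÷ 8) bytes. The sketch below assumes 4-bit quantized weights and that about 70% of memory is available to the model, leaving the rest for the operating system and working buffers. Both numbers are assumptions chosen for illustration, not measurements.

```python
def estimate_weights_gb(params_billion, bits_per_weight=4):
    """Approximate size of the model weights in GB (1 billion bytes = 1 GB)."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion, memory_gb, bits_per_weight=4, usable_fraction=0.7):
    """True if the weights fit in the assumed usable share of memory."""
    needed = estimate_weights_gb(params_billion, bits_per_weight)
    return needed <= memory_gb * usable_fraction


for size in (3, 8, 30, 70):
    print(f"{size}B at 4-bit: ~{estimate_weights_gb(size):.1f} GB of weights")

print("30B in 32GB:", fits_in_memory(30, 32))  # True
print("70B in 32GB:", fits_in_memory(70, 32))  # False
print("70B in 64GB:", fits_in_memory(70, 64))  # True
```

Real usage is somewhat higher than the weights alone, since the context window also consumes memory, so treat these as lower bounds.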
Hardware recommendations
To run small models (3B-7B)
- Any modern computer with 8GB+ RAM
- No GPU needed
- Will be slow on older hardware
To run medium models (13B-30B)
- 16-32GB RAM
- Discrete GPU helpful on PC
- Apple Silicon Mac excellent option
To run large models (70B+)
- 64GB+ RAM
- Multiple GPUs typical on PC
- Apple Silicon Mac with 64GB+ unified memory
- Or: don’t run locally; use cloud API
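The three tiers above can be written as a small lookup if you want to script a recommendation. This is a sketch that simply encodes the RAM thresholds from this section; the function name and the wording of the returned labels are illustrative.

```python
def largest_tier(ram_gb):
    """Map installed RAM (GB) to the largest model tier suggested in this guide."""
    if ram_gb >= 64:
        return "large (70B+)"
    if ram_gb >= 16:
        return "medium (13B-30B)"
    if ram_gb >= 8:
        return "small (3B-7B)"
    return "below the suggested minimum"


for ram in (4, 8, 16, 32, 64):
    print(f"{ram}GB RAM -> {largest_tier(ram)}")
```

A machine at the bottom of a tier will handle the smaller models in that range more comfortably than the larger ones.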
Privacy benefits
When you run AI with Ollama:
- No data leaves your computer
- No API key needed
- No usage logging by AI provider
- Works offline
- No cost per use
For sensitive work (confidential business documents, personal journaling, medical questions, legal matters), local AI is the privacy gold standard.
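A related check for sensitive work is whether the Ollama server is reachable only from your own machine. By default it listens on 127.0.0.1:11434, and the `OLLAMA_HOST` environment variable can change that. The sketch below inspects that variable in the current environment; the parsing is deliberately simple and may not cover every accepted format, and the helper names are illustrative.

```python
import ipaddress
import os

DEFAULT_HOST = "127.0.0.1:11434"  # Ollama's default bind address


def host_part(value):
    """Strip an optional scheme, path, and port from an OLLAMA_HOST-style value."""
    value = value.strip()
    if "://" in value:
        value = value.split("://", 1)[1]
    value = value.split("/", 1)[0]
    if value.startswith("["):  # bracketed IPv6, e.g. [::1]:11434
        return value[1:].split("]", 1)[0]
    if value.count(":") == 1:  # host:port
        return value.split(":", 1)[0]
    return value


def is_local_only(value):
    """True if the address is a loopback address (this machine only)."""
    host = host_part(value)
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a hostname we cannot classify; treat it as not local


configured = os.environ.get("OLLAMA_HOST") or DEFAULT_HOST
print(f"OLLAMA_HOST = {configured!r}; local only: {is_local_only(configured)}")
```

A value such as 0.0.0.0 means other devices on your network may be able to reach the server, which is worth knowing before you use it for confidential material.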
Limitations vs cloud AI
Local AI is not equivalent to cloud frontier models:
Local model limitations:
- Smaller models = less capable
- Even Llama 3.3 70B (among the strongest models listed above) is below Claude Opus or GPT-4o
- No native vision in most models (but LLaVA and similar models exist)
- No real-time web search
- No image generation
- No voice mode
When to use cloud AI instead:
- Need frontier capability
- Need current information (web search)
- Need image/video/voice generation
- One-off tasks where setup time isn’t worth it
When local AI is better:
- Sensitive data
- High volume (cost would be significant in cloud)
- Offline use
- Customisation/fine-tuning
- Learning how AI works
Australian-specific notes
- Genuinely useful for Australian users with privacy concerns
- Data sovereignty: your data stays in Australia (because it stays on your machine)
- For Australian businesses subject to APP 8 (the Australian Privacy Principle on cross-border disclosure of personal information): local AI avoids sending data overseas
- Particularly useful for: clinicians, lawyers, accountants handling sensitive data
- Internet connection used only for initial model download
Recommended starter setup
For a typical Australian user wanting to try local AI:
- Install Ollama (10 minutes)
- Pull Llama 3.2 3B (~2GB; runs on most hardware)
- Try it for a week — see if it meets your needs
- If it works: explore larger models if hardware allows
- If you want a GUI: add LM Studio
- For ongoing serious use: consider Open WebUI
Common gotchas
- Disk space adds up. Each model is 2-40GB. Keep only the models you actually use.
- Models won’t run well if RAM is insufficient. You’ll get an error or extremely slow performance.
- Apple Silicon is significantly faster than an equivalent Intel Mac. Newer is better.
- Quality below cloud AI is real. Don’t expect Claude Opus from your laptop.
- First-time download is large but subsequent runs are local-only.
- Battery life on laptops is impacted by AI use — use plugged in for heavy work.
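To keep an eye on the disk-space gotcha, you can ask the local server which models are installed and how large each one is. The sketch below assumes the `/api/tags` endpoint returns a `models` list whose entries include `name` and `size` (in bytes). The sample reply uses made-up sizes, and the helper names are illustrative.

```python
import json
import urllib.request


def summarise_models(tags):
    """Return (name, size in GB) pairs, largest first, from an /api/tags reply."""
    models = tags.get("models", [])
    pairs = [(m["name"], m["size"] / 1e9) for m in models]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def fetch_tags(url="http://localhost:11434/api/tags"):
    """Fetch the installed-model list from the local Ollama server."""
    with urllib.request.urlopen(url) as reply:
        return json.load(reply)


# Made-up reply in the assumed format, so the summary can be tried offline
sample = {
    "models": [
        {"name": "llama3.2:3b", "size": 2_000_000_000},
        {"name": "mistral:latest", "size": 4_100_000_000},
    ]
}
for name, size_gb in summarise_models(sample):
    print(f"{name:20s} {size_gb:5.1f} GB")
```

With Ollama running, `summarise_models(fetch_tags())` lists your real models, and anything large you no longer use is a candidate for `ollama rm`.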
See also
- open-weights-vs-closed — concept of local AI
- local-vs-cloud-ai — when each is right
- llama — Meta’s open-weights model
- gemma — Google’s open-weights model
- mistral-company — European open-weights
- ai-hardware-overview — hardware context
- ai-energy-footprint — power use considerations
Sources
- Ollama documentation: ollama.com
- Tested setup (June 2026)
- Open-weights model documentation
- Personal experience running local AI on Mac and PC hardware