How to Set Up Ollama — Run AI Models Locally on Your Computer

Status: 🟩 COMPLETE 🟦 LIVING Section: how-to Tags: ollama, local-ai, privacy, open-weights, llama, setup, walkthrough

What you’re doing

Ollama is the easiest way to run AI models on your own computer — no cloud, no API keys, no data leaving your machine. Once set up, you can run Llama, Gemma, Mistral, Phi, and many other open-weights models locally for free.

This is genuinely useful when you want:

Maximum privacy (data never leaves your computer)
Unlimited use (no per-token costs)
Offline AI
To learn how AI actually works

Time: 15-30 minutes including downloading a model.

What you need

A modern computer (Mac, Windows, or Linux)
Mac: 8GB+ RAM (16GB+ recommended); Apple Silicon (M1/M2/M3/M4) strongly preferred
PC: 16GB+ RAM (32GB+ for larger models); GPU helpful but not required
About 10GB free disk space (more for multiple models)
Internet connection for initial download

Step-by-step

Step 1 — Download Ollama

Go to https://ollama.com.

Click Download and choose your operating system:

macOS
Windows
Linux

Install like any other application.

Step 2 — Open Terminal

You’ll interact with Ollama through the terminal/command line. Don’t panic — it’s just a few commands.

Mac: Open Terminal app (Spotlight: type “Terminal”)

Windows: Open PowerShell or Command Prompt (Win+R, type “powershell” or “cmd”)

Linux: You know how to open terminal

Step 3 — Pull your first model

In the terminal, type:

ollama pull llama3.2:3b

Press Enter. Ollama downloads the model (a few GB). Wait for it to finish.

Why this model?

3 billion parameters (small enough to run on most computers)
Good quality for everyday tasks
Fast on most hardware
Llama 3.2 from Meta

Step 4 — Chat with the model

In terminal:

ollama run llama3.2:3b

You’ll see a prompt. Type a question:

>>> What's the capital of Australia?

Press Enter. The model thinks for a moment and responds.

Type /bye to exit.

Congratulations — you just ran an AI model on your own computer with zero data leaving your machine.

Models worth trying

After Llama 3.2:3b, try others:

Small (works on most computers)

ollama pull phi3:mini       # Microsoft's Phi-3 Mini
ollama pull gemma2:2b       # Google's Gemma 2 2B
ollama pull tinyllama       # Very small; testing

Medium (16GB+ RAM recommended)

ollama pull llama3.1:8b     # Llama 3.1 8B
ollama pull mistral         # Mistral 7B
ollama pull gemma2:9b       # Gemma 2 9B
ollama pull phi3:medium     # Phi-3 14B

Large (32GB+ RAM recommended)

ollama pull llama3.3:70b    # Llama 3.3 70B (very capable)
ollama pull gemma2:27b      # Gemma 2 27B
ollama pull mixtral         # Mixtral 8x7B

Specialised

ollama pull codellama       # Code-focused
ollama pull llava           # Vision-capable (sees images)
ollama pull nomic-embed-text # Embeddings for search

Run any model

Once pulled, run with:

ollama run <model-name>

For example:

ollama run mistral
ollama run gemma2:9b

Useful commands

# List models you have
ollama list
 
# Show model info
ollama show llama3.2:3b
 
# Remove a model (frees disk space)
ollama rm modelname
 
# Update Ollama itself
# (Mac: app auto-updates; Linux: run install script again)
 
# Stop a running model
# (Just exit the chat with /bye or Ctrl+C)

Use Ollama from your own code

Ollama runs a local API server. You can call it from Python, JavaScript, or any language:

Python

import requests
response = requests.post('http://localhost:11434/api/generate',
    json={'model': 'llama3.2:3b', 'prompt': 'Hello!', 'stream': False})
print(response.json()['response'])

Using OpenAI Python SDK (Ollama is OpenAI API-compatible)

from openai import OpenAI
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'  # required but unused
)
response = client.chat.completions.create(
    model='llama3.2:3b',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response.choices[0].message.content)

This means you can develop AI applications using local models for free — useful for testing and privacy-sensitive work.

GUI option: LM Studio or Open WebUI

If you don’t love the terminal, try GUI tools:

LM Studio

Download: https://lmstudio.ai
Visual app for downloading and chatting with models
Easier for non-technical users
Works alongside or instead of Ollama

Open WebUI

Browser interface for Ollama
Install: docker run command on their docs
Looks like ChatGPT
Great for ongoing use

Jan.ai

Another GUI alternative
https://jan.ai
Cross-platform

Apple Silicon advantage

If you have a Mac with M1/M2/M3/M4 chip:

Why it matters: Apple Silicon has “unified memory” — RAM is shared between CPU and GPU. This means a Mac with 32GB unified RAM can run 30B parameter models, whereas a PC needs a 24GB+ VRAM GPU (which costs $1000+).

Quick benchmarks (approximate, may vary):

Mac	Best model speed
M1 8GB	3B–7B models OK
M2/M3 16GB	8B–13B models good
M3 Pro 32GB	30B models reasonable
M3 Max 64GB+	70B models possible
Apple Studio	Frontier-size models possible

For local AI on a budget, modern Macs are surprisingly capable.

Hardware recommendations

To run small models (3B-7B)

Any modern computer with 8GB+ RAM
No GPU needed
Will be slow on older hardware

To run medium models (13B-30B)

16-32GB RAM
Discrete GPU helpful on PC
Apple Silicon Mac excellent option

To run large models (70B+)

64GB+ RAM
Multiple GPUs typical on PC
Apple Silicon Mac with 64GB+ unified memory
Or: don’t run locally; use cloud API

Privacy benefits

When you run AI with Ollama:

No data leaves your computer
No API key needed
No usage logging by AI provider
Works offline
No cost per use

For sensitive work (confidential business documents, personal journaling, medical questions, legal matters), local AI is the privacy gold standard.

Limitations vs cloud AI

Local AI is not equivalent to cloud frontier models:

Local model limitations:

Smaller models = less capable
Even Llama 3.3 70B (best open-weights) is below Claude Opus or GPT-4o
No native vision in most models (but Llava etc. exist)
No real-time web search
No image generation
No voice mode

When to use cloud AI instead:

Need frontier capability
Need current information (web search)
Need image/video/voice generation
One-off tasks where setup time isn’t worth it

When local AI is better:

Sensitive data
High volume (cost would be significant in cloud)
Offline use
Customisation/fine-tuning
Learning how AI works

Australian-specific notes

Genuinely useful for Australian users with privacy concerns
Data sovereignty: your data stays in Australia (because it stays on your machine)
For Australian businesses with APP 8 considerations: local AI avoids cross-border disclosure
Particularly useful for: clinicians, lawyers, accountants handling sensitive data
Internet connection used only for initial model download

Recommended starter setup

For a typical Australian user wanting to try local AI:

Install Ollama (10 minutes)
Pull Llama 3.2 3B (~2GB; runs on most hardware)
Try it for a week — see if it meets your needs
If it works: explore larger models if hardware allows
If you want a GUI: add LM Studio
For ongoing serious use: consider Open WebUI

Common gotchas

Disk space adds up. Each model is 2-40GB. Manage models you actually use.
Models won’t run if RAM insufficient. You’ll get an error or extremely slow performance.
Apple Silicon is significantly faster than equivalent Intel Mac. Newer is better.
Quality below cloud AI is real. Don’t expect Claude Opus from your laptop.
First-time download is large but subsequent runs are local-only.
Battery life on laptops is impacted by AI use — use plugged in for heavy work.

Sources

Ollama documentation: ollama.com
Tested setup (June 2026)
Open-weights model documentation
Personal experience running local AI on Mac and PC hardware

Tech & AI, Explained

Explorer

set-up-ollama