How to Set Up Ollama — Run AI Models Locally on Your Computer

Status: 🟩 COMPLETE 🟦 LIVING
Section: how-to
Tags: ollama, local-ai, privacy, open-weights, llama, setup, walkthrough


What you’re doing

Ollama is the easiest way to run AI models on your own computer — no cloud, no API keys, no data leaving your machine. Once set up, you can run Llama, Gemma, Mistral, Phi, and many other open-weights models locally for free.

This is genuinely useful when you want:

  • Maximum privacy (data never leaves your computer)
  • Unlimited use (no per-token costs)
  • Offline AI
  • To learn how AI actually works

Time: 15-30 minutes including downloading a model.


What you need

  • A modern computer (Mac, Windows, or Linux)
  • Mac: 8GB+ RAM (16GB+ recommended); Apple Silicon (M1/M2/M3/M4) strongly preferred
  • PC: 16GB+ RAM (32GB+ for larger models); GPU helpful but not required
  • About 10GB free disk space (more for multiple models)
  • Internet connection for initial download
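Before downloading anything, the disk-space guideline above can be checked with a few lines of Python. This is a minimal sketch: it assumes models are stored on the drive that holds your home directory (the usual default, but worth confirming for your install), and the helper names are hypothetical.

```python
import shutil
from pathlib import Path


def free_disk_gb(path=None):
    """Return free space, in GB, on the drive containing `path`.

    Defaults to the home directory, on the assumption that models live there.
    """
    target = Path(path) if path is not None else Path.home()
    return shutil.disk_usage(target).free / 1e9


def has_room_for_models(required_gb=10, path=None):
    """True if at least `required_gb` GB is free (10 mirrors the guideline above)."""
    return free_disk_gb(path) >= required_gb


print(f"Free disk space: {free_disk_gb():.1f} GB")
print("Enough for a first model:", has_room_for_models())
```

Raise `required_gb` if you plan to keep several models at once.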

Step-by-step

Step 1 — Download Ollama

Go to https://ollama.com.

Click Download and choose your operating system:

  • macOS
  • Windows
  • Linux

Install like any other application.

Step 2 — Open Terminal

You’ll interact with Ollama through the terminal/command line. Don’t panic — it’s just a few commands.

Mac: Open the Terminal app (Spotlight: type “Terminal”)

Windows: Open PowerShell or Command Prompt (Win+R, type “powershell” or “cmd”)

Linux: Open your usual terminal emulator

Step 3 — Pull your first model

In the terminal, type:

ollama pull llama3.2:3b

Press Enter. Ollama downloads the model (about 2GB). Wait for the download to finish.

Why this model?

  • 3 billion parameters (small enough to run on most computers)
  • Good quality for everyday tasks
  • Fast on most hardware
  • Llama 3.2 from Meta

Step 4 — Chat with the model

In terminal:

ollama run llama3.2:3b

You’ll see a prompt. Type a question:

>>> What's the capital of Australia?

Press Enter. The model thinks for a moment and responds.

Type /bye to exit.

Congratulations — you just ran an AI model on your own computer with zero data leaving your machine.


Models worth trying

After Llama 3.2:3b, try others:

Small (works on most computers)

ollama pull phi3:mini       # Microsoft's Phi-3 Mini
ollama pull gemma2:2b       # Google's Gemma 2 2B
ollama pull tinyllama       # Very small; good for testing

Medium

ollama pull llama3.1:8b     # Llama 3.1 8B
ollama pull mistral         # Mistral 7B
ollama pull gemma2:9b       # Gemma 2 9B
ollama pull phi3:medium     # Phi-3 14B

Large

ollama pull llama3.3:70b    # Llama 3.3 70B (very capable)
ollama pull gemma2:27b      # Gemma 2 27B
ollama pull mixtral         # Mixtral 8x7B

Specialised

ollama pull codellama       # Code-focused
ollama pull llava           # Vision-capable (sees images)
ollama pull nomic-embed-text # Embeddings for search
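An embeddings model such as nomic-embed-text turns text into lists of numbers, and search then compares those lists. The sketch below shows the comparison step with cosine similarity. The `embed` helper assumes Ollama's `/api/embed` endpoint with an `input` field and an `embeddings` response field; check the API documentation for your installed version, as this is an assumption rather than something covered above. All function names are illustrative.

```python
import json
import math
import urllib.request


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices, most similar to the query first."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def embed(texts, model="nomic-embed-text", base_url="http://localhost:11434"):
    """Ask a local Ollama server for one embedding per text.

    Assumes the /api/embed endpoint and its "embeddings" response field.
    """
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)["embeddings"]
```

With the server running, `vectors = embed(["query text", "doc one", "doc two"])` followed by `rank_by_similarity(vectors[0], vectors[1:])` would order the documents by closeness to the query.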

Run any model

Once pulled, run with:

ollama run <model-name>

For example:

ollama run mistral
ollama run gemma2:9b

Useful commands

# List models you have
ollama list
 
# Show model info
ollama show llama3.2:3b
 
# Remove a model (frees disk space)
ollama rm <model-name>
 
# Update Ollama itself
# (Mac: app auto-updates; Linux: run install script again)
 
# Stop a running model
# (Exit the chat with /bye or Ctrl+D; Ctrl+C interrupts the current reply)
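The same information as `ollama list` is also available from code, which is handy for keeping an eye on disk use. The sketch below assumes the local server's `/api/tags` endpoint returns a `models` list whose entries carry `name` and `size` (in bytes); the function names and the sample data are made up for illustration.

```python
import json
import urllib.request


def fetch_local_models(base_url="http://localhost:11434"):
    """Return the raw model entries reported by a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp).get("models", [])


def summarise_models(models):
    """Return (name, size in GB) pairs, largest first, plus the total in GB."""
    rows = sorted(
        ((m["name"], m["size"] / 1e9) for m in models),
        key=lambda row: row[1],
        reverse=True,
    )
    total = sum(size for _, size in rows)
    return rows, total


# Illustrative data shaped like an /api/tags response (names and sizes are made up).
sample = [
    {"name": "example-small", "size": 2_000_000_000},
    {"name": "example-large", "size": 40_000_000_000},
]
rows, total = summarise_models(sample)
for name, size_gb in rows:
    print(f"{name:<20} {size_gb:6.1f} GB")
print(f"{'total':<20} {total:6.1f} GB")
```

With Ollama running, `summarise_models(fetch_local_models())` gives the real figures for your machine.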

Use Ollama from your own code

Ollama runs a local API server. You can call it from Python, JavaScript, or any language:

Python

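import requests

# Send one prompt to the local Ollama server; stream=False returns a single JSON reply.
response = requests.post('http://localhost:11434/api/generate',
    json={'model': 'llama3.2:3b', 'prompt': 'Hello!', 'stream': False},
    timeout=120)
response.raise_for_status()  # fail loudly if the server or model is unavailable
print(response.json()['response'])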
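If you would rather see the reply appear word by word, as it does in the terminal chat, the same endpoint can stream. This sketch assumes that with `stream` set to true the server sends one JSON object per line, each with a `response` fragment and a final `done: true`. It uses only the standard library instead of `requests`, and the helper names are hypothetical.

```python
import json
import urllib.request


def parse_stream_lines(lines):
    """Yield text fragments from newline-delimited JSON chunks until "done" is true."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break


def stream_generate(prompt, model="llama3.2:3b", base_url="http://localhost:11434"):
    """Print a reply from a local Ollama server as it is generated."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        for fragment in parse_stream_lines(resp):
            print(fragment, end="", flush=True)
    print()
```

Calling `stream_generate("Hello!")` with the server running prints the answer incrementally. Keeping the parsing in its own function makes it easy to test without a model loaded.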

Using OpenAI Python SDK (Ollama is OpenAI API-compatible)

from openai import OpenAI
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'  # required but unused
)
response = client.chat.completions.create(
    model='llama3.2:3b',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response.choices[0].message.content)

This means you can develop AI applications using local models for free — useful for testing and privacy-sensitive work.
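An application that depends on the local server may want to confirm it is reachable before sending prompts. A minimal sketch, assuming the default address used in the examples above and treating any connection failure as "not running"; the function name is illustrative.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if something answers HTTP 200 at `base_url`, else False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

A caller could use this to show a friendly "start Ollama first" message instead of a stack trace.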


GUI option: LM Studio or Open WebUI

If you don’t love the terminal, try GUI tools:

LM Studio

  • Download: https://lmstudio.ai
  • Visual app for downloading and chatting with models
  • Easier for non-technical users
  • Works alongside or instead of Ollama

Open WebUI

  • Browser interface for Ollama
  • Install: use the docker run command from their docs
  • Looks like ChatGPT
  • Great for ongoing use

Jan.ai


Apple Silicon advantage

If you have a Mac with an M1/M2/M3/M4 chip:

Why it matters: Apple Silicon has “unified memory”, meaning RAM is shared between the CPU and GPU. As a result, a Mac with 32GB of unified memory can run 30B-parameter models (in quantised form), whereas a PC needs a GPU with 24GB+ of VRAM (which costs $1000+).
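A rough back-of-the-envelope calculation shows why memory size is the limiting factor: the weights take roughly (number of parameters × bits per weight ÷ 8) bytes. The sketch below assumes 4-bit quantisation and a 20% allowance for context and runtime overhead. Both figures are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def estimate_memory_gb(params_billions, bits_per_weight=4, overhead=1.2):
    """Rough memory needed to load a model, in GB.

    params_billions * bits / 8 gives the weight size in GB; `overhead` is an
    assumed multiplier for context and runtime buffers.
    """
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead


for size in (3, 8, 30, 70):
    print(f"{size}B parameters: ~{estimate_memory_gb(size):.1f} GB at 4-bit")
```

Under these assumptions a 30B model needs about 18 GB, which fits in 32GB of unified memory with room left for the operating system. The same model at 16 bits per weight would not.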

Quick benchmarks (approximate, may vary):

Mac              Best model speed
M1 8GB           3B–7B models OK
M2/M3 16GB       8B–13B models good
M3 Pro 32GB      30B models reasonable
M3 Max 64GB+     70B models possible
Mac Studio       Frontier-size models possible

For local AI on a budget, modern Macs are surprisingly capable.


Hardware recommendations

To run small models (3B-7B)

  • Any modern computer with 8GB+ RAM
  • No GPU needed
  • Will be slow on older hardware

To run medium models (13B-30B)

  • 16-32GB RAM
  • Discrete GPU helpful on PC
  • Apple Silicon Mac excellent option

To run large models (70B+)

  • 64GB+ RAM
  • Multiple GPUs typical on PC
  • Apple Silicon Mac with 64GB+ unified memory
  • Or: don’t run locally; use cloud API
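The tiers above can be written down as a small lookup, which is useful if a script needs to decide which model size to offer. This sketch simply encodes the RAM thresholds listed in this section (8GB, 16GB, 64GB). It ignores GPU and CPU speed, and the function name is hypothetical.

```python
def tiers_for_ram(ram_gb):
    """Return the model tiers this guide suggests for a given amount of RAM."""
    tiers = []
    if ram_gb >= 8:
        tiers.append("small (3B-7B)")
    if ram_gb >= 16:
        tiers.append("medium (13B-30B)")
    if ram_gb >= 64:
        tiers.append("large (70B+)")
    return tiers


for ram in (8, 16, 32, 64):
    print(f"{ram} GB RAM: {', '.join(tiers_for_ram(ram))}")
```

An empty list means the machine is below the 8GB starting point and local models are unlikely to run well.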

Privacy benefits

When you run AI with Ollama:

  • No data leaves your computer
  • No API key needed
  • No usage logging by AI provider
  • Works offline
  • No cost per use

For sensitive work (confidential business documents, personal journaling, medical questions, legal matters), local AI is the privacy gold standard.


Limitations vs cloud AI

Local AI is not equivalent to cloud frontier models:

Local model limitations:

  • Smaller models = less capable
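  • Even Llama 3.3 70B (one of the strongest open-weights models listed here) generally trails frontier cloud models such as Claude Opus or GPT-4o
  • No native vision in most models (though vision-capable models such as LLaVA exist)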
  • No real-time web search
  • No image generation
  • No voice mode

When to use cloud AI instead:

  • Need frontier capability
  • Need current information (web search)
  • Need image/video/voice generation
  • One-off tasks where setup time isn’t worth it

When local AI is better:

  • Sensitive data
  • High volume (cost would be significant in cloud)
  • Offline use
  • Customisation/fine-tuning
  • Learning how AI works

Australian-specific notes

  • Genuinely useful for Australian users with privacy concerns
  • Data sovereignty: your data stays in Australia (because it stays on your machine)
  • For Australian businesses with APP 8 (Australian Privacy Principle 8, cross-border disclosure of personal information) considerations: local AI avoids cross-border disclosure
  • Particularly useful for: clinicians, lawyers, accountants handling sensitive data
  • Internet connection used only for initial model download

For a typical Australian user wanting to try local AI:

  1. Install Ollama (10 minutes)
  2. Pull Llama 3.2 3B (~2GB; runs on most hardware)
  3. Try it for a week — see if it meets your needs
  4. If it works: explore larger models if hardware allows
  5. If you want a GUI: add LM Studio
  6. For ongoing serious use: consider Open WebUI

Common gotchas

  • Disk space adds up. Each model is 2-40GB. Keep only the models you actually use.
  • Models won’t run if RAM is insufficient. You’ll get an error or extremely slow performance.
  • Apple Silicon is significantly faster than an equivalent Intel Mac. Newer is better.
  • Quality below cloud AI is real. Don’t expect Claude Opus from your laptop.
  • First-time download is large but subsequent runs are local-only.
  • Battery life on laptops is impacted by AI use — use plugged in for heavy work.


Sources

  • Ollama documentation: ollama.com
  • Tested setup (June 2026)
  • Open-weights model documentation
  • Personal experience running local AI on Mac and PC hardware