🇺🇸 USA · Modal

Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: Serverless GPU / CPU compute for AI. Write Python locally, deploy to cloud GPUs with zero infrastructure setup. Pay per second of execution. The “Vercel of GPU workloads.”


Front-matter facts

Field | Value
--- | ---
Vendor | Modal Labs (San Francisco, USA)
Country / origin | 🇺🇸 USA
Recommended for Australian users? | ✅ Yes, fully accessible from AUS
Privacy summary | No training on customer data; your code runs in isolated containers
Free tier | US$30/month free compute
Paid tiers | Pay-per-second on top of free tier; Team / Enterprise quoted
First released | 2022
Last reviewed | 2026-06-26
Official site | https://modal.com

What it is

Modal is serverless GPU / CPU compute for AI workloads. You write Python functions locally, decorate them, and Modal runs them on cloud GPUs (Nvidia A100, H100, L40S, etc.) without you managing servers, containers, or infrastructure.

Example workflow:

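import modal

app = modal.App("my-app")

# Request an A100 for this function; Modal builds the container for you.
@app.function(gpu="A100")
def generate_image(prompt: str) -> str:
    # Runs on a cloud A100 GPU when invoked with .remote().
    # Dependencies are declared on the container image and installed by Modal.
    # Placeholder body: a real function would load a model and return an image.
    return f"generated image for: {prompt}"

@app.local_entrypoint()
def main():
    # Runs locally; .remote() executes the call in the cloud and returns the result.
    result = generate_image.remote("a kookaburra")
    print(result)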

Modal handles:

  • Container building automatically
  • Dependency installation (pip / apt / etc.)
  • Cold starts (typically 1-10 seconds)
  • Auto-scaling (zero to thousands of concurrent executions)
  • Per-second billing
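The dependency and auto-scaling points above can be made concrete with a minimal sketch. It assumes the `modal` package is installed and authenticated; the app name `deps-sketch`, the function `byte_sum`, and the choice of `ffmpeg` and `numpy` as packages are illustrative and not taken from this page.

```python
import modal

app = modal.App("deps-sketch")  # hypothetical app name

# Dependencies are declared on the image; Modal builds and caches the container.
image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg")  # system package (illustrative choice)
    .pip_install("numpy")   # Python package (illustrative choice)
)

@app.function(image=image, gpu="T4")
def byte_sum(text: str) -> int:
    # Imported inside the function: numpy only needs to exist in the container.
    import numpy as np
    return int(np.frombuffer(text.encode(), dtype=np.uint8).sum())

@app.local_entrypoint()
def main():
    # .map() fans the inputs out across containers; Modal scales them up as needed.
    for result in byte_sum.map(["a kookaburra", "a wombat"]):
        print(result)
```

Saved as a file such as `deps_sketch.py` (a hypothetical name), this would typically be started with `modal run deps_sketch.py`.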

Use cases:

  • Run open-source AI models on demand without infrastructure
  • Batch processing at scale
  • Fine-tuning models with GPU access
  • Web apps / APIs with serverless GPU backends
  • Background jobs for AI processing

What you’d use it for

  • Self-hosted model inference on cloud GPUs (Llama, Mistral, Whisper, Stable Diffusion, etc.) without server management
  • Fine-tuning models on your data
  • Large-scale batch processing of AI workloads
  • Building AI products with serverless backend
  • Running specific open-source models not available on Together / Fireworks
  • Data science / ML research with cloud GPU access

How to use from Australia

  1. Sign up at modal.com (US$30/month free compute)
  2. Install: pip install modal
  3. Authenticate: modal token new
  4. Write Python functions with @app.function() decorators
  5. Deploy: modal deploy my_script.py
  6. Call from local Python or expose as HTTPS endpoint
  7. AUS card accepted

What it costs

Free tier

  • US$30/month free compute (substantial: covers many small projects)

Per-second pricing

  • CPU: very cheap (~US$0.0001/sec for small CPUs)
  • GPU T4: ~US$0.65/hour
  • GPU A100 80GB: ~US$3.60/hour
  • GPU H100 80GB: ~US$8.60/hour
  • Per-second billing means you only pay when functions actually execute
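To see what per-second billing means in practice, the approximate hourly GPU rates listed above can be converted to per-second rates and set against the US$30 monthly free credit. This is a rough sketch only: the function names are made up for illustration, and it ignores CPU, memory, and storage charges.

```python
# Approximate hourly GPU rates quoted above (USD); actual prices may differ.
HOURLY_RATE_USD = {"T4": 0.65, "A100-80GB": 3.60, "H100-80GB": 8.60}
FREE_CREDIT_USD = 30.0  # monthly free compute


def per_second_rate(gpu: str) -> float:
    """Convert the hourly rate for a GPU type to a per-second rate."""
    return HOURLY_RATE_USD[gpu] / 3600


def job_cost(gpu: str, seconds: float) -> float:
    """GPU cost of a job that executes for the given number of seconds."""
    return per_second_rate(gpu) * seconds


def free_hours(gpu: str) -> float:
    """GPU hours the monthly free credit would cover at the listed rate."""
    return FREE_CREDIT_USD / HOURLY_RATE_USD[gpu]


if __name__ == "__main__":
    for gpu in HOURLY_RATE_USD:
        print(
            f"{gpu}: ${per_second_rate(gpu):.6f}/sec, "
            f"90 s job = ${job_cost(gpu, 90):.4f}, "
            f"free credit covers ~{free_hours(gpu):.1f} h"
        )
```

At these rates the free credit covers far more T4 time than H100 time, which is worth checking before picking a GPU type for a hobby project.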

Storage

  • Some persistent volume storage included free
  • Additional storage charged per GB-month

Hidden costs

  • Long-running idle GPUs can add up if you mis-configure
  • Cold starts are fast but real (1-10s); design for them
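The trade-off between idle GPUs and cold starts can be estimated with simple arithmetic. The sketch below uses the approximate A100 rate from this page and the upper end of the 1-10 s cold-start range. It pessimistically assumes every request pays a full cold start and that cold-start time is billed, which may not match actual billing; the function names are illustrative.

```python
A100_HOURLY_USD = 3.60  # approximate rate quoted on this page


def always_warm_cost(hours: float, hourly: float = A100_HOURLY_USD) -> float:
    """Cost of keeping one GPU container running for the whole period."""
    return hours * hourly


def scale_to_zero_cost(
    requests: int,
    run_seconds: float,
    cold_start_seconds: float = 10.0,
    hourly: float = A100_HOURLY_USD,
) -> float:
    """Cost when containers scale to zero between requests.

    Worst-case assumption: each request pays one full, billed cold start.
    """
    billed_seconds = requests * (run_seconds + cold_start_seconds)
    return billed_seconds * hourly / 3600


if __name__ == "__main__":
    day_warm = always_warm_cost(24)
    day_on_demand = scale_to_zero_cost(requests=1000, run_seconds=5)
    print(f"A100 kept warm for 24 h:        ${day_warm:.2f}")
    print(f"1000 requests, scaling to zero: ${day_on_demand:.2f}")
```

Under these assumptions, low or bursty traffic favours scaling to zero, while sustained traffic narrows the gap.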

How it compares to alternatives

Aspect | Modal | Lambda Labs | RunPod | CoreWeave | AWS GPU instances
--- | --- | --- | --- | --- | ---
Serverless (no provisioning) | Yes (best) | Limited | Limited | No (rent GPUs) | Limited
Per-second billing | Yes | Per-hour | Per-hour or per-second | Per-hour | Per-second
Auto-scaling | Best | Limited | Limited | Manual | Manual / auto-scaling
GPU types | A100 / H100 / T4 / L40S | Broad selection | Broad selection | Broad enterprise | Broad
Best for | Serverless AI workloads | Cheap rented GPUs | Cheap rented GPUs | Enterprise GPU clusters | AWS-stack

For developers who want serverless GPU access without managing infrastructure, Modal is the cleanest option. For renting raw GPUs cheaply, consider RunPod or Lambda Labs.


Privacy / data handling

  • Code runs in isolated containers per request
  • No training on customer code
  • US data centres
  • For AUS data residency, use AWS / Azure / GCP with AUS regions instead

Recent changes

  • 2026: H200 / Blackwell GPU support
  • 2025: Persistent volumes + sandbox improvements
  • 2024: Major adoption among AI developers

Gotchas

  • Cold starts are 1-10s typically; design async workflows accordingly
  • Python-first: less natural for non-Python AI work
  • Per-second billing means you need to watch idle time; don’t leave functions running idle
  • For inference of common models (Llama, Mistral), Together / Fireworks / Groq are usually simpler than self-hosting on Modal
  • For Bible-Quest-scale projects, Modal is overkill; Vercel / Supabase covers most needs

See also


Sources