🇺🇸 USA · Modal
Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: Serverless GPU / CPU compute for AI: write Python locally, deploy to cloud GPUs with zero infrastructure setup. Pay per second of execution. The "Vercel of GPU workloads."
Front-matter facts
| Field | Value |
|---|---|
| Vendor | Modal Labs (San Francisco, USA) |
| Country / origin | πΊπΈ USA |
| Recommended for Australian users? | ✅ Yes, fully accessible from AUS |
| Privacy summary | No training on customer data; your code runs in isolated containers |
| Free tier | US$30/month free compute |
| Paid tiers | Pay-per-second on top of free tier; Team / Enterprise quoted |
| First released | 2022 |
| Last reviewed | 2026-06-26 |
| Official site | https://modal.com |
What it is
Modal is serverless GPU / CPU compute for AI workloads. You write Python functions locally, decorate them, and Modal runs them on cloud GPUs (Nvidia A100, H100, L40S, etc.) without you managing servers, containers, or infrastructure.
Example workflow:
```python
import modal

app = modal.App("my-app")


@app.function(gpu="A100")
def generate_image(prompt):
    # runs on cloud A100 GPU when called
    # all dependencies handled automatically
    pass


@app.local_entrypoint()
def main():
    result = generate_image.remote("a kookaburra")
```

Modal handles:
- Container building automatically
- Dependency installation (pip / apt / etc.)
- Cold starts (typically 1-10 seconds)
- Auto-scaling (zero to thousands of concurrent executions)
- Per-second billing
Use cases:
- Run open-source AI models on demand without infrastructure
- Batch processing at scale
- Fine-tuning models with GPU access
- Web apps / APIs with serverless GPU backends
- Background jobs for AI processing
What you'd use it for
- Self-hosted model inference on cloud GPUs (Llama, Mistral, Whisper, Stable Diffusion, etc.) without server management
- Fine-tuning models on your data
- Large-scale batch processing of AI workloads
- Building AI products with serverless backend
- Running specific open-source models not available on Together / Fireworks
- Data science / ML research with cloud GPU access
How to use from Australia
- Sign up at modal.com (US$30/month free compute)
- Install: `pip install modal`
- Authenticate: `modal token new`
- Write Python functions with `@app.function()` decorators
- Deploy: `modal deploy my_script.py`
- Call from local Python or expose as HTTPS endpoint
- AUS card accepted
What it costs
Free tier
- US$30/month free compute (substantial: covers many small projects)
Per-second pricing
- CPU: very cheap (~US$0.0001/sec for small CPUs)
- GPU T4: ~US$0.65/hour
- GPU A100 80GB: ~US$3.60/hour
- GPU H100 80GB: ~US$8.60/hour
- Per-second billing means you only pay when functions actually execute
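To make the per-second arithmetic concrete, here is a minimal sketch that converts the approximate hourly GPU rates listed above into a per-job cost and into the GPU-hours the free credit would cover. It assumes GPU time is the only charge (CPU, memory, and storage are ignored), and the function names are illustrative rather than part of any Modal API.

```python
# Approximate hourly GPU rates in USD, taken from the list above.
HOURLY_RATE_USD = {"T4": 0.65, "A100-80GB": 3.60, "H100-80GB": 8.60}
FREE_CREDIT_USD = 30.0  # monthly free compute from the free tier


def job_cost_usd(gpu: str, seconds: float) -> float:
    """Cost of one run billed per second at the quoted hourly rate."""
    return HOURLY_RATE_USD[gpu] / 3600 * seconds


def free_tier_hours(gpu: str) -> float:
    """GPU-hours the monthly free credit covers (assumes GPU-only charges)."""
    return FREE_CREDIT_USD / HOURLY_RATE_USD[gpu]


if __name__ == "__main__":
    for gpu in HOURLY_RATE_USD:
        print(
            f"{gpu}: 90 s job ~US${job_cost_usd(gpu, 90):.4f}, "
            f"free credit ~{free_tier_hours(gpu):.1f} GPU-hours"
        )
```

Treat the results as rough planning figures only, since the rates themselves are approximate.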
Storage
- Some persistent volume storage included free
- Additional storage charged per GB-month
Hidden costs
- Long-running idle GPUs can add up if you mis-configure
- Cold starts are fast but real (1-10s); design for them
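The trade-off between idle GPUs and cold starts can be estimated with a rough sketch like the one below. It compares keeping one A100 warm all month against scaling to zero and paying a cold start on some fraction of requests. Two assumptions are mine, not the source's: cold-start seconds are billed at the full GPU rate (a pessimistic simplification), and the traffic figures in the example are hypothetical.

```python
A100_HOURLY_USD = 3.60  # approximate A100 80GB rate quoted above


def always_warm_cost(hours: float, hourly_usd: float = A100_HOURLY_USD) -> float:
    """Cost of keeping one GPU container running for the whole period."""
    return hours * hourly_usd


def scale_to_zero_cost(
    requests: int,
    run_seconds: float,
    cold_start_seconds: float,
    cold_fraction: float,
    hourly_usd: float = A100_HOURLY_USD,
) -> float:
    """Cost when containers stop between bursts.

    Assumes a share `cold_fraction` of requests pays a billed cold start.
    """
    billed_seconds = requests * (run_seconds + cold_fraction * cold_start_seconds)
    return billed_seconds / 3600 * hourly_usd


if __name__ == "__main__":
    # Hypothetical month: 30,000 requests of 2 s each, 20% hitting a 10 s cold start.
    print(f"always warm:   US${always_warm_cost(30 * 24):.2f}")
    print(f"scale to zero: US${scale_to_zero_cost(30_000, 2, 10, 0.2):.2f}")
```

Under these assumptions the always-warm option only pays off when traffic is heavy enough that the GPU is rarely idle.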
How it compares to alternatives
| Aspect | Modal | Lambda Labs | RunPod | CoreWeave | AWS GPU instances |
|---|---|---|---|---|---|
| Serverless (no provisioning) | Yes (best) | Limited | Limited | No (rent GPUs) | Limited |
| Per-second billing | Yes | Per-hour | Per-hour or per-second | Per-hour | Per-second |
| Auto-scaling | Best | Limited | Limited | Manual | Manual / auto-scaling |
| GPU types | A100 / H100 / T4 / L40S | Broad selection | Broad selection | Broad enterprise | Broad |
| Best for | Serverless AI workloads | Cheap rented GPUs | Cheap rented GPUs | Enterprise GPU clusters | AWS-stack |
For developers wanting serverless GPU access without managing infrastructure, Modal is the cleanest option. For renting raw GPUs cheaply, RunPod or Lambda Labs is the better fit.
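The billing-granularity row matters most for short jobs. The sketch below bills the same job two ways at the same hourly rate, assuming a per-hour provider rounds usage up to whole hours. That rounding rule is a simplification for illustration and may not match any specific vendor's terms.

```python
import math


def billed_per_second(seconds: float, hourly_usd: float) -> float:
    """Charge only for the seconds actually used."""
    return seconds / 3600 * hourly_usd


def billed_per_hour(seconds: float, hourly_usd: float) -> float:
    """Charge for whole hours, rounding any partial hour up (assumed rule)."""
    return math.ceil(seconds / 3600) * hourly_usd


if __name__ == "__main__":
    rate = 3.60  # same illustrative hourly rate for both billing models
    for secs in (90, 1800, 3601):
        print(
            f"{secs:>5} s: per-second US${billed_per_second(secs, rate):.2f}, "
            f"per-hour US${billed_per_hour(secs, rate):.2f}"
        )
```

The gap shrinks as jobs approach whole-hour lengths, which is why hourly rental can still be cheaper for long, steady workloads.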
Privacy / data handling
- Code runs in isolated containers per request
- No training on customer code
- US data centres
- For AUS data residency, use AWS / Azure / GCP with AUS regions instead
Recent changes
- 2026: H200 / Blackwell GPU support
- 2025: Persistent volumes + sandbox improvements
- 2024: Major adoption among AI developers
Gotchas
- Cold starts are 1-10s typically; design async workflows accordingly
- Python-first: less natural for non-Python AI work
- Per-second billing means idle time costs money: don't leave functions running idle
- For inference of common models (Llama, Mistral), Together / Fireworks / Groq usually simpler than self-hosting on Modal
- For Bible-Quest-scale projects, Modal is overkill: Vercel / Supabase covers most needs
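One way to design around cold starts is to launch calls concurrently so their start-up delays overlap rather than add up. The sketch below shows the pattern with plain `asyncio`. `fake_remote_call` is a hypothetical stand-in for a remote GPU call, not a Modal function, and the delay is shortened so the demo runs quickly.

```python
import asyncio
import time

COLD_START_S = 0.2  # stand-in for a 1-10 s cold start, shortened for the demo


async def fake_remote_call(prompt: str) -> str:
    """Hypothetical stand-in for a remote call that pays a cold start."""
    await asyncio.sleep(COLD_START_S)
    return f"result for {prompt!r}"


async def run_batch(prompts: list[str]) -> list[str]:
    # Start every call at once so the waits overlap instead of running in series.
    return await asyncio.gather(*(fake_remote_call(p) for p in prompts))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_batch(["a kookaburra", "a wombat", "a quokka"]))
    print(results)
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Run sequentially, three calls would wait roughly three cold starts. Run concurrently, the batch waits roughly one, assuming the platform can start the containers in parallel.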
See also
- Lambda Labs 🟥
- RunPod 🟥
- CoreWeave 🟥
- AWS overview 🟩 🟦
- Together AI 🟩 🟦
- Fireworks AI 🟩 🟦
- Replicate 🟩 🟦
- Hugging Face 🟩 🟦
- Nvidia AI 🟩 🟦
- What is the cloud? 🟩 🟦