🇺🇸 USA · Together AI
Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: Western cloud host for open-weight AI models. Cheap, fast inference for Llama / Mistral / Qwen / DeepSeek / etc., without sending data to Meta / Mistral / China. The recommended path to use open-weights in production.
Front-matter facts
| Field | Value |
|---|---|
| Vendor | Together AI Inc (San Francisco, USA) |
| Country / origin | 🇺🇸 USA |
| Recommended for Australian users? | ✅ Yes. Fully accessible; Western infrastructure for open-weight inference |
| Privacy summary | API: no training on customer data; uniquely useful for safely running Chinese open weights without data going to China |
| Free tier | Yes: free API credit on sign-up; some always-free models |
| Paid tiers | Pay-per-token; pre-paid credits; Enterprise quoted |
| First released | 2022 |
| Last reviewed | 2026-06-26 |
| Official site | https://together.ai |
What it is
Together AI is a Western cloud provider for open-weight AI model inference. They host the broadest catalog of open-weight models on US infrastructure, with fast inference at competitive prices. For developers wanting to use open-weight models in production without self-hosting, Together is one of the top 2-3 choices.
Hosted models include:
- Llama (4 / 5 family from Meta)
- Mistral (Large, Codestral, Small, Nemo, etc.)
- Qwen (Chinese-origin open weights, safely run on US infrastructure)
- DeepSeek (similarly, Chinese-origin open weights on US infrastructure)
- Yi, Gemma, Phi, Granite, Nemotron, plus 100+ others
Critical use case for AUS users: If you want to use Chinese open-weight models (DeepSeek, Qwen, Yi) for their genuine capability, Together AI is one of the safest paths: your data goes to US-based Together AI infrastructure (Western jurisdiction, no PRC law applicability), NOT to Chinese vendors' servers. The political-filtering training is still baked into the weights, but the data-flow concern is removed. See DeepSeek entry for the nuance.
What you'd use it for
- Production inference for Llama / Mistral / Gemma at competitive cost
- Running Chinese open-weight models (DeepSeek, Qwen, Yi) on Western infrastructure
- Fine-tuning open-weight models on your data
- Cheaper alternative to frontier-closed APIs (Claude / GPT / Gemini) when an open-weight model is "good enough"
- Multi-model experimentation: easy to switch between models
- Custom dedicated endpoints for predictable production latency
How to use from Australia
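- Sign up at together.ai. Free credit on sign-up.
- Get API key from dashboard
- Use OpenAI-compatible API endpoint:

  ```python
  from openai import OpenAI

  # Point the standard OpenAI client at Together's OpenAI-compatible endpoint.
  client = OpenAI(
      api_key="...",  # your Together AI API key from the dashboard
      base_url="https://api.together.xyz/v1",
  )

  response = client.chat.completions.create(
      # Model ID as given in this entry; confirm the exact string at together.ai/models.
      model="meta-llama/Llama-4-70B-Instruct",
      messages=[{"role": "user", "content": "Hello"}],
  )
  print(response.choices[0].message.content)
  ```

- Browse model catalog at together.ai/models
- AUS card accepted for paid use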
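Because the endpoint is OpenAI-compatible, trying one prompt against several models should only require changing the `model` string. The following is a minimal sketch, assuming a client configured as in the step above. The helper name `compare_models` is hypothetical (it is not part of any Together AI SDK), and the model IDs you pass in should be copied from together.ai/models rather than taken from this page.

```python
from typing import Any, Dict, Iterable


def compare_models(client: Any, model_ids: Iterable[str], prompt: str) -> Dict[str, str]:
    """Send one prompt to several models and collect each reply.

    `client` is assumed to be an OpenAI-compatible client, for example
    openai.OpenAI created with base_url="https://api.together.xyz/v1".
    This helper is illustrative only.
    """
    replies: Dict[str, str] = {}
    for model_id in model_ids:
        response = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        # The first choice holds the assistant's reply text.
        replies[model_id] = response.choices[0].message.content
    return replies
```

Calling `compare_models(client, [...], "Hello")` makes one request per model, so each call is billed at that model's own rate.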
What it costs
Pay-per-token (examples; verify current rates at together.ai/pricing)
- Llama 4 70B: ~US$0.30 per million input/output tokens
- Llama 4 405B: ~US$0.90 per million
- Mistral Large: ~US$6 per million
- Qwen 72B: ~US$0.30 per million
- DeepSeek: similar low rates
- Image / video / audio models: per-call pricing
Generally far cheaper than Claude / GPT / Gemini for tasks that an open-weight model can handle.
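To make the per-token figures concrete, the arithmetic can be sketched as below. The rates are the approximate examples listed above, treated as a single blended input/output rate (an assumption), and the names `EXAMPLE_RATES_USD_PER_MILLION` and `estimate_cost_usd` are hypothetical. Check together.ai/pricing before relying on any number.

```python
# Approximate example rates from this page, in US dollars per million tokens.
# Assumption: one blended rate covers both input and output tokens.
EXAMPLE_RATES_USD_PER_MILLION = {
    "Llama 4 70B": 0.30,
    "Llama 4 405B": 0.90,
    "Mistral Large": 6.00,
    "Qwen 72B": 0.30,
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload at the example rate for `model`."""
    rate = EXAMPLE_RATES_USD_PER_MILLION[model]
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1_000_000 * rate
```

At these example rates, a month of 40 million input tokens and 10 million output tokens comes to about US$15 on Llama 4 70B and about US$45 on Llama 4 405B.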
Dedicated endpoints
- Reserved capacity for guaranteed latency / throughput
- Hourly pricing
- For high-volume production
Free tier
- Sign-up credit
- Some always-free models for development
How it compares to alternatives
| Aspect | Together AI | Fireworks AI | Groq | Hugging Face Inference | Replicate |
|---|---|---|---|---|---|
| Model catalog | Broadest open-weight | Curated open-weight | Curated (Llama-focused) | Vast (anything on HF) | Curated |
| Price (open-weight inference) | Cheap | Cheap | Premium for speed | Moderate | Per-call |
| Speed / latency | Fast | Fast | Fastest (LPU) | Moderate | Moderate |
| Fine-tuning | Yes | Yes | Limited | AutoTrain | Limited |
| Dedicated endpoints | Yes | Yes | Limited | Yes (Inference Endpoints) | Limited |
| API compatibility | OpenAI-compatible | OpenAI-compatible | OpenAI-compatible | Custom | Custom |
| Best for | Broadest open-weight catalog + fine-tuning | Cheap fast open-weight inference | Fastest inference for speed-critical | Community hub + trying models out | One-off model runs |
For most open-weight production work, Together AI is the default recommendation.
Privacy / data handling
- No training on customer data (contractually committed)
- API logs retained briefly for abuse-monitoring
- US data centres
- Critical: when you run Chinese open weights via Together, your data does NOT go to China; this is one of the safest paths to using Chinese open weights
Recent changes
- 2026: Llama 5 family added; expanded multimodal models
- 2025: Dedicated endpoints expanded; fine-tuning improved
- 2024: Major model catalog expansion (DeepSeek + Qwen added)
Gotchas
- For OpenAI / Anthropic / Google models, Together can't help: they're closed; use those vendors directly
- Open-weight model performance varies: frontier Llama matches or approaches frontier closed models for many tasks but not all; verify for your use case
- Chinese open-weights via Together: data is safe, but the political-filtering training is still in the weights (see DeepSeek entry)
- Cost-optimisation matters: Llama 70B is dramatically cheaper than Llama 405B; pick the smallest model that meets your quality bar
- Dedicated endpoints are essential for predictable latency at high volume
- For Bible Quest-style projects, direct Anthropic API or AWS Bedrock might be simpler than Together if you don't need open weights
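The advice to pick the smallest model that meets your quality bar can be sketched as a simple selection over your own evaluation results. Everything here is hypothetical: the `Candidate` type and the `cheapest_passing` function are illustrative names, and the scores and prices in the usage note are placeholders, not measurements.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    usd_per_million_tokens: float  # price you looked up yourself
    eval_score: float              # score on your own evaluation set, 0.0 to 1.0


def cheapest_passing(candidates: List[Candidate], quality_bar: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose score meets the bar, or None."""
    passing = [c for c in candidates if c.eval_score >= quality_bar]
    if not passing:
        return None
    return min(passing, key=lambda c: c.usd_per_million_tokens)
```

With placeholder scores of 0.82 for a 70B model at US$0.30 and 0.88 for a 405B model at US$0.90, a bar of 0.80 selects the 70B model, while raising the bar to 0.85 selects the 405B model. The useful part is the habit: measure on your own task first, then pay only for the quality you need.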
See also
- Llama 🟩 🟦
- Mistral 🟥
- DeepSeek (❌ direct; safe via Together) 🟩 🟦
- Qwen (❌ direct; safe via Together) 🟩 🟦
- Fireworks AI 🟥
- Groq 🟥
- Hugging Face 🟩 🟦
- Replicate 🟥
- AWS Bedrock 🟩 🟦
- Cohere 🟩 🟦
- open-weights-vs-closed.md 🟥
- which-ai-for-which-job.md 🟩 🟦