🇺🇸 USA · Together AI

Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: Western cloud host for open-weight AI models. Cheap, fast inference for Llama / Mistral / Qwen / DeepSeek and others, without sending data to Meta, Mistral, or China-based vendors. The recommended path for using open weights in production.

Front-matter facts

Field	Value
Vendor	Together AI Inc (San Francisco, USA)
Country / origin	🇺🇸 USA
Recommended for Australian users?	✅ Yes — fully accessible; Western infrastructure for open-weight inference
Privacy summary	API: no training on customer data; uniquely useful for safely running Chinese open weights without data going to China
Free tier	Yes — free API credit on sign-up; some always-free models
Paid tiers	Pay-per-token; pre-paid credits; Enterprise quoted
First released	2022
Last reviewed	2026-06-26
Official site	https://together.ai

What it is

Together AI is a Western cloud provider for open-weight AI model inference. It hosts the broadest catalog of open-weight models on US infrastructure, with fast inference at competitive prices. For developers who want to use open-weight models in production without self-hosting, Together is one of the top two or three choices.

Hosted models include:

Llama (4 / 5 family from Meta)
Mistral (Large, Codestral, Small, Nemo, etc.)
Qwen (Chinese-origin open weights, safely run on US infrastructure)
DeepSeek (similar — Chinese open weights on US infrastructure)
Yi, Gemma, Phi, Granite, Nemotron, plus 100+ others

Critical use case for AUS users: If you want to use Chinese open-weight models (DeepSeek, Qwen, Yi) for their genuine capability, Together AI is one of the safest paths. Your data goes to US-based Together AI infrastructure (Western jurisdiction, outside the reach of PRC law), NOT to the Chinese vendors’ servers. The political-filtering training is still baked into the weights, but the data-flow concern is removed. See the DeepSeek entry for the nuance.

What you’d use it for

Production inference for Llama / Mistral / Gemma at competitive cost
Running Chinese open-weight models (DeepSeek, Qwen, Yi) on Western infrastructure
Fine-tuning open-weight models on your data
Cheaper alternative to frontier-closed APIs (Claude / GPT / Gemini) when an open-weight model is “good enough”
Multi-model experimentation — easy to switch between models
Custom dedicated endpoints for predictable production latency

How to use from Australia

Sign up at together.ai. Free credit on sign-up.
Get an API key from the dashboard.

Use the OpenAI-compatible API endpoint:

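import os

from openai import OpenAI

# Together exposes an OpenAI-compatible endpoint, so the standard OpenAI SDK works
# once base_url points at Together. Assumes the key is stored in TOGETHER_API_KEY.
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)
response = client.chat.completions.create(
    # Model ID as given in this entry; confirm the exact ID at together.ai/models.
    model="meta-llama/Llama-4-70B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)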

Browse the model catalog at together.ai/models
Australian cards are accepted for paid use
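
Because the endpoint is OpenAI-compatible, switching models is just a change to the model string. The sketch below is one hedged way to send the same prompt to several models and compare replies. The helper names ask_each_model and make_together_sender are hypothetical (not part of any SDK), and the model IDs you pass in should be checked against together.ai/models.

```python
from typing import Callable


def ask_each_model(
    send: Callable[[str, str], str],
    models: list[str],
    prompt: str,
) -> dict[str, str]:
    """Send the same prompt to every model and return replies keyed by model ID."""
    return {model: send(model, prompt) for model in models}


def make_together_sender(api_key: str) -> Callable[[str, str], str]:
    """Build a send(model, prompt) function backed by Together's OpenAI-compatible API."""
    # Imported here so ask_each_model can be used (and tested) without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key, base_url="https://api.together.xyz/v1")

    def send(model: str, prompt: str) -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    return send


# Example (needs a real key and valid model IDs from the catalog):
# send = make_together_sender("YOUR_KEY")
# replies = ask_each_model(send, ["meta-llama/Llama-4-70B-Instruct"], "Hello")
```

Passing the send function in as a parameter keeps the comparison loop independent of the network, so it can be exercised with a stub before spending credit.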

What it costs

Pay-per-token (example rates; verify current prices at together.ai/pricing)

Llama 4 70B: ~US$0.20 / US$0.30 per million input/output tokens
Llama 4 405B: ~US$0.90 / US$0.90 per million input/output tokens
Mistral Large: ~US$2 / US$6 per million input/output tokens
Qwen 72B: ~US$0.20 / US$0.30 per million input/output tokens
DeepSeek: similar low rates
Image / video / audio models: per-call pricing

Generally far cheaper than Claude / GPT / Gemini for tasks an open-weight model can handle.
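
To make the per-token arithmetic concrete, here is a minimal sketch that estimates a bill from token counts. The prices are the approximate example rates listed above, not live pricing, and the function name estimate_cost_usd is an illustrative assumption.

```python
# Approximate example rates from this entry, in US dollars per million tokens,
# as (input price, output price). Verify current rates before relying on them.
EXAMPLE_PRICES = {
    "Llama 4 70B": (0.20, 0.30),
    "Llama 4 405B": (0.90, 0.90),
    "Mistral Large": (2.00, 6.00),
    "Qwen 72B": (0.20, 0.30),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate cost as input and output tokens times their per-million rates."""
    input_price, output_price = EXAMPLE_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example: 10 million input tokens and 2 million output tokens in a month.
for name in EXAMPLE_PRICES:
    print(f"{name}: ~US${estimate_cost_usd(name, 10_000_000, 2_000_000):.2f}")
```

At these example rates the same workload costs about US$2.60 on Llama 4 70B and about US$32.00 on Mistral Large, which is why model choice usually matters more than any other cost lever.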

Dedicated endpoints

Reserved capacity for guaranteed latency / throughput
Hourly pricing
For high-volume production
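
A rough way to judge when hourly pricing starts to pay off on cost alone is a break-even calculation. This is only a sketch: the hourly rate used below is a made-up placeholder (this entry does not list dedicated-endpoint prices), the function name is hypothetical, and it ignores the latency guarantees that are often the real reason to reserve capacity.

```python
def break_even_tokens_per_hour(hourly_rate_usd: float, price_per_million_usd: float) -> float:
    """Tokens per hour at which pay-per-token spend equals the hourly endpoint rate."""
    return hourly_rate_usd / price_per_million_usd * 1_000_000


# Hypothetical US$3.00/hour endpoint versus a US$0.30 per million token rate.
tokens = break_even_tokens_per_hour(3.00, 0.30)
print(f"Break-even at about {tokens:,.0f} tokens per hour")
```

Below the break-even volume, pay-per-token is cheaper; above it, the reserved endpoint is, assuming the endpoint can actually sustain that throughput.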

Free tier

Sign-up credit
Some always-free models for development

How it compares to alternatives

Aspect	Together AI	Fireworks AI	Groq	Hugging Face Inference	Replicate
Model catalog	Broadest open-weight	Curated open-weight	Curated (Llama-focused)	Vast (anything on HF)	Curated
Price (open-weight inference)	Cheap	Cheap	Premium for speed	Moderate	Per-call
Speed / latency	Fast	Fast	Fastest (LPU)	Moderate	Moderate
Fine-tuning	Yes	Yes	Limited	AutoTrain	Limited
Dedicated endpoints	Yes	Yes	Limited	Yes (Inference Endpoints)	Limited
API compatibility	OpenAI-compatible	OpenAI-compatible	OpenAI-compatible	Custom	Custom
Best for	Broadest open-weight catalog + fine-tuning	Cheap fast open-weight inference	Fastest inference for speed-critical workloads	Community hub + quick trials	One-off model runs

For most open-weight production work, Together AI is the default recommendation.

Privacy / data handling

No training on customer data — contractually committed
API logs retained briefly for abuse-monitoring
US data centres
Critical: when you run Chinese open weights via Together, your data does NOT go to China; this is one of the safest paths to using Chinese open weights

Recent changes

2026: Llama 5 family added; expanded multimodal models
2025: Dedicated endpoints expanded; fine-tuning improved
2024: Major model catalog expansion (DeepSeek + Qwen added)

Gotchas

For OpenAI / Anthropic / Google models, Together can’t help — they’re closed; use those vendors directly
Open-weight model performance varies — frontier Llama matches or approaches frontier closed models for many tasks but not all; verify for your use case
Chinese open-weights via Together — data is safe, but the political-filtering training is still in the weights (see DeepSeek entry)
Cost-optimisation matters — Llama 70B is dramatically cheaper than Llama 405B; pick the smallest model that meets your quality bar
Dedicated endpoints are essential for predictable latency at high volume
For Bible Quest-style projects, direct Anthropic API or AWS Bedrock might be simpler than Together if you don’t need open weights
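
The cost-optimisation gotcha above ("pick the smallest model that meets your quality bar") can be sketched as a simple selection loop. This is an illustration under assumptions: pick_cheapest_passing is a hypothetical helper, the score function stands in for whatever evaluation you run on your own task, and the prices and scores in the example are placeholders.

```python
from typing import Callable, Optional


def pick_cheapest_passing(
    prices: dict[str, float],
    score: Callable[[str], float],
    quality_bar: float,
) -> Optional[str]:
    """Return the cheapest model whose score meets the bar, or None if none does."""
    # Try models from cheapest to most expensive and stop at the first that passes.
    for model in sorted(prices, key=prices.get):
        if score(model) >= quality_bar:
            return model
    return None


# Placeholder prices (US$ per million output tokens) and made-up evaluation scores.
prices = {"Llama 70B": 0.30, "Llama 405B": 0.90}
scores = {"Llama 70B": 0.82, "Llama 405B": 0.91}
print(pick_cheapest_passing(prices, scores.get, quality_bar=0.80))
```

Evaluating cheapest-first means you only pay for larger-model evaluation runs when the smaller model has already failed your bar.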
