🇺🇸 USA · Together AI

Status: 🟩 COMPLETE 🟦 LIVING
Last updated: 2026-06-26
Plain-English tagline: Western cloud host for open-weight AI models. Cheap, fast inference for Llama / Mistral / Qwen / DeepSeek / etc., without sending data to Meta / Mistral / China. The recommended path to use open-weights in production.


Front-matter facts

Field | Value
Vendor | Together AI Inc (San Francisco, USA)
Country / origin | 🇺🇸 USA
Recommended for Australian users? | ✅ Yes: fully accessible; Western infrastructure for open-weight inference
Privacy summary | API: no training on customer data; uniquely useful for safely running Chinese open weights without data going to China
Free tier | Yes: free API credit on sign-up; some always-free models
Paid tiers | Pay-per-token; pre-paid credits; Enterprise quoted
First released | 2022
Last reviewed | 2026-06-26
Official site | https://together.ai

What it is

Together AI is a Western cloud provider for open-weight AI model inference. They host the broadest catalog of open-weight models on US infrastructure, with fast inference at competitive prices. For developers wanting to use open-weight models in production without self-hosting, Together is one of the top 2-3 choices.

Hosted models include:

  • Llama (4 / 5 family from Meta)
  • Mistral (Large, Codestral, Small, Nemo, etc.)
  • Qwen (Chinese-origin open weights, safely run on US infrastructure)
  • DeepSeek (similarly, Chinese open weights on US infrastructure)
  • Yi, Gemma, Phi, Granite, Nemotron, plus 100+ others

Critical use case for AUS users: If you want to use Chinese open-weight models (DeepSeek, Qwen, Yi) for their genuine capability, Together AI is one of the safest paths: your data goes to US-based Together AI infrastructure (Western jurisdiction, no PRC law applicability), NOT to Chinese vendors’ servers. The political-filtering training is still baked into the weights, but the data-flow concern is removed. See DeepSeek entry for the nuance.


What you’d use it for

  • Production inference for Llama / Mistral / Gemma at competitive cost
  • Running Chinese open-weight models (DeepSeek, Qwen, Yi) on Western infrastructure
  • Fine-tuning open-weight models on your data
  • Cheaper alternative to frontier-closed APIs (Claude / GPT / Gemini) when an open-weight model is “good enough”
  • Multi-model experimentation: easy to switch between models
  • Custom dedicated endpoints for predictable production latency

How to use from Australia

  1. Sign up at together.ai. Free credit on sign-up.
  2. Get API key from dashboard
  3. Use OpenAI-compatible API endpoint:
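    import os
    from openai import OpenAI  # pip install openai

    # Together AI exposes an OpenAI-compatible endpoint, so the OpenAI SDK works as-is.
    client = OpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],  # assumed env var holding your dashboard key
        base_url="https://api.together.xyz/v1",
    )
    response = client.chat.completions.create(
        # Example ID; copy the exact model name from together.ai/models.
        model="meta-llama/Llama-4-70B-Instruct",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)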
  4. Browse model catalog at together.ai/models
  5. AUS card accepted for paid use
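
Because the endpoint is OpenAI-compatible, trying a different model mostly comes down to changing the `model` string. The sketch below shows one minimal way to send the same prompt to several models and collect the replies. `build_chat_request` and `compare_models` are hypothetical helper names (not part of any SDK), and `client` is assumed to be an OpenAI SDK client configured as in step 3.

```python
def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def compare_models(client, models: list[str], prompt: str) -> dict[str, str]:
    """Send the same prompt to each model and return the first reply from each."""
    replies = {}
    for model in models:
        # Only the model ID changes between calls; the rest of the request is identical.
        response = client.chat.completions.create(**build_chat_request(model, prompt))
        replies[model] = response.choices[0].message.content
    return replies
```

To use it, pass the client from step 3 together with a list of model IDs copied from together.ai/models. Each call is billed per token, so keep test prompts short while comparing.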

What it costs

Pay-per-token (examples; verify current rates at together.ai/pricing)

  • Llama 4 70B: ~US$0.30 per million input/output tokens
  • Llama 4 405B: ~US$0.90 per million
  • Mistral Large: ~US$6 per million
  • Qwen 72B: ~US$0.30 per million
  • DeepSeek: similar low rates
  • Image / video / audio models: per-call pricing

Generally dramatically cheaper than Claude / GPT / Gemini for tasks that an open-weight model handles well enough.
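
To turn the per-million rates above into a budget figure, multiply the total token count by the rate. The following is a minimal sketch, assuming one flat rate covers both input and output tokens (as the list suggests) and using the approximate example prices, which may be out of date. The function name `estimate_cost_usd` and the workload numbers are illustrative only.

```python
# Approximate US$ prices per million tokens, copied from the examples above.
# Confirm current rates at together.ai/pricing before relying on them.
PRICE_PER_MILLION_USD = {
    "Llama 4 70B": 0.30,
    "Llama 4 405B": 0.90,
    "Mistral Large": 6.00,
    "Qwen 72B": 0.30,
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate spend, assuming one flat rate for input and output tokens."""
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]


# Hypothetical monthly workload: 40M input tokens and 10M output tokens.
for name in PRICE_PER_MILLION_USD:
    cost = estimate_cost_usd(name, 40_000_000, 10_000_000)
    print(f"{name}: ~US${cost:.2f}")
```

At these example rates the same 50 million tokens cost about three times as much on the 405B model as on the 70B model, which is why model size is the first lever to check.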

Dedicated endpoints

  • Reserved capacity for guaranteed latency / throughput
  • Hourly pricing
  • For high-volume production
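
One rough way to judge when hourly pricing starts to pay off is a break-even calculation against the pay-per-token rate. The sketch below assumes a hypothetical hourly rate (this entry does not list one) and the approximate Llama 4 405B per-token price from the examples above. It also ignores whether a single endpoint can actually sustain that throughput.

```python
def break_even_tokens_per_hour(hourly_rate_usd: float, price_per_million_usd: float) -> float:
    """Tokens per hour at which reserved capacity costs the same as pay-per-token."""
    return hourly_rate_usd / price_per_million_usd * 1_000_000


# The hourly rate is a made-up placeholder; substitute the quote you receive.
tokens = break_even_tokens_per_hour(hourly_rate_usd=9.00, price_per_million_usd=0.90)
print(f"Break-even: {tokens:,.0f} tokens per hour")
```

Below the break-even volume, pay-per-token is cheaper on price alone; a dedicated endpoint may still be worth it for the latency guarantee.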

Free tier

  • Sign-up credit
  • Some always-free models for development

How it compares to alternatives

Aspect | Together AI | Fireworks AI | Groq | Hugging Face Inference | Replicate
Model catalog | Broadest open-weight | Curated open-weight | Curated (Llama-focused) | Vast (anything on HF) | Curated
Price (open-weight inference) | Cheap | Cheap | Premium for speed | Moderate | Per-call
Speed / latency | Fast | Fast | Fastest (LPU) | Moderate | Moderate
Fine-tuning | Yes | Yes | Limited | AutoTrain | Limited
Dedicated endpoints | Yes | Yes | Limited | Yes (Inference Endpoints) | Limited
API compatibility | OpenAI-compatible | OpenAI-compatible | OpenAI-compatible | Custom | Custom
Best for | Broadest open-weight catalog + fine-tuning | Cheap fast open-weight inference | Fastest inference for speed-critical | Community hub + tries | One-off model runs

For most open-weight production work, Together AI is the default recommendation.


Privacy / data handling

  • No training on customer data (contractually committed)
  • API logs retained briefly for abuse-monitoring
  • US data centres
  • Critical: when you run Chinese open weights via Together, your data does NOT go to China; this is one of the safest ways to use Chinese open weights

Recent changes

  • 2026: Llama 5 family added; expanded multimodal models
  • 2025: Dedicated endpoints expanded; fine-tuning improved
  • 2024: Major model catalog expansion (DeepSeek + Qwen added)

Gotchas

  • For OpenAI / Anthropic / Google models, Together can’t help: they’re closed, so use those vendors directly
  • Open-weight model performance varies: frontier Llama matches or approaches frontier closed models for many tasks but not all; verify for your use case
  • Chinese open-weights via Together: data is safe, but the political-filtering training is still in the weights (see DeepSeek entry)
  • Cost-optimisation matters: Llama 70B is dramatically cheaper than Llama 405B; pick the smallest model that meets your quality bar
  • Dedicated endpoints are essential for predictable latency at high volume
  • For Bible Quest-style projects, direct Anthropic API or AWS Bedrock might be simpler than Together if you don’t need open weights
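
The "smallest model that meets your quality bar" rule can be written down as a simple selection step. This is a minimal sketch under stated assumptions: `Candidate` and `cheapest_meeting_bar` are hypothetical names, the prices are the approximate examples from this entry, and the evaluation scores are placeholders standing in for results from your own test set, not benchmark data.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    price_per_million_usd: float  # approximate, from the pricing examples
    eval_score: float             # placeholder: your own test-set score, 0 to 1


def cheapest_meeting_bar(candidates: list[Candidate], quality_bar: float) -> Optional[Candidate]:
    """Return the lowest-priced candidate whose score reaches the bar, or None."""
    passing = [c for c in candidates if c.eval_score >= quality_bar]
    return min(passing, key=lambda c: c.price_per_million_usd, default=None)


# Scores below are invented placeholders; replace them with your own measurements.
candidates = [
    Candidate("Llama 4 70B", 0.30, 0.82),
    Candidate("Llama 4 405B", 0.90, 0.88),
]
choice = cheapest_meeting_bar(candidates, quality_bar=0.80)
print(choice.name if choice else "No candidate meets the bar")
```

Raising the bar above what the cheaper model scores moves the choice to the larger model, and a bar that no candidate reaches returns `None`, which is the signal to consider a closed frontier model instead.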

See also


Sources