AI API Cheat Sheet — Quick Reference for Developers

Status: 🟩 COMPLETE 🟦 LIVING Section: cheat-sheets Tags: cheat-sheet, api, developer, reference

How to read this

Quick reference for the main AI APIs developers use. Each section shows the SDK installation, basic call pattern, and common gotchas.

OpenAI (ChatGPT / GPT models)

Install

pip install openai
# or
npm install openai

Basic call (Python)

from openai import OpenAI
client = OpenAI(api_key="sk-...")
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)

Common models

gpt-4o — flagship; multi-modal
gpt-4o-mini — cheap; fast
o3 — best reasoning
o3-mini — cheaper reasoning
gpt-4o-realtime-preview — voice agent

Pricing (USD per million tokens)

gpt-4o-mini: $0.15 in / $0.60 out
gpt-4o: $2.50 in / $10 out
o3: $10 in / $40 out
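
To turn the per-million prices above into a per-request figure, multiply each token count by its rate and divide by one million. The sketch below does that arithmetic; the `PRICES` table and the `estimate_cost` helper are illustrative names rather than part of any SDK, and the rates are copied from this list and may be out of date.

```python
# Illustrative rates in USD per million tokens (input, output), copied from the list above.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "o3": (10.00, 40.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Example: a 2,000-token prompt with a 500-token reply on gpt-4o.
cost = estimate_cost("gpt-4o", 2_000, 500)
print(f"${cost:.4f}")
```

Real token counts come back in the `usage` field of each response, so the same function can be applied after the call for exact accounting.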

Key parameters

temperature (0-2): 0 = deterministic; 1 = balanced; 2 = creative
max_tokens: cap response length
top_p: nucleus sampling
stream=True: stream tokens as generated
tools=[...]: function calling
response_format={"type": "json_object"}: force JSON output

Gotchas

Models change names; check current docs
o3 and other reasoning models expect instructions in the developer role rather than a system message
Costs add up fast with verbose responses — set max_tokens
Free $5 credit on new accounts; expires in 3 months

Anthropic (Claude)

Install

pip install anthropic
# or
npm install @anthropic-ai/sdk

Basic call (Python)

import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")
 
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude!"}
    ]
)
print(message.content[0].text)

Common models (mid-2026)

claude-haiku-4-5 — fast; cheap
claude-sonnet-4-6 — best value; most popular
claude-opus-4-7 — most capable; reasoning

Pricing (USD per million tokens)

Haiku: $0.80 in / $4 out
Sonnet: $3 in / $15 out
Opus: $15 in / $75 out

Key parameters

max_tokens: REQUIRED (unlike OpenAI)
temperature (0-1): 0 = deterministic; 1 = creative
system="...": system prompt (separate from messages)
stream=True: streaming
tools=[...]: tool use (function calling)
thinking={"type": "enabled", "budget_tokens": N}: extended thinking on Sonnet and Opus; budget_tokens must be lower than max_tokens
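
Because the system prompt is a separate argument here, an OpenAI-style message list cannot be passed through unchanged. The sketch below shows one way to split such a list; `split_system` is a hypothetical helper, not an SDK function, and it assumes every message has plain string content.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def split_system(messages: List[Message]) -> Tuple[str, List[Message]]:
    """Separate system-role entries from the rest of an OpenAI-style message list."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    return "\n\n".join(system_parts), chat


openai_style = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
system, chat = split_system(openai_style)

# Keyword arguments ready for client.messages.create(**kwargs).
kwargs = {"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": chat}
if system:
    kwargs["system"] = system  # omit the key entirely when there is no system prompt
```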

Extended thinking

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[...]
)
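
With thinking enabled, the reply usually contains a thinking block before the text block, so `message.content[0].text` may not be the answer. A minimal sketch of filtering by block type follows. It assumes each block exposes a `type` attribute plus `thinking` or `text`, which matches the SDK response objects at the time of writing; `split_thinking` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs offline.

```python
from types import SimpleNamespace


def split_thinking(content):
    """Return (reasoning, answer) from a list of response content blocks."""
    reasoning = [block.thinking for block in content if block.type == "thinking"]
    answer = [block.text for block in content if block.type == "text"]
    return "\n".join(reasoning), "".join(answer)


# Stand-in for message.content; a real call returns SDK objects with the same fields.
fake_content = [
    SimpleNamespace(type="thinking", thinking="Compare both options first."),
    SimpleNamespace(type="text", text="Option B is cheaper."),
]
reasoning, answer = split_thinking(fake_content)
print(answer)
```

Blocks of any other type are ignored by both filters, so the helper does not fail if the response includes something unexpected.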

Gotchas

max_tokens is mandatory
Context window is 200K tokens
API requires phone verification
Pro subscription ≠ API credits (separate billing)

Google AI (Gemini)

Install

pip install google-generativeai
# or
npm install @google/generative-ai

Basic call (Python)

import google.generativeai as genai
genai.configure(api_key="AIza...")
 
model = genai.GenerativeModel('gemini-2.5-flash')
response = model.generate_content("Hello, Gemini!")
print(response.text)

Common models

gemini-2.5-flash — fast; cheap; 1M context
gemini-2.5-pro — best quality; 2M context
gemini-2.5-flash-8b — smallest; cheapest
gemini-2.5-flash-thinking — reasoning mode

Pricing (USD per million tokens)

Flash: $0.075 in (under 200K) / $0.30 out
Pro: $1.25 in / $10 out
Flash-8b: $0.0375 in / $0.15 out

Key parameters

generation_config={"temperature": 0.7, "max_output_tokens": 1024}
safety_settings={...}: content moderation
tools=[...]: function calling
stream=True: streaming

Multimodal (with image)

import PIL.Image
img = PIL.Image.open("path/to/image.jpg")
response = model.generate_content(["Describe this image", img])

Gotchas

Free tier has rate limits (15 req/min Flash; 2 req/min Pro)
AI Studio vs Vertex AI — different products, different APIs
Model names change — verify current
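
One way to stay under a requests-per-minute limit like the free-tier figures above is to space calls out on the client side. The sketch below is a simple minimum-interval limiter; `MinIntervalLimiter` is a hypothetical class, not part of the Google SDK, and the injectable `clock` and `sleep` arguments exist only so it can be tested without waiting.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        delay = self.next_allowed - self.clock()
        if delay > 0:
            self.sleep(delay)
        self.next_allowed = self.clock() + self.interval
```

Typical use is to create `limiter = MinIntervalLimiter(per_minute=15)` once and call `limiter.wait()` immediately before each `model.generate_content(...)`. This only smooths traffic from a single process; a 429 can still occur and should be retried.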

Mistral

Install

pip install mistralai
# or
npm install @mistralai/mistralai

Basic call (Python)

from mistralai import Mistral
client = Mistral(api_key="...")
 
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Common models

mistral-small-latest — fast; cheap
mistral-medium-latest — balanced
mistral-large-latest — flagship
codestral-latest — coding specialised
mistral-saba-latest — Arabic specialised

Pricing (USD per million tokens)

Small: $0.10 in / $0.30 out
Medium: ~$0.40 in / $1.20 out
Large: $2 in / $6 out

Gotchas

EU-hosted (data residency advantage)
Open-weights models available for self-hosting
Codestral has commercial licence requirement for some uses

Groq (fastest inference)

Install

pip install groq
# or
npm install groq-sdk

Basic call (Python)

from groq import Groq
client = Groq(api_key="gsk_...")
 
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Common models

llama-3.3-70b-versatile — Llama 3.3 large
llama-3.1-8b-instant — Llama small
mixtral-8x7b-32768 — Mixtral
gemma2-9b-it — Gemma

Pricing

Generally much cheaper per token than the flagship models from OpenAI or Anthropic
Llama 70B: ~$0.59 in / $0.79 out per million tokens

Gotchas

Speed advantage: ~500-800 tokens/sec
Rate limits apply
Models are open-weights, so output quality should be comparable to the same model hosted elsewhere

Common patterns across providers

Streaming (Python pattern)

# OpenAI
for chunk in client.chat.completions.create(model="gpt-4o", messages=[...], stream=True):
    print(chunk.choices[0].delta.content or "", end="")
 
# Anthropic
with client.messages.stream(model="claude-sonnet-4-6", max_tokens=1024, messages=[...]) as stream:
    for text in stream.text_stream:
        print(text, end="")
 
# Google
for chunk in model.generate_content("...", stream=True):
    print(chunk.text, end="")
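
When the full text is needed after streaming (for logging or for appending to the conversation), the chunks have to be collected as they arrive. The sketch below does this for OpenAI-style chunks. `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only imitate the `choices[0].delta.content` shape used above so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_text=None):
    """Join OpenAI-style stream chunks into one string, skipping empty deltas."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices at all
            continue
        text = chunk.choices[0].delta.content
        if text:  # content is None on role-only and final chunks
            parts.append(text)
            if on_text:
                on_text(text)  # e.g. print to the terminal as text arrives
    return "".join(parts)


def fake_chunk(text):
    # Stand-in with the same attribute path as a real streamed chunk.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo"), SimpleNamespace(choices=[])]
full = collect_stream(demo)
print(full)
```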

Tool use / function calling

# Define tools (OpenAI format; Anthropic uses name/description/input_schema instead)
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]
 
# Pass to API
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    tools=tools
)
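
The request above is only half of the loop: when the model decides to call a tool, the reply carries `tool_calls` instead of text, and the application has to run the function and send the result back. The sketch below handles that step for the OpenAI response shape. `get_weather`, `REGISTRY`, and `run_tool_calls` are hypothetical names, and the fake message is a stand-in for `response.choices[0].message`.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    # Hypothetical local implementation; a real one would query a weather service.
    return f"Sunny in {location}"


REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message):
    """Execute each requested tool and build the follow-up 'tool' messages."""
    results = []
    for call in message.tool_calls or []:
        func = REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call.id,  # ties the result to the request it answers
            "content": func(**args),
        })
    return results


fake_message = SimpleNamespace(tool_calls=[
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_weather", arguments='{"location": "Paris"}')),
])
tool_messages = run_tool_calls(fake_message)
print(tool_messages)
```

In a real exchange, the assistant message and these tool messages are appended to `messages`, and the API is called again so the model can write its final answer.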

JSON mode

# OpenAI - native JSON mode (the messages must mention JSON somewhere)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    response_format={"type": "json_object"}
)
 
# Anthropic - via system prompt
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="Respond only in valid JSON. No markdown.",
    messages=[...]
)
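
Prompt-only JSON instructions are not a guarantee, so it helps to parse the reply defensively. The sketch below tolerates a reply wrapped in a Markdown code fence and raises `json.JSONDecodeError` for anything else that is not valid JSON. `parse_json_reply` is a hypothetical helper, not part of either SDK.

```python
import json

FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly


def parse_json_reply(text: str):
    """Parse a model reply as JSON, tolerating a surrounding Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag such as "json".
        first_newline = cleaned.find("\n")
        cleaned = cleaned[first_newline + 1:] if first_newline != -1 else ""
        # Drop the closing fence and anything after it.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)


plain = parse_json_reply('{"city": "Paris", "temp_c": 21}')
fenced = parse_json_reply(FENCE + 'json\n{"city": "Paris", "temp_c": 21}\n' + FENCE)
print(plain == fenced)
```

A failed parse is a reasonable trigger for one retry with the error message appended to the conversation.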

Async (Python)

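import asyncio

# OpenAI async
from openai import AsyncOpenAI

async def ask_openai():
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    return await client.chat.completions.create(...)

# Anthropic async
from anthropic import AsyncAnthropic

async def ask_anthropic():
    client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
    return await client.messages.create(...)

# await only works inside a coroutine; start one with asyncio.run(ask_openai())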
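
The main reason to use the async clients is to run many requests at once, which in turn makes rate limits easier to hit. The sketch below caps the number of requests in flight with a semaphore. `map_limited` and `fake_call` are hypothetical names, and `fake_call` stands in for a real awaited API call so the example runs offline.

```python
import asyncio


async def map_limited(worker, items, limit=5):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_call(prompt):
    # Stand-in for an awaited SDK call such as client.messages.create(...).
    await asyncio.sleep(0.01)
    return prompt.upper()


results = asyncio.run(map_limited(fake_call, ["a", "b", "c"], limit=2))
print(results)
```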

Environment variables (recommended)

Never hardcode API keys. Use environment variables:

`.env` file

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...

Load in Python

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # copies values from .env into the process environment

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Load in Node.js

import 'dotenv/config';
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Always

Add .env to .gitignore
Never commit keys to git
Rotate keys if accidentally exposed
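
A missing variable otherwise surfaces later as a confusing 401, so it can help to check for keys at startup. The sketch below fails fast with a message naming the missing variable; `require_env` is a hypothetical helper and `DEMO_API_KEY` is a made-up variable name used only for the demonstration.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, or raise a clear error naming what is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Demonstration with a made-up variable so the example runs anywhere.
os.environ["DEMO_API_KEY"] = "sk-demo"
api_key = require_env("DEMO_API_KEY")
print(len(api_key))
```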

Error handling

Common errors and meanings

401 Unauthorized — invalid API key
429 Too Many Requests — rate limited
500 / 503 — provider issue; retry
400 Bad Request — your request is invalid
402 Payment Required (or insufficient credits) — top up

Retry with backoff (Python pattern)

import time
from openai import OpenAI, RateLimitError
 
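def call_with_retry(client, max_attempts=5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, 8s between attempts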
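
Two refinements are common in practice: retry only the status codes that can succeed on a second try, and add random jitter so that many clients do not retry in lockstep. The sketch below shows both as pure functions. `RETRYABLE`, `should_retry`, and `backoff_delay` are hypothetical names; the code set follows the error list above, and the `rng` argument exists only to make the delay testable.

```python
import random

# From the list above: rate limits and provider-side failures are worth retrying;
# 400, 401 and 402 will fail the same way again until the request or account changes.
RETRYABLE = {429, 500, 503}


def should_retry(status: int) -> bool:
    return status in RETRYABLE


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random.random) -> float:
    """Exponential backoff with full jitter: a random delay below min(cap, base * 2**attempt)."""
    ceiling = min(cap, base * 2 ** attempt)
    return rng() * ceiling


for attempt in range(4):
    print(attempt, round(backoff_delay(attempt), 2))
```

The OpenAI and Anthropic Python clients also accept a `max_retries` option that retries some failures automatically, so a hand-written loop is mainly useful when custom behaviour is needed.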

Token counting

Approximate before sending:

tiktoken (OpenAI)

import tiktoken
encoding = tiktoken.encoding_for_model("gpt-4o")
tokens = encoding.encode("Your text here")
print(len(tokens))  # token count

Anthropic counts via API

count = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Your text"}]
)
print(count.input_tokens)  # input token count

Rough rule: 1 token ≈ 0.75 words of English text.
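
When no tokenizer is at hand, the rule of thumb above can be applied directly: divide the word count by 0.75. The sketch below does so; `estimate_tokens` is a hypothetical helper, and the result is only a rough estimate for English prose (code and other languages usually tokenize differently).

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text above


def estimate_tokens(text: str) -> int:
    """Estimate the token count of English text from its whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


sample = "The quick brown fox jumps over the lazy dog"
print(estimate_tokens(sample))
```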

Sources

OpenAI API docs: platform.openai.com/docs
Anthropic API docs: docs.anthropic.com
Google AI Studio docs: ai.google.dev/gemini-api/docs
Mistral docs: docs.mistral.ai
Groq docs: console.groq.com/docs
