AI API Cheat Sheet: Quick Reference for Developers

Status: 🟩 COMPLETE 🟦 LIVING | Section: cheat-sheets | Tags: cheat-sheet, api, developer, reference


How to read this

Quick reference for the main AI APIs developers use. Each section shows the SDK installation, basic call pattern, and common gotchas.


OpenAI (ChatGPT / GPT models)

Install

pip install openai
# or
npm install openai

Basic call (Python)

from openai import OpenAI
client = OpenAI(api_key="sk-...")
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)

Common models

  • gpt-4o: flagship; multimodal
  • gpt-4o-mini: cheap; fast
  • o3: best reasoning
  • o3-mini: cheaper reasoning
  • gpt-4o-realtime-preview: voice agent

Pricing (USD per million tokens)

  • gpt-4o-mini: $0.60 output
  • gpt-4o: $10 output
  • o3: $40 output
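
As a rough sketch of how the output prices above turn into a bill, the snippet below estimates output cost from a token count. `OUTPUT_PRICE_PER_MTOK` and `estimate_output_cost` are hypothetical names; the figures are copied from the list above and should be checked against current pricing. Input-token charges are not included.

```python
# Output prices in USD per million tokens, copied from the list above.
# Assumption: these figures are still current; verify before budgeting.
OUTPUT_PRICE_PER_MTOK = {
    "gpt-4o-mini": 0.60,
    "gpt-4o": 10.0,
    "o3": 40.0,
}

def estimate_output_cost(model: str, output_tokens: int) -> float:
    """Return the estimated output cost in USD (input tokens not included)."""
    return OUTPUT_PRICE_PER_MTOK[model] * output_tokens / 1_000_000

# 2,000 requests with roughly 500 output tokens each
print(f"${estimate_output_cost('gpt-4o', 2_000 * 500):.2f}")  # prints $10.00
```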

Key parameters

  • temperature (0-2): 0 = deterministic; 1 = balanced; 2 = creative
  • max_tokens: cap response length (reasoning models such as o3 take max_completion_tokens instead)
  • top_p: nucleus sampling
  • stream=True: stream tokens as generated
  • tools=[...]: function calling
  • response_format={"type": "json_object"}: force JSON output

Gotchas

  • Models change names; check current docs
  • o3 doesn't support the system role (use the developer role instead)
  • Costs add up fast with verbose responses; set max_tokens
  • Free $5 credit on new accounts; expires in 3 months
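
One way to handle the system/developer role gotcha is to rewrite the message list just before sending. This is a minimal sketch: `adapt_messages_for_model` is a hypothetical helper, and matching on the `o3` name prefix is an assumption, not a rule taken from the OpenAI docs.

```python
def adapt_messages_for_model(model: str, messages: list[dict]) -> list[dict]:
    """Return a copy of messages, renaming 'system' to 'developer' for o3-style models."""
    # Assumption: models whose name starts with "o3" expect the developer role.
    if not model.startswith("o3"):
        return [dict(m) for m in messages]
    return [
        {**m, "role": "developer"} if m["role"] == "system" else dict(m)
        for m in messages
    ]

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(adapt_messages_for_model("o3", messages)[0]["role"])  # prints developer
```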

Anthropic (Claude)

Install

pip install anthropic
# or
npm install @anthropic-ai/sdk

Basic call (Python)

import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")
 
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude!"}
    ]
)
print(message.content[0].text)

Common models (mid-2026)

  • claude-haiku-4-5: fast; cheap
  • claude-sonnet-4-6: best value; most popular
  • claude-opus-4-7: most capable; reasoning

Pricing (USD per million tokens)

  • Haiku: $4 output
  • Sonnet: $15 output
  • Opus: $75 output

Key parameters

  • max_tokens: REQUIRED (unlike OpenAI)
  • temperature (0-1): 0 = deterministic; 1 = creative
  • system="...": system prompt (separate from messages)
  • stream=True: streaming
  • tools=[...]: tool use (function calling)
  • thinking={"type": "enabled", "budget_tokens": N}: extended thinking on Sonnet and above
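
Because the system prompt is passed separately, a message list written in the OpenAI style needs splitting before it goes to Claude. A minimal sketch, where `split_system_prompt` is a hypothetical helper name:

```python
def split_system_prompt(messages: list[dict]) -> tuple[str, list[dict]]:
    """Separate system-role entries from a chat-style message list."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [dict(m) for m in messages if m["role"] != "system"]
    # Multiple system entries are joined with a blank line between them.
    return "\n\n".join(system_parts), chat

system, chat = split_system_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, Claude!"},
])
print(system)  # prints You are a helpful assistant.
```

The two results can then be passed as `system=system, messages=chat` to `client.messages.create(...)`.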

Extended thinking

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[...]
)
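
With extended thinking enabled, the response content can start with a thinking block, so `message.content[0].text` may not be the answer. The sketch below picks out the text blocks by type. It uses stand-in objects instead of a real response, and the block shapes (`type="thinking"` and `type="text"`) are stated here as assumptions to check against the current SDK.

```python
from types import SimpleNamespace

def extract_text(content_blocks) -> str:
    """Join the text of every block whose type is 'text', skipping other block types."""
    return "".join(
        block.text for block in content_blocks
        if getattr(block, "type", None) == "text"
    )

# Stand-in for message.content from a thinking-enabled response (hypothetical data).
fake_content = [
    SimpleNamespace(type="thinking", thinking="Let me work through this..."),
    SimpleNamespace(type="text", text="The answer is 4."),
]
print(extract_text(fake_content))  # prints The answer is 4.
```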

Gotchas

  • max_tokens is mandatory
  • Context window is 200K tokens
  • API requires phone verification
  • Pro subscription ≠ API credits (separate billing)

Google AI (Gemini)

Install

pip install google-generativeai
# or
npm install @google/generative-ai

Basic call (Python)

import google.generativeai as genai
genai.configure(api_key="AIza...")
 
model = genai.GenerativeModel('gemini-2.5-flash')
response = model.generate_content("Hello, Gemini!")
print(response.text)

Common models

  • gemini-2.5-flash: fast; cheap; 1M context
  • gemini-2.5-pro: best quality; 2M context
  • gemini-2.5-flash-8b: smallest; cheapest
  • gemini-2.5-flash-thinking: reasoning mode

Pricing (USD per million tokens)

  • Flash: $0.30 output
  • Pro: $10 output
  • Flash-8b: $0.15 output

Key parameters

  • generation_config={"temperature": 0.7, "max_output_tokens": 1024}
  • safety_settings={...}: content moderation
  • tools=[...]: function calling
  • stream=True: streaming

Multimodal (with image)

import PIL.Image
img = PIL.Image.open("path/to/image.jpg")
response = model.generate_content(["Describe this image", img])

Gotchas

  • Free tier has rate limits (15 req/min Flash; 2 req/min Pro)
  • AI Studio vs Vertex AI: different products, different APIs
  • Model names change; verify current names
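
To stay under a per-minute limit like the free-tier figures above, calls can be spaced out on the client side. This is a minimal sketch of a fixed-interval limiter, not part of any Google SDK; `MinIntervalLimiter` is a hypothetical name, and the clock and sleep functions are injectable only to make it easy to test.

```python
import time

class MinIntervalLimiter:
    """Space calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # scheduled time of the most recent call

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._last_call + self.min_interval - now)
            if delay:
                self._sleep(delay)
        self._last_call = now + delay
        return delay

# 15 requests per minute is the Flash free-tier figure quoted above.
limiter = MinIntervalLimiter(requests_per_minute=15)
```

Calling `limiter.wait()` immediately before each `model.generate_content(...)` keeps requests at least four seconds apart.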

Mistral

Install

pip install mistralai
# or
npm install @mistralai/mistralai

Basic call (Python)

from mistralai import Mistral
client = Mistral(api_key="...")
 
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Common models

  • mistral-small-latest: fast; cheap
  • mistral-medium-latest: balanced
  • mistral-large-latest: flagship
  • codestral-latest: coding specialised
  • mistral-saba-latest: Arabic specialised

Pricing (USD per million tokens)

  • Small: $0.30 output
  • Medium: ~$1.20 output
  • Large: $6 output

Gotchas

  • EU-hosted (data residency advantage)
  • Open-weights models available for self-hosting
  • Codestral has commercial licence requirement for some uses

Groq (fastest inference)

Install

pip install groq
# or
npm install groq-sdk

Basic call (Python)

from groq import Groq
client = Groq(api_key="gsk_...")
 
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Common models

  • llama-3.3-70b-versatile: Llama 3.3 large
  • llama-3.1-8b-instant: Llama small
  • mixtral-8x7b-32768: Mixtral
  • gemma2-9b-it: Gemma

Pricing

  • Generally significantly cheaper than direct OpenAI/Anthropic
  • Llama 70B: ~$0.79 output per million tokens

Gotchas

  • Speed advantage: ~500-800 tokens/sec
  • Rate limits apply
  • Models are open-weights; quality matches direct hosting

Common patterns across providers

Streaming (Python pattern)

# OpenAI
for chunk in client.chat.completions.create(model="gpt-4o", messages=[...], stream=True):
    print(chunk.choices[0].delta.content or "", end="")
 
# Anthropic
with client.messages.stream(model="claude-sonnet-4-6", max_tokens=1024, messages=[...]) as stream:
    for text in stream.text_stream:
        print(text, end="")
 
# Google
for chunk in model.generate_content("...", stream=True):
    print(chunk.text, end="")
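
If the full response is needed after streaming (for logging or follow-up turns), the fragments can be collected while they are printed. A minimal sketch with a stand-in stream; `collect_stream` is a hypothetical helper that takes any iterable of text fragments.

```python
def collect_stream(deltas) -> str:
    """Print text fragments as they arrive and return the assembled response."""
    parts = []
    for delta in deltas:
        if not delta:  # some chunks carry no text (None or "")
            continue
        print(delta, end="", flush=True)
        parts.append(delta)
    print()
    return "".join(parts)

# Stand-in for a provider stream. With OpenAI you might pass
# (chunk.choices[0].delta.content for chunk in stream) instead.
fake_stream = iter(["Hel", None, "lo", "", "!"])
full_text = collect_stream(fake_stream)
```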

Tool use / function calling

# Define tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]
 
# Pass to API
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    tools=tools
)
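
Passing `tools` only tells the model what it may call; your code still has to run the function. In Chat Completions, requested calls arrive in `response.choices[0].message.tool_calls`, each with `function.name` and a JSON string in `function.arguments`. The dispatch step might look like this sketch, where the `get_weather` body and `TOOL_REGISTRY` are hypothetical:

```python
import json

def get_weather(location: str) -> str:
    """Hypothetical local implementation of the get_weather tool."""
    return f"Sunny in {location}"

TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Look up a tool by name, decode its JSON arguments, and call it."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid arguments: {exc}"})
    return json.dumps({"result": TOOL_REGISTRY[name](**args)})

print(run_tool_call("get_weather", '{"location": "Paris"}'))
# prints {"result": "Sunny in Paris"}
```

The returned string is what you would send back to the model in a follow-up message with the `tool` role.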

JSON mode

# OpenAI - native JSON mode
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    response_format={"type": "json_object"}
)
 
# Anthropic - via system prompt
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="Respond only in valid JSON. No markdown.",
    messages=[...]
)
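
Prompt-based JSON is not guaranteed to be valid, and models sometimes wrap it in a Markdown fence anyway, so it helps to parse defensively. A minimal sketch, with `parse_json_reply` as a hypothetical helper name:

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

def parse_json_reply(text: str):
    """Parse a model reply as JSON, tolerating a surrounding Markdown fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag).
        first_newline = cleaned.find("\n")
        cleaned = cleaned[first_newline + 1:] if first_newline != -1 else ""
        # Drop the closing fence and anything after it.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)  # raises json.JSONDecodeError if still invalid

print(parse_json_reply('{"status": "ok"}'))  # prints {'status': 'ok'}
```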

Async (Python)

# OpenAI async
from openai import AsyncOpenAI
client = AsyncOpenAI()
response = await client.chat.completions.create(...)
 
# Anthropic async
from anthropic import AsyncAnthropic
client = AsyncAnthropic()
response = await client.messages.create(...)
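
The main reason to use the async clients is to run several requests at once. The sketch below caps concurrency with a semaphore so a batch does not trip rate limits. `gather_limited` is a hypothetical helper, and `fake_call` stands in for a real awaited API call.

```python
import asyncio

async def gather_limited(coro_factories, max_concurrency: int = 5):
    """Run coroutine factories concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(factory):
        async with semaphore:
            return await factory()

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run(f) for f in coro_factories))

async def fake_call(prompt: str) -> str:
    """Stand-in for an awaited API call such as client.messages.create(...)."""
    await asyncio.sleep(0.01)
    return prompt.upper()

prompts = ["one", "two", "three"]
results = asyncio.run(
    gather_limited([lambda p=p: fake_call(p) for p in prompts], max_concurrency=2)
)
print(results)  # prints ['ONE', 'TWO', 'THREE']
```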

Never hardcode API keys. Use environment variables:

.env file

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...

Load in Python

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads variables from .env into the process environment

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Load in Node.js

import 'dotenv/config';
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Always

  • Add .env to .gitignore
  • Never commit keys to git
  • Rotate keys if accidentally exposed
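
A missing key is easier to debug if the program fails at startup and not on the first API call. A minimal sketch; `require_env` is a hypothetical helper, not part of any SDK.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value

# Example: api_key = require_env("OPENAI_API_KEY")
```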

Error handling

Common errors and meanings

  • 401 Unauthorized: invalid API key
  • 429 Too Many Requests: rate limited
  • 500 / 503: provider issue; retry
  • 400 Bad Request: your request is invalid
  • 402 Payment Required (or insufficient credits): top up
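
The list above can be encoded as a small lookup so calling code knows whether a retry is worthwhile. This is a sketch under the assumption that only 429, 500 and 503 are transient; `ERROR_ACTIONS` and `is_retryable` are hypothetical names.

```python
# Status code -> (meaning, suggested action), following the list above.
ERROR_ACTIONS = {
    400: ("Bad Request", "fix the request; do not retry"),
    401: ("Unauthorized", "check the API key; do not retry"),
    402: ("Payment Required", "top up credits; do not retry"),
    429: ("Too Many Requests", "back off and retry"),
    500: ("Server Error", "retry"),
    503: ("Service Unavailable", "retry"),
}

# Assumption: only these codes indicate a transient condition.
RETRYABLE = {429, 500, 503}

def is_retryable(status_code: int) -> bool:
    """Return True if a request that failed with this status is worth retrying."""
    return status_code in RETRYABLE

print(is_retryable(429), is_retryable(401))  # prints True False
```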

Retry with backoff (Python pattern)

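import time
from openai import OpenAI, RateLimitError

def call_with_retry(client, max_attempts=5, **kwargs):
    """Call chat.completions.create, backing off exponentially on rate limits."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rate-limit error
            time.sleep(2 ** attempt)  # wait 1, 2, 4, 8 seconds between attempts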
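
A provider-agnostic variant of the same pattern adds a delay cap and random jitter, so many clients that hit a limit together do not all retry at the same moment. A minimal sketch; `retry_with_backoff` and its parameters are hypothetical, and the 0.5 to 1.0 jitter range is an arbitrary choice.

```python
import random
import time

def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a retryable exception, wait with capped, jittered backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Sleep between 50% and 100% of the computed delay.
            sleep(delay * random.uniform(0.5, 1.0))
```

With the OpenAI client this could be used as `retry_with_backoff(lambda: client.chat.completions.create(...), retry_on=(RateLimitError,))`.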

Token counting

Approximate before sending:

tiktoken (OpenAI)

import tiktoken
encoding = tiktoken.encoding_for_model("gpt-4o")
tokens = encoding.encode("Your text here")
print(len(tokens))  # token count

Anthropic counts via API

count = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Your text"}]
)
print(count.input_tokens)  # input token count

Rough rule: 1 token ≈ 0.75 words of English text.
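
When no tokenizer is available, the rough rule gives a quick estimate: tokens ≈ words / 0.75. A minimal sketch; `estimate_tokens` is a hypothetical helper, and the result is only an approximation for English prose.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough rule for English text, as stated above

def estimate_tokens(text: str) -> int:
    """Estimate the token count of English text from its whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

print(estimate_tokens("The quick brown fox jumps over the lazy dog"))  # prints 12
```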



Sources

  • OpenAI API docs: platform.openai.com/docs
  • Anthropic API docs: docs.anthropic.com
  • Google AI Studio docs: ai.google.dev/gemini-api/docs
  • Mistral docs: docs.mistral.ai
  • Groq docs: console.groq.com/docs