🇺🇸 United States · LangSmith — LLM Application Observability
Status: 🟩 COMPLETE 🟦 LIVING Section: 10 — AI and LLMs
| Field | Details |
|---|---|
| Vendor | LangChain (LangSmith is a product) |
| Country/origin | 🇺🇸 United States (San Francisco) |
| Recommended for AUS? | ✅ Yes — US-based; standard enterprise data handling; widely used by Australian AI builders |
| Privacy summary | AWS hosting; SOC 2 Type II; GDPR compliant; data residency options for enterprise; standard SaaS DPA |
| Free tier | ✅ Yes — Developer tier (limited traces) |
| Paid tiers | Plus (~$39 USD/month/user), Enterprise (custom) |
| First released | LangChain founded 2022; LangSmith product 2023 |
| Last reviewed | June 2026 |
| Official site | https://smith.langchain.com |
What it is
LangSmith is a platform for observing, debugging, evaluating, and improving LLM-powered applications. Think of it as the “developer tools” for AI applications — letting you see what your AI is doing, why it’s making certain decisions, where it’s failing, and how to make it better.
If you’re building an AI application that processes thousands of user queries per day, LangSmith answers questions like:
- “Why did this user’s query get a bad response?”
- “What are the most common ways our AI fails?”
- “Is our new prompt actually better than the old one?”
- “Which model version performs best for our use case?”
- “How much does each query cost in tokens?”
LangSmith is built by LangChain, the company behind the popular open-source LangChain framework for LLM application development.
What it does (capabilities)
Tracing
- Capture every LLM call your application makes
- See full prompt + response pairs
- Tree visualisation of multi-step agent workflows
- Token counts and costs per trace
- Latency tracking
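To make the tree idea concrete, here is a minimal sketch of how a trace with nested steps might be modelled and rolled up into totals. The `Run` class, its field names, and the sample numbers are assumptions for illustration, not LangSmith's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    """One step in a trace (hypothetical structure, not LangSmith's schema)."""
    name: str
    latency_ms: float = 0.0          # time spent in this step alone
    prompt_tokens: int = 0
    completion_tokens: int = 0
    children: List["Run"] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Tokens used by this step plus all nested steps."""
        own = self.prompt_tokens + self.completion_tokens
        return own + sum(child.total_tokens() for child in self.children)

    def total_latency_ms(self) -> float:
        """Latency of this step plus nested steps, assuming sequential execution."""
        return self.latency_ms + sum(c.total_latency_ms() for c in self.children)


def render(run: Run, depth: int = 0) -> List[str]:
    """Return an indented text outline of the trace tree."""
    lines = [f"{'  ' * depth}{run.name} ({run.total_tokens()} tokens)"]
    for child in run.children:
        lines.extend(render(child, depth + 1))
    return lines


# Made-up example: an agent that retrieves documents, then calls an LLM.
trace = Run(
    name="agent",
    latency_ms=5.0,
    children=[
        Run("retrieve_docs", latency_ms=120.0),
        Run("llm_call", latency_ms=900.0, prompt_tokens=350, completion_tokens=80),
    ],
)
print("\n".join(render(trace)))
```

Rolling totals up from the leaves is what lets a per-trace view show cost and latency for a whole multi-step workflow while still letting you drill into a single step.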
Debugging
- Drill into failing requests to see what went wrong
- Replay traces with modified prompts to test fixes
- Side-by-side comparison of different versions
Evaluation
- Test datasets — collections of input/expected output pairs
- Automated evaluation — AI judges scoring responses against criteria
- Human evaluation workflows — get human reviewers to rate outputs
- A/B testing — compare prompt or model variations
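The evaluation loop described above can be sketched in a few lines. This is a generic illustration, not LangSmith's evaluation API: the dataset, the `exact_match` evaluator, and the two stand-in variants are all hypothetical, and a real setup would call an LLM and often use an AI judge instead of string matching.

```python
from typing import Callable, Dict, List

# Hypothetical test dataset: input / expected-output pairs.
DATASET: List[Dict[str, str]] = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "capital of Australia", "expected": "Canberra"},
]


def exact_match(output: str, expected: str) -> float:
    """Simplest possible evaluator: 1.0 on a case-insensitive match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def evaluate(app: Callable[[str], str], dataset: List[Dict[str, str]]) -> float:
    """Run the app over every example and return the mean score."""
    scores = [exact_match(app(ex["input"]), ex["expected"]) for ex in dataset]
    return sum(scores) / len(scores)


# Two stand-in variants; in practice each would call an LLM with a different prompt.
def variant_a(question: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(question, "I don't know")


def variant_b(question: str) -> str:
    answers = {
        "2 + 2": "4",
        "capital of France": "paris",
        "capital of Australia": "Canberra",
    }
    return answers.get(question, "I don't know")


results = {
    "variant_a": evaluate(variant_a, DATASET),
    "variant_b": evaluate(variant_b, DATASET),
}
best = max(results, key=results.get)
print(results, "->", best)
```

Keeping the dataset fixed while swapping the variant is the point: the score difference can then be attributed to the prompt or model change rather than to different inputs.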
Monitoring (production)
- Real-time dashboards of production AI usage
- Alerts on quality drops, error spikes, cost spikes
- User feedback collection integrated into traces
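As a rough picture of what an error-spike alert does, the sketch below fires when the failure rate over a sliding window of recent requests crosses a threshold. The class name, window size, and threshold are assumptions chosen for illustration; LangSmith's own alert rules are configured in its UI and may work differently.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        self.outcomes = deque(maxlen=window)  # True means the request failed
        self.threshold = threshold

    def record(self, failed: bool) -> bool:
        """Record one request outcome; return True if the alert should fire."""
        self.outcomes.append(failed)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data yet to judge a rate
        error_rate = sum(self.outcomes) / len(self.outcomes)
        return error_rate > self.threshold


alert = ErrorRateAlert(window=10, threshold=0.2)
# Seven successes followed by three failures: 3/10 = 0.3 exceeds 0.2 on the last request.
fired = [alert.record(failed) for failed in [False] * 7 + [True] * 3]
print(fired[-1])
```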
Prompt management
- Version control for prompts
- Prompt playground for iteration
- Deploy specific versions to production
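The idea behind prompt versioning and pinned deployment can be shown with a small in-memory store. `PromptRegistry` and its methods are hypothetical names for this sketch and are not part of the LangSmith SDK.

```python
from typing import Dict, List, Optional


class PromptRegistry:
    """Hypothetical in-memory prompt store with append-only versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}
        self._deployed: Dict[str, int] = {}

    def commit(self, name: str, template: str) -> int:
        """Save a new version and return its 1-based version number."""
        self._versions.setdefault(name, []).append(template)
        return len(self._versions[name])

    def deploy(self, name: str, version: int) -> None:
        """Pin the version that production should use."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"unknown version {version} for prompt {name!r}")
        self._deployed[name] = version

    def get(self, name: str, version: Optional[int] = None) -> str:
        """Return a specific version, or the deployed one if none is given."""
        chosen = version if version is not None else self._deployed[name]
        return self._versions[name][chosen - 1]


registry = PromptRegistry()
v1 = registry.commit("summarise", "Summarise this text: {text}")
v2 = registry.commit("summarise", "Summarise this text in one sentence: {text}")
registry.deploy("summarise", v1)  # production stays on v1 while v2 is tested
print(registry.get("summarise").format(text="LangSmith release notes"))
```

Because versions are append-only and production reads a pinned version number, iterating on a prompt never changes live behaviour until someone explicitly deploys the new version.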
What you’d use it for (as an AI developer)
- Debugging hard-to-reproduce AI failures (“user said the AI gave a bad answer; let me see what happened”)
- Improving prompts systematically with measurable metrics
- Catching regressions when you change models or prompts
- Cost optimisation: finding which queries waste tokens
- Compliance auditing — having records of what your AI did
- Onboarding new team members by showing real production AI behaviour
How to access from Australia
- Go to https://smith.langchain.com → Sign up
- Free developer tier on signup
- Get API key from settings
- Add it to your LLM application code (works with LangChain or a vanilla SDK)
Basic Python integration:
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-key"
# LangChain code in this process now traces to LangSmith automatically
The integration is genuinely lightweight: minimal code changes for substantial visibility.
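For code that does not go through LangChain, the `langsmith` package provides a `traceable` decorator that records a function call as a run. The sketch below is a minimal example under stated assumptions: `answer_question` is a hypothetical placeholder for your own LLM call, and the no-op fallback exists only so the example runs without the package installed. Check the current LangSmith documentation for the recommended environment variable names.

```python
import os

try:
    # Real decorator from the `langsmith` package (pip install langsmith).
    from langsmith import traceable
except ImportError:
    # Fallback so this sketch still runs without the package installed.
    def traceable(func):
        return func

# Read the key from the environment instead of hardcoding it in source.
# Tracing is only switched on here if a key is actually present.
if os.environ.get("LANGCHAIN_API_KEY"):
    os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")


@traceable
def answer_question(question: str) -> str:
    """Placeholder for your own LLM call (hypothetical function)."""
    return f"You asked: {question}"


print(answer_question("What is LangSmith?"))
```

Keeping the API key in the environment rather than in source also avoids committing it to version control.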
What it costs
| Plan | Price | Traces/month |
|---|---|---|
| Developer | $0 | 5,000 traces |
| Plus | $39 USD/month per user | 10,000 traces; advanced features |
| Enterprise | Custom | Unlimited; SSO; data residency |
For small AI applications, the free tier is sufficient. Production apps with substantial traffic need paid plans.
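A quick way to check whether a plan fits is to divide the monthly trace quota by expected daily volume. The helper below is a back-of-envelope sketch; the function name is hypothetical, and it assumes steady traffic and one trace per query unless told otherwise.

```python
def days_until_quota_exhausted(monthly_quota: int, queries_per_day: int,
                               traces_per_query: int = 1) -> float:
    """Days of traffic a monthly trace quota covers at a steady daily volume."""
    if queries_per_day <= 0 or traces_per_query <= 0:
        raise ValueError("queries_per_day and traces_per_query must be positive")
    return monthly_quota / (queries_per_day * traces_per_query)


# Developer tier quota from the table above, at 1,000 user queries per day.
print(days_until_quota_exhausted(5_000, 1_000))  # 5.0
```

At that volume the Developer tier covers about five days of a month, and the 10,000 traces listed for Plus about ten.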
How it compares to alternatives
| Tool | Country | Best for |
|---|---|---|
| LangSmith | 🇺🇸 | LangChain users; deep observability; evaluation |
| Langfuse | 🇩🇪 | Open-source alternative; EU-friendly |
| Helicone | 🇺🇸 | Simpler integration; API proxy approach |
| Weights & Biases | 🇺🇸 | ML practitioners; broader experiment tracking |
| Phoenix (Arize) | 🇺🇸 | Open-source; ML model monitoring |
| OpenTelemetry + custom | Various | DIY approach |
LangSmith’s niche: the most feature-complete commercial LLM observability platform, with the strongest integration if you use LangChain.
LangSmith vs Langfuse
This is the main comparison for developers:
| Aspect | LangSmith | Langfuse |
|---|---|---|
| License | Commercial | Open-source (self-hostable) |
| Country | 🇺🇸 USA | 🇩🇪 Germany |
| LangChain integration | ✅ Native | Works with any framework |
| Self-hosted option | Enterprise only | ✅ Free |
| Pricing | Per-user | Cloud SaaS or free self-host |
| GDPR posture | Standard SaaS DPA | EU-native |
| Maturity | Most polished | Strong; growing |
For Australian developers:
- Privacy-sensitive: Langfuse self-hosted (run on your own AWS Sydney) gives maximum data control
- Convenience: LangSmith is the most polished hosted experience
- Cost-effective: Langfuse open-source is genuinely free
- LangChain users: LangSmith integrates more seamlessly
Why observability matters for AI applications
This may seem like a developer-tool niche, but it’s actually critical for serious AI deployment:
Without observability
- A user says “the AI gave me a bad answer” → you have no way to investigate
- You change a prompt and quality silently drops → you don’t know why
- Some users get expensive long responses → costs balloon mysteriously
- Compliance asks “what did the AI do?” → you can’t answer
With observability
- Every interaction is recorded with full context
- You can systematically improve based on real failures
- You catch problems before they become widespread
- You have audit trails for regulatory requirements
For any AI application going beyond toy projects, observability is essential.
Privacy considerations
- Production AI traces contain user inputs — which may include personal information
- A standard SaaS DPA helps address cross-border disclosure obligations under Australian Privacy Principle 8 (APP 8)
- Sensitive applications: consider Langfuse self-hosted for maximum data control
- Healthcare/legal/finance: verify HIPAA/compliance options before sending production traces
- Use of traces for model training: verify LangChain’s current data use policy and defaults for your tier
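One way to reduce the personal information that ends up in traces is to mask obvious identifiers before the text is logged. The sketch below is a generic, standard-library illustration: the `redact` function and its two patterns are assumptions, deliberately simple, and not a substitute for proper PII detection or for any masking options the platform itself offers.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs more care.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
AU_MOBILE = re.compile(r"\b04\d{2}[ -]?\d{3}[ -]?\d{3}\b")


def redact(text: str) -> str:
    """Mask email addresses and Australian mobile numbers before logging a trace."""
    text = EMAIL.sub("[EMAIL]", text)
    return AU_MOBILE.sub("[PHONE]", text)


print(redact("Contact jane.doe@example.com.au or 0412 345 678 about my claim."))
# Contact [EMAIL] or [PHONE] about my claim.
```

Applying redaction inside your own application, before any tracing call, means the unmasked text never leaves your infrastructure.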
Australian considerations
- Australian AI companies use LangSmith widely
- For sensitive AU data: Langfuse self-hosted on AWS Sydney provides clearest data residency
- Pricing in USD — typical SaaS exchange rate considerations
- Indemnification: Verify your contract terms for production AI deployments
Gotchas
- Adding LangSmith means traces leave your infrastructure. For privacy-sensitive applications, evaluate this carefully.
- The free tier is exhausted quickly in production. 5,000 traces sounds like a lot until you process 1,000 user queries per day, which uses up the monthly quota in about five days (assuming one trace per query).
- Tracing has slight latency overhead. Usually negligible but verify in performance-critical applications.
- Evaluation features have a learning curve. Setting up automated evaluators properly takes some investment.
- Prompt management can become complex. With many prompt versions across many features, organisation matters.
Recent changes (LIVING)
- Better non-LangChain support (2024-2025): Now works well with the vanilla OpenAI and Anthropic SDKs
- Improved evaluation features (2024): AI judges, comparison tools
- Datasets and human review workflows (2024-2025): More sophisticated quality processes
- Enterprise features (2025): Better data residency, SSO, advanced admin
See also
- langfuse — open-source competitor
- helicone — simpler observability alternative
- vercel-ai-sdk — application framework that integrates
- openai-api — what’s underlying many traced calls
- claude-api-overview — Anthropic API observability
Sources
- LangSmith documentation: docs.smith.langchain.com
- LangChain documentation: langchain.com
- LangChain Series A funding announcement (2024)
- Independent comparisons of LLM observability tools (2024-2026)
- Developer community discussions: Hacker News, r/LocalLLaMA