Technical API Analysis: OpenAI GPT vs Anthropic Claude
A direct comparison of OpenAI and Anthropic APIs covering request structures, error handling, cost and specific use cases for integration engineers.
TL;DR Matrix
A summary of key technical differences for quick reference.
| Dimension | OpenAI (GPT series) | Anthropic (Claude series) |
|---|---|---|
| API Endpoint | /v1/chat/completions | /v1/messages |
| System Prompt | Passed as a message object with role: "system" | Passed as a top-level system parameter |
| Tool/Function Calling | Native tools and tool_choice parameters | tools parameter (launched in beta); requires Claude 3 models |
| Vision Support | content array with type: "image_url" objects | content array with type: "image" objects (base64 encoded) |
| Streaming | stream: true returns Server-Sent Events (SSE) | stream: true returns Server-Sent Events (SSE) |
| Key Headers | Authorization: Bearer $API_KEY | x-api-key: $API_KEY, anthropic-version: YYYY-MM-DD |
| Rate Limit Header | X-RateLimit-Remaining-Tokens | anthropic-ratelimit-requests-remaining |
| Cost Model | Per-token (input/output), tiered by model | Per-token (input/output), tiered by model |
Use Cases
Choosing a model often depends on the job's specific requirements.
- OpenAI: Best for complex agentic workflows needing reliable tool use. Its function calling is mature and well-documented, making it ideal for integrating with external APIs or databases. Fine-tuning capabilities also offer a path to specialised behaviour.
- Anthropic: Excels at tasks requiring long context windows and a strong adherence to safety guidelines. It's a good choice for summarising large documents, analysing legal texts or performing creative writing tasks where a constitutional AI's behaviour is preferred.
Technical Analysis
The two platforms have converged on a similar messages API structure, but key differences remain in request and response shapes.
OpenAI: Chat Completions API
OpenAI's API is structured around a list of message objects. The system prompt is the first message in this list.
Request (curl)
```bash
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that speaks British English."
      },
      {
        "role": "user",
        "content": "What is the colour of the sky?"
      }
    ],
    "max_tokens": 50
  }'
```
Response (Abbreviated)
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1714567890,
  "model": "gpt-4o-2024-05-13",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The colour of the sky is typically blue during a clear day."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 13,
    "total_tokens": 38
  }
}
```
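The same request via the official openai Python SDK (v1.x) is a few lines. A minimal sketch; the model name and prompts mirror the curl example above:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that speaks British English."},
        {"role": "user", "content": "What is the colour of the sky?"},
    ],
    max_tokens=50,
)
print(response.choices[0].message.content)  # content is a plain string
```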
Anthropic: Messages API
Anthropic's API requires a version header and separates the system prompt from the conversational messages.
Request (curl)
```bash
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "system": "You are a helpful assistant that speaks British English.",
    "messages": [
      {
        "role": "user",
        "content": "What is the colour of the sky?"
      }
    ],
    "max_tokens": 50
  }'
```
Response (Abbreviated)
```json
{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-sonnet-20240229",
  "content": [
    {
      "type": "text",
      "text": "The colour of the sky is typically blue on a clear day."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 23,
    "output_tokens": 14
  }
}
```
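The equivalent call via the official anthropic Python SDK, again as a minimal sketch mirroring the curl example. Note that system is a top-level keyword argument, not a message:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-sonnet-20240229",
    system="You are a helpful assistant that speaks British English.",
    messages=[{"role": "user", "content": "What is the colour of the sky?"}],
    max_tokens=50,
)
print(response.content[0].text)  # content is a list of typed blocks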
The bit most guides skip: The primary structural difference is Anthropic's elevation of system to a first-class parameter outside the messages array. This isn't just syntactic sugar. Anthropic's models are tuned to treat this parameter with higher precedence, which can lead to more reliable instruction-following for persona and rule-setting. It also means you can't inject a system prompt halfway through a conversation, unlike with OpenAI. Anthropic's response structure is also more explicit. The content is an array of blocks (e.g. type: "text"), preparing for multi-modal outputs, whilst OpenAI's is a simpler string in message.content.
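In practice, that structural difference shows up wherever you extract text from a response. A sketch of the two access patterns, assuming `openai_response` and `anthropic_response` are the SDK response objects from the examples above:

```python
# OpenAI: the assistant's reply is a plain string on the message object.
openai_text = openai_response.choices[0].message.content

# Anthropic: content is a list of typed blocks; concatenate the text blocks.
anthropic_text = "".join(
    block.text for block in anthropic_response.content if block.type == "text"
)
```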
Error Handling
Your integration's stability depends on correctly handling API errors, particularly 429 and 5xx status codes.
- 429 Too Many Requests: Both APIs use this status code for rate limiting, and your code must handle it gracefully. OpenAI provides `X-RateLimit-*` headers to help you manage token and request limits proactively. Anthropic provides `Retry-After` and `anthropic-ratelimit-requests-remaining` headers. Your retry logic must respect these headers to avoid being blocked.
- 5xx Server Error: These are transient server-side issues. Implement an exponential backoff strategy with jitter to handle them. A simple approach is to wait `(2^attempt * base_delay) + random_jitter` seconds before retrying. Don't retry indefinitely: cap retries at 3-5 attempts before failing the operation.
Python Retry Logic Example
```python
import time
import random

from openai import APITimeoutError, APIConnectionError, RateLimitError, APIStatusError

# This example uses OpenAI's exceptions, but the pattern is identical for Anthropic.

def call_with_retry(api_call_func, max_retries=5, base_delay=1.0):
    """Calls an API function with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return api_call_func()
        except (APITimeoutError, APIConnectionError, RateLimitError, APIStatusError) as e:
            # Don't retry on client errors like 400 Bad Request;
            # only 429s and 5xx errors are worth retrying.
            if isinstance(e, APIStatusError) and e.status_code < 500 and e.status_code != 429:
                raise
            # Out of attempts: re-raise the final exception.
            if attempt == max_retries - 1:
                raise
            # Exponential backoff (base_delay * 2^attempt) plus up to 1s of jitter.
            delay = (base_delay * 2**attempt) + random.uniform(0, 1)
            time.sleep(delay)
```
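Usage is a matter of wrapping the SDK call in a zero-argument callable, e.g. a lambda (client reused from the earlier OpenAI example):

```python
response = call_with_retry(
    lambda: client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the colour of the sky?"}],
        max_tokens=50,
    )
)
```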
Cost & Scalability
Both platforms operate on a pay-as-you-go, per-token model with different prices for input and output tokens.
- Cost: Anthropic's Claude 3 Opus is generally more expensive than OpenAI's GPT-4o, whilst Claude 3 Sonnet is priced competitively against GPT-4 Turbo. The key cost driver for large tasks is the context window: feeding a 150k-token document to Claude for a small summary is expensive. A pre-processing step that extracts only the relevant chunks is often far more cost-effective than sending the entire document (see the back-of-envelope calculator after this list).
- Scalability: Standard pay-as-you-go models have rate limits (tokens-per-minute and requests-per-minute). For high-throughput needs, both platforms offer provisioned throughput. This gives you a reserved amount of model processing capacity for a fixed price, removing rate limits and offering lower latency. It's a significant cost commitment and only makes sense for high-volume, predictable workloads.
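A rough cost check before sending a large document can pay for itself. A sketch of such a calculator; the per-million-token prices below are illustrative placeholders, not current rates, so check each provider's pricing page before relying on them:

```python
# Illustrative prices in USD per million tokens -- placeholders, not current rates.
PRICES = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "claude-3-sonnet-20240229": {"input": 3.00, "output": 15.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough request cost in USD from token counts and per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 150k-token document in, a 500-token summary out:
print(f"${estimate_cost('gpt-4o', 150_000, 500):.2f}")
```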