Technical API Analysis: OpenAI GPT vs Anthropic Claude
A direct comparison of OpenAI and Anthropic APIs covering request structures, error handling, cost and specific use cases for integration engineers.
TL;DR Matrix
A summary of key technical differences for quick reference.
| Dimension | OpenAI (GPT series) | Anthropic (Claude series) |
|---|---|---|
| API Endpoint | /v1/chat/completions | /v1/messages |
| System Prompt | Passed as a message object with role: "system" | Passed as a top-level system parameter |
| Tool/Function Calling | Native tools and tool_choice parameters | tools parameter (launched in beta); requires Claude 3 models |
| Vision Support | content array with type: "image_url" objects | content array with type: "image" objects (base64 encoded) |
| Streaming | stream: true returns Server-Sent Events (SSE) | stream: true returns Server-Sent Events (SSE) |
| Key Headers | Authorization: Bearer $API_KEY | x-api-key: $API_KEY, anthropic-version: YYYY-MM-DD |
| Rate Limit Header | X-RateLimit-Remaining-Tokens | anthropic-ratelimit-requests-remaining |
| Cost Model | Per-token (input/output), tiered by model | Per-token (input/output), tiered by model |
Use Cases
Choosing a model often depends on the job's specific requirements.
- OpenAI: Best for complex agentic workflows needing reliable tool use. Its function calling is mature and well-documented, making it ideal for integrating with external APIs or databases. Fine-tuning capabilities also offer a path to specialised behaviour.
- Anthropic: Excels at tasks requiring long context windows and a strong adherence to safety guidelines. It's a good choice for summarising large documents, analysing legal texts or performing creative writing tasks where a constitutional AI's behaviour is preferred.
Technical Analysis
The two platforms have converged on a similar messages API structure, but key differences remain in request and response shapes.
OpenAI: Chat Completions API
OpenAI's API is structured around a list of message objects. The system prompt is the first message in this list.
Request (curl)
```bash
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that speaks British English."
      },
      {
        "role": "user",
        "content": "What is the colour of the sky?"
      }
    ],
    "max_tokens": 50
  }'
```
Response (Abbreviated)
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1714567890,
  "model": "gpt-4o-2024-05-13",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The colour of the sky is typically blue during a clear day."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 13,
    "total_tokens": 38
  }
}
```
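The same request via the official openai Python SDK (v1.x) is a few lines. A minimal sketch; the model name and prompts mirror the curl example above:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that speaks British English."},
        {"role": "user", "content": "What is the colour of the sky?"},
    ],
    max_tokens=50,
)
print(response.choices[0].message.content)  # content is a plain string
```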
Anthropic: Messages API
Anthropic's API requires a version header and separates the system prompt from the conversational messages.
Request (curl)
```bash
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "system": "You are a helpful assistant that speaks British English.",
    "messages": [
      {
        "role": "user",
        "content": "What is the colour of the sky?"
      }
    ],
    "max_tokens": 50
  }'
```
Response (Abbreviated)
```json
{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-sonnet-20240229",
  "content": [
    {
      "type": "text",
      "text": "The colour of the sky is typically blue on a clear day."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 23,
    "output_tokens": 14
  }
}
```
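The equivalent call via the official anthropic Python SDK, again as a minimal sketch mirroring the curl example. Note that system is a top-level keyword argument, not a message:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-sonnet-20240229",
    system="You are a helpful assistant that speaks British English.",
    messages=[{"role": "user", "content": "What is the colour of the sky?"}],
    max_tokens=50,
)
print(response.content[0].text)  # content is a list of typed blocks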
The bit most guides skip: The primary structural difference is Anthropic's elevation of system to a first-class parameter outside the messages array. This isn't just syntactic sugar. Anthropic's models are tuned to treat this parameter with higher precedence, which can lead to more reliable instruction-following for persona and rule-setting. It also means you can't inject a system prompt halfway through a conversation, unlike with OpenAI. Anthropic's response structure is also more explicit. The content is an array of blocks (e.g. type: "text"), preparing for multi-modal outputs, whilst OpenAI's is a simpler string in message.content.
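In practice, that structural difference shows up wherever you extract text from a response. A sketch of the two access patterns, assuming `openai_response` and `anthropic_response` are the SDK response objects from the examples above:

```python
# OpenAI: the assistant's reply is a plain string on the message object.
openai_text = openai_response.choices[0].message.content

# Anthropic: content is a list of typed blocks; concatenate the text blocks.
anthropic_text = "".join(
    block.text for block in anthropic_response.content if block.type == "text"
)
```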
Error Handling
Your integration's stability depends on correctly handling API errors, particularly 429 and 5xx status codes.
- 429 Too Many Requests: Both APIs use this status code for rate limiting, and your code must handle it gracefully. OpenAI provides `X-RateLimit-*` headers to help you manage token and request limits proactively. Anthropic provides `Retry-After` and `anthropic-ratelimit-requests-remaining` headers. Your retry logic must respect these headers to avoid being blocked.
- 5xx Server Error: These are transient server-side issues. Implement an exponential backoff strategy with jitter to handle them. A simple approach is to wait `(2^attempt * base_delay) + random_jitter` seconds before retrying. Don't retry indefinitely: cap retries at 3-5 attempts before failing the operation.
Python Retry Logic Example
```python
import time
import random

from openai import APITimeoutError, APIConnectionError, RateLimitError, APIStatusError

# This example uses OpenAI's exceptions, but the pattern is identical for Anthropic.

def call_with_retry(api_call_func, max_retries=5, base_delay=1.0):
    """Calls an API function with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return api_call_func()
        except (APITimeoutError, APIConnectionError, RateLimitError, APIStatusError) as e:
            # Don't retry on client errors like 400 Bad Request;
            # only 429s and 5xx errors are worth retrying.
            if isinstance(e, APIStatusError) and e.status_code < 500 and e.status_code != 429:
                raise
            # Out of attempts: re-raise the final exception.
            if attempt == max_retries - 1:
                raise
            # Exponential backoff (base_delay * 2^attempt) plus up to 1s of jitter.
            delay = (base_delay * 2**attempt) + random.uniform(0, 1)
            time.sleep(delay)
```
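Usage is a matter of wrapping the SDK call in a zero-argument callable, e.g. a lambda (client reused from the earlier OpenAI example):

```python
response = call_with_retry(
    lambda: client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the colour of the sky?"}],
        max_tokens=50,
    )
)
```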
Cost & Scalability
Both platforms operate on a pay-as-you-go, per-token model with different prices for input and output tokens.
- Cost: Anthropic's Claude 3 Opus is generally more expensive than OpenAI's GPT-4o, whilst Claude 3 Sonnet is priced competitively against GPT-4 Turbo. The key cost driver for large tasks is the context window: feeding a 150k-token document to Claude for a small summary is expensive. A pre-processing step that extracts only the relevant chunks is often far more cost-effective than sending the entire document (see the back-of-envelope calculator after this list).
- Scalability: Standard pay-as-you-go models have rate limits (tokens-per-minute and requests-per-minute). For high-throughput needs, both platforms offer provisioned throughput. This gives you a reserved amount of model processing capacity for a fixed price, removing rate limits and offering lower latency. It's a significant cost commitment and only makes sense for high-volume, predictable workloads.
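A rough cost check before sending a large document can pay for itself. A sketch of such a calculator; the per-million-token prices below are illustrative placeholders, not current rates, so check each provider's pricing page before relying on them:

```python
# Illustrative prices in USD per million tokens -- placeholders, not current rates.
PRICES = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "claude-3-sonnet-20240229": {"input": 3.00, "output": 15.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough request cost in USD from token counts and per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 150k-token document in, a 500-token summary out:
print(f"${estimate_cost('gpt-4o', 150_000, 500):.2f}")
```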