Prompting ChatGPT, Claude, Perplexity and Gemini Uncovers Nginx Log Patterns
Source: HN
A developer set up an Nginx reverse proxy to route prompts from a single web UI to OpenAI’s ChatGPT, Anthropic’s Claude, Perplexity.ai and Google’s Gemini, then examined the access logs to compare how each service behaved under identical traffic. Over a 12‑hour window the proxy recorded 4 million requests, revealing stark contrasts in request size, latency and error patterns that go beyond headline model scores.
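The post does not include the author’s actual configuration, but a fan‑out of this kind can be sketched in a few `location` blocks. The path prefixes below are hypothetical; the upstream hostnames are the providers’ public API endpoints:

```nginx
# Hypothetical sketch: one entry point, routed by path prefix.
location /openai/ {
    proxy_pass https://api.openai.com/;
}
location /anthropic/ {
    proxy_pass https://api.anthropic.com/;
}
location /perplexity/ {
    proxy_pass https://api.perplexity.ai/;
}
location /gemini/ {
    proxy_pass https://generativelanguage.googleapis.com/;
}
```

With a setup like this, every request to every provider lands in the same access log, which is what makes the side‑by‑side comparison possible.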
ChatGPT’s calls averaged 210 ms round‑trip, with a steady 99 % success rate, but each request carried a 2‑KB JSON payload that included a “model” field and a token‑count hint. Claude’s traffic showed a slightly longer median latency of 280 ms and a higher proportion of 429 rate‑limit responses, suggesting a stricter per‑minute quota on the free tier. Perplexity’s endpoint, marketed as a real‑time answer engine, produced the smallest payloads (≈1 KB) but suffered intermittent 500 errors that spiked whenever a query contained ambiguous phrasing. Gemini, the newest entrant, posted the longest tail: 15 % of calls exceeded 500 ms. Its logs also showed consistent use of HTTP/2 server push, hinting at a streaming response architecture that could reduce client‑side latency at the cost of higher server load.
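The per‑provider statistics above can be reproduced from any access log that records the upstream, status, response size and request time. A minimal sketch follows; the whitespace‑separated log format and the sample lines are illustrative, not the author’s actual `log_format`:

```python
from statistics import median

# Illustrative access-log lines: provider, status, bytes sent, request time (s).
# A real deployment would use a custom log_format such as
# '$upstream $status $body_bytes_sent $request_time' and read from a file.
LOG_LINES = [
    "chatgpt 200 2048 0.210",
    "chatgpt 200 2048 0.205",
    "claude 200 1536 0.280",
    "claude 429 512 0.120",
    "perplexity 500 256 0.090",
    "perplexity 200 1024 0.150",
    "gemini 200 1800 0.520",
    "gemini 200 1800 0.310",
]

def summarize(lines):
    """Group log lines by provider; compute median latency and success rate."""
    stats = {}
    for line in lines:
        provider, status, _nbytes, rtime = line.split()
        s = stats.setdefault(
            provider, {"latencies": [], "total": 0, "errors": 0, "throttled": 0}
        )
        s["total"] += 1
        s["latencies"].append(float(rtime))
        if status == "429":
            s["throttled"] += 1     # rate-limit responses
        elif status.startswith("5"):
            s["errors"] += 1        # upstream failures
    return {
        p: {
            "median_ms": round(median(s["latencies"]) * 1000),
            "success_rate": 1 - (s["errors"] + s["throttled"]) / s["total"],
        }
        for p, s in stats.items()
    }

report = summarize(LOG_LINES)
```

On the sample lines, `report["claude"]` shows a 50 % success rate because one of its two requests was throttled, mirroring the 429 pattern described above.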
Why it matters: as multi‑LLM front‑ends proliferate in the Nordic market, developers increasingly rely on shared edge infrastructure to mediate API traffic. The Nginx data shows that cost, reliability and performance are not uniform across providers; a model that tops benchmark tables may still impose heavier bandwidth use or stricter throttling in production. For enterprises planning to embed AI assistants in customer‑facing services, these hidden operational differences could affect SLAs and cloud spend.
What to watch next: the author plans to repeat the experiment with the upcoming Gemini “hybrid inference” mode announced on April 20, and to test the impact of token‑level streaming on Nginx buffer usage. Observers should also monitor any policy changes from OpenAI and Anthropic that could reshape rate‑limit thresholds, as well as emerging European data‑privacy regulations that may force on‑device inference, a trend hinted at in our April 16 report on Firebase‑key abuse.
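The streaming experiment hinges largely on Nginx’s proxy buffering behaviour. The directives below are standard Nginx, but the values and the `location` path are illustrative:

```nginx
location /gemini/ {
    proxy_pass https://generativelanguage.googleapis.com/;
    # Disable response buffering so streamed tokens are relayed to the
    # client as they arrive instead of accumulating in proxy buffers.
    proxy_buffering off;
    # Keep the upstream connection open for long-lived streamed replies.
    proxy_read_timeout 300s;
}
```

With buffering left on (the default), Nginx may hold a token‑level stream until a buffer fills, which is exactly the interaction the follow‑up experiment would measure.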