Task completion time for AI agents dropped by 30% after OpenAI added WebSocket support to its Responses API. The upgrade cuts the overhead of repeated API calls through connection-scoped caching, in which cached state lives as long as the connection does. Average model response latency fell from 1.2 seconds to 0.8 seconds per query.
The change targets the Codex agent loop, in which an agent works through multiple steps before returning a final answer. Previously, each step required a new API request, and every request added delay. With WebSockets, the connection stays open, so the server can push updates as soon as they are ready. This lowers server load while speeding up responses.
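To make the difference concrete, here is a toy model of the two approaches. It is a sketch under stated assumptions, not OpenAI's implementation: it assumes the per-request approach re-sends the full step history on every call, while the open connection only sends the newest item because the server keeps earlier ones cached. The class and function names (`StatelessClient`, `ConnectionScopedClient`, `compare`) are hypothetical, and the model counts items sent rather than measuring real latency.

```python
from dataclasses import dataclass, field


@dataclass
class StatelessClient:
    """Hypothetical model: one request per step, full history re-sent each time."""
    items_sent: int = 0

    def run_step(self, history: list[str], new_item: str) -> None:
        history.append(new_item)
        self.items_sent += len(history)  # the whole history goes over the wire


@dataclass
class ConnectionScopedClient:
    """Hypothetical model: one open connection, earlier items stay cached."""
    cached: list[str] = field(default_factory=list)
    items_sent: int = 0

    def run_step(self, new_item: str) -> None:
        self.cached.append(new_item)  # state tied to the lifetime of the connection
        self.items_sent += 1          # only the new item goes over the wire


def compare(steps: int) -> tuple[int, int]:
    """Return (items sent without caching, items sent with caching)."""
    stateless, scoped = StatelessClient(), ConnectionScopedClient()
    history: list[str] = []
    for i in range(steps):
        item = f"tool_result_{i}"
        stateless.run_step(history, item)
        scoped.run_step(item)
    return stateless.items_sent, scoped.items_sent


if __name__ == "__main__":
    print(compare(5))  # (15, 5): 1+2+3+4+5 items versus 5 items
```

In this toy model the redundant traffic grows with the number of steps, which is one plausible reason longer agent loops would benefit most. The numbers it produces are illustrative only and do not reproduce the reported measurements.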
OpenAI tested the system on a set of 10,000 agent workflows. Tasks that once took 5.6 seconds now finish in 3.9 seconds, a reduction of about 30%. The biggest gains came in complex queries involving memory retrieval and tool use. Engineers noted that the caching layer cut redundant data transfers by half.
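The reported figures can be checked with a short calculation. The helper name `percent_reduction` is introduced here for illustration only. The inputs are the task times and per-query latencies quoted in this article.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


task_time = percent_reduction(5.6, 3.9)      # task completion time, seconds
query_latency = percent_reduction(1.2, 0.8)  # per-query latency, seconds

if __name__ == "__main__":
    print(f"task time: {task_time:.1f}% lower")          # task time: 30.4% lower
    print(f"query latency: {query_latency:.1f}% lower")  # query latency: 33.3% lower
```

The task-time reduction matches the headline 30% figure, while the per-query latency reduction is slightly larger at about a third.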
The update is part of a broader push to make AI agents more practical for real-time use. Until now, WebSockets had been used mainly for chat interfaces. Adapting them for agent workflows required new protocols to handle partial results and error recovery.
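One common way to handle partial results and recovery over a long-lived connection is to number each message and let the client ask to resume from the last one it saw. The sketch below illustrates that general pattern only. The message fields (`seq`, `type`, `text`), the `resume` request, and the `PartialResultBuffer` class are all assumptions made for this example and are not taken from OpenAI's protocol.

```python
from dataclasses import dataclass, field


@dataclass
class PartialResultBuffer:
    """Hypothetical client-side buffer for sequence-numbered stream messages."""
    next_seq: int = 0
    chunks: list[str] = field(default_factory=list)
    done: bool = False

    def handle(self, message: dict) -> None:
        seq = message["seq"]
        if seq < self.next_seq:
            return  # duplicate replayed after a reconnect; ignore it
        if seq > self.next_seq:
            raise ValueError(f"gap: expected {self.next_seq}, got {seq}")
        if message["type"] == "partial":
            self.chunks.append(message["text"])
        elif message["type"] == "final":
            self.done = True
        self.next_seq += 1

    def resume_request(self) -> dict:
        """Message a client could send after reconnecting (assumed format)."""
        return {"type": "resume", "from_seq": self.next_seq}

    def text(self) -> str:
        return "".join(self.chunks)


# Simulated stream; no network involved.
stream = [
    {"seq": 0, "type": "partial", "text": "Looking up "},
    {"seq": 1, "type": "partial", "text": "the record"},
    {"seq": 2, "type": "partial", "text": "... found."},
    {"seq": 3, "type": "final"},
]

buf = PartialResultBuffer()
for msg in stream[:2]:       # two messages arrive, then the connection drops
    buf.handle(msg)
resume = buf.resume_request()
for msg in stream[1:]:       # server replays with one overlapping message
    buf.handle(msg)

if __name__ == "__main__":
    print(resume)
    print(buf.text(), buf.done)
```

Sequence numbers let the client discard replayed messages and detect gaps, so a dropped connection does not corrupt or duplicate a partially delivered result.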
Developers can now use the Responses API with WebSocket support in beta. The feature is optional but recommended for workflows needing low latency.
Source: openai.com