Living with GPT-5.4: A Weekend Deep Dive
A future with AI that truly understands and reasons feels closer than ever after a weekend locked in conversation with GPT-5.4. But as impressive as the model is, I noticed OpenAI still grapples with some familiar challenges.
First off, model selection in ChatGPT remains a bit of a mess. Despite promises of a smart router that picks the best model for each task, I found myself toggling manually between versions like GPT-5.3 Instant and GPT-5.4. The Instant variant appeared just days before 5.4 and was touted as the best chat model, yet OpenAI offered no benchmark results for it. The two models' response styles differ noticeably, which hints at distinct underlying architectures, though that is only my take.
With GPT-5.4, the so-called "thinking" mode means a reply takes anywhere from several seconds to minutes. There are settings for standard and extended thinking, but in my experience the difference is subtle. Style-wise, GPT-5.4 still struggles to deliver clear, lively, natural Russian without slipping into awkward English borrowings, something competitors like Claude Sonnet/Opus and Gemini manage better.
The 5.3 Instant model tries to be playful, dropping emojis and friendly banter, but often falls into the old OpenAI trap of generating massive, list-heavy replies that can overwhelm. GPT-5.4 swings the other way, sometimes producing dense paragraphs heavy with bold highlights or breaking text into disjointed one-liners.
Interestingly, even in English conversations, where GPT-5.4 is praised for creativity, I spotted the same quirks. The cheerful chatbot vibe of the GPT-4o days seems to have faded, which might disappoint casual users who enjoyed that friendly tone.
Where GPT-5.4 truly shines is in web search and critical analysis. It dives deep into internet sources, verifying facts meticulously, though this comes at the cost of slower responses—sometimes up to a couple of minutes for simple prompts. Its no-nonsense critique is invaluable; I often cross-check my main model, Opus 4.6, with GPT-5.4 for a second opinion.
For coding, I continue to build projects in Claude Code but rely on GPT-5.4 (previously 5.3) via Codex for testing and code review. The recent addition of limited computer control via the Playwright skill lets GPT-5.4 interact with web and Electron app interfaces: clicking, checking functionality, and even reviewing site design in real time. This feature, although available only through the API and Codex, is a game-changer for development workflows.
Admittedly, I mistakenly thought computer control would be broadly available in ChatGPT, but it's more restricted. Still, watching GPT-5.4 "browse" and test a site visually in headed mode is a glimpse into the future of AI-assisted coding and QA.
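For readers who have not used Playwright, here is a rough idea of what a headed browser check looks like when written by hand. This is only a sketch using Playwright's Python API, not what Codex or the Playwright skill actually runs; the names `CheckResult`, `summarize`, and `run_smoke_check`, and the default screenshot path, are my own placeholders.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def summarize(results):
    """Return a one-line, human-readable summary of a list of CheckResult."""
    failed = [r.name for r in results if not r.passed]
    if not failed:
        return f"{len(results)}/{len(results)} checks passed"
    passed = len(results) - len(failed)
    return f"{passed}/{len(results)} checks passed; failed: {', '.join(failed)}"


def run_smoke_check(url, headed=True, screenshot_path="smoke.png"):
    """Open `url` in Chromium, run two basic checks, and save a screenshot."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    results = []
    with sync_playwright() as p:
        # headless=False opens a visible ("headed") browser window.
        browser = p.chromium.launch(headless=not headed)
        page = browser.new_page()

        response = page.goto(url)
        results.append(
            CheckResult("page loads", response is not None and response.ok)
        )

        title = page.title()
        results.append(CheckResult("page has a title", bool(title), title))

        # A full-page screenshot is what a human (or a model) would review.
        page.screenshot(path=screenshot_path, full_page=True)
        browser.close()
    return results
```

With Playwright installed (`pip install playwright`, then `playwright install chromium`), calling `summarize(run_smoke_check(url))` on a local dev server address of your choice opens a visible browser window and returns a one-line report. The model-driven version adds the interesting part on top: deciding what to click and judging the result.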
In summary, despite some stylistic hiccups and a steeper learning curve compared to Sonnet, Opus, or Gemini, GPT-5.4 has earned my respect—especially for coding support and critical thinking. OpenAI’s got some polishing to do on making the chat experience more approachable, but the potential here is undeniable.