AI Coding Agents Build Minesweeper Clones with Mixed Results: OpenAI's Codex Outperforms, Google's Gemini CLI Falls Short

Key Takeaways

  • OpenAI’s Codex excelled in a test to create a web-based Minesweeper clone, scoring 9/10 for its comprehensive features.
  • Anthropic’s Claude Code followed with a score of 7/10, noted for its aesthetics but lacking a key gameplay feature.
  • Google’s Gemini CLI failed entirely, scoring 0/10, unable to produce a functional game.

AI Coding Agents Tested

A recent test by Ars Technica evaluated four popular AI coding agents tasked with building a web version of Minesweeper. The challenge called for sound effects, mobile touchscreen support, and an original gameplay twist of each agent's own devising. The contenders were OpenAI's Codex, Anthropic's Claude Code, Mistral Vibe, and Google's Gemini CLI, and each was judged solely on the initial code it produced, with no human intervention.

OpenAI's Codex stood out, receiving a score of 9/10. It successfully implemented "chording," a technique familiar to seasoned Minesweeper players: clicking a revealed number whose neighboring flags match its count uncovers all remaining adjacent tiles at once. The agent's coding interface was smooth to work with, and responsive sound effects plus on-screen instructions enhanced the finished game. Codex's distinctive "Lucky Sweep" twist added randomness by revealing a safe tile under certain conditions, making it the most complete entry.
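For readers unfamiliar with the mechanic, here is a minimal sketch of how chording might be implemented in a web Minesweeper. This is not Codex's actual output; the `Cell` model and the function names are illustrative assumptions only.

```typescript
// Illustrative cell model for a Minesweeper grid (assumed, not from the test).
interface Cell {
  mine: boolean;          // true if this tile hides a mine
  revealed: boolean;      // true once the player has uncovered it
  flagged: boolean;       // true if the player has planted a flag
  adjacentMines: number;  // precomputed count of neighboring mines
}

type Grid = Cell[][];

// Enumerate the up-to-eight in-bounds neighbors of (row, col).
function neighbors(grid: Grid, row: number, col: number): [number, number][] {
  const out: [number, number][] = [];
  for (let dr = -1; dr <= 1; dr++) {
    for (let dc = -1; dc <= 1; dc++) {
      if (dr === 0 && dc === 0) continue;
      const r = row + dr, c = col + dc;
      if (r >= 0 && r < grid.length && c >= 0 && c < grid[r].length) {
        out.push([r, c]);
      }
    }
  }
  return out;
}

// Chord on a revealed numbered tile: if the player has flagged exactly
// as many neighbors as the tile's number, reveal every remaining
// unflagged neighbor in a single action.
function chord(grid: Grid, row: number, col: number): void {
  const cell = grid[row][col];
  if (!cell.revealed || cell.adjacentMines === 0) return;

  const adj = neighbors(grid, row, col);
  const flagCount = adj.filter(([r, c]) => grid[r][c].flagged).length;
  if (flagCount !== cell.adjacentMines) return; // flags don't match: no-op

  for (const [r, c] of adj) {
    const n = grid[r][c];
    if (!n.flagged && !n.revealed) {
      n.revealed = true;
      if (n.mine) {
        // A mis-placed flag means the chord hits a mine; a real game
        // would trigger its game-over handling here.
        console.log(`Chord hit a mine at (${r}, ${c})`);
      }
    }
  }
}
```

The design point is that chording is purely a convenience: it reveals only what the flag count already implies, so correct flags never make it unsafe, while a wrong flag detonates exactly as a direct click would.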

In second place, Anthropic's Claude Code earned 7/10 for the most visually appealing version. It completed its task in under five minutes and delivered a clear interface with pleasant sound effects. However, the absence of chording detracted from the experience, despite a "Power Mode" twist that introduced gameplay power-ups.

Mistral Vibe landed in third with a 4/10 score. Although it produced a playable game, it lacked sound effects and the crucial chording feature. Visual glitches and an unresponsive “Custom” button further diminished its rating, although the coding interface was still user-friendly.

Google's Gemini CLI ranked last with a score of 0/10: its attempted clone failed to function at all and was missing essential game elements. Despite a visual resemblance to Claude's version, Gemini took excessively long to produce output and repeatedly requested external dependencies. Limited access to Google's latest coding models may have contributed to the failure, leaving this portion of the test inconclusive.

While Codex emerged as the clear leader, the mixed results from Claude Code and Mistral Vibe show how widely the coding capabilities of AI agents still vary. Gemini's performance, meanwhile, raises questions about its current utility, particularly given Google's prominence in AI development. By showcasing these tools' limitations as much as their potential, the experiment is unlikely to win AI coding agents many new converts.

The content above is a summary. For more details, see the source article.
