Open-Source Agent Tops TerminalBench Rankings on Gemini-3 Flash Preview
Source: HN
OSS Agent tops TerminalBench on Gemini-3-flash-preview. It achieved this without cheating mechanisms.
An independently built open-source AI agent has taken the top spot on TerminalBench among entries running Gemini-3-flash-preview. The agent, whose code is available on GitHub, scored 65.2% on TerminalBench 2.0, surpassing Google's Gemini and Junie CLI. The result was reportedly achieved without cheating mechanisms and in compliance with the leaderboard rules.
The result matters because it shows that open-source agents can compete with proprietary ones. An open-source agent outperforming Google's Gemini entry suggests that the open-source community can drive progress in agent design. As we reported on April 27, the development of autonomous agents like MolClaw and the use of agentic science require robust testing and evaluation, which benchmarks such as TerminalBench provide.
As the AI landscape continues to evolve, it will be interesting to watch how Google and other industry leaders respond to this achievement. Will they open up their models further, or will they focus on developing more proprietary technologies? The open-source community will likely continue to push the boundaries of what is possible with AI agents, and TerminalBench will remain an important benchmark for evaluating their performance.