OpenAI has partnered with crypto investment firm Paradigm to release EVMbench, an open benchmarking framework that evaluates how well AI agents can secure Ethereum smart contracts. The benchmark tests agents in three modes: detecting vulnerabilities, patching them, and exploiting them. It draws on a dataset of 120 curated high-severity vulnerabilities sourced from 40 real-world audits, including Code4rena contests and the security audit of Stripe's Tempo blockchain.
Paradigm said that when the project began, top AI models could exploit fewer than 20% of critical bugs. That figure now exceeds 70%, a sign of how quickly AI models are improving at understanding and interacting with smart contract code. Alongside the benchmark's release, OpenAI expanded the private beta of its dedicated security research agent, Aardvark, and committed $10 million in API credits through its Cybersecurity Grant Program to support defensive crypto research.
The initiative addresses a critical pain point in the industry: smart contract exploits have drained over $5 billion from DeFi protocols in the last two years alone. OpenAI stated that "measuring AI performance in economically relevant environments is critical as models become powerful tools for both attackers and defenders." Paradigm echoed this, noting, "It’s now clear to us that a growing portion of audits in the future will be done by agents. Hopefully this benchmark, harness, and agent serve both as a preview and an accelerant towards that future."
The release marks a notable step in the convergence of AI and cryptocurrency, with a major AI lab formally committing resources to Ethereum security. The benchmark is also tied to production infrastructure: part of its dataset comes from the security audit of Stripe's Tempo, a payments-focused Layer-1 blockchain built with input from Visa, Shopify, and OpenAI itself.