Methodology

Everything about how the Blender AI Arena works is public: the voting protocol, the rating math, the benchmark prompts, and who runs the site. If something here seems unfair, tell us.

Principles

  • No fake data. Rankings come only from real recorded votes. We never seed, simulate, or editorially adjust vote counts. During calibration, every tool shows the same default rating and zero votes.
  • Equal treatment. Every tool gets the same profile format, the same outbound link style, and the same benchmark prompts under the same rules.
  • Bad outputs stay visible. The gallery is the raw benchmark record. Failures are published alongside successes for every tool.
  • Full disclosure. This site is built by the team behind 3D-Agent, one of the tools on the roster. See Ownership & Disclosure below for how we keep that from biasing results.

Benchmark Protocol

Each season, every tool on the roster runs the complete benchmark prompt set under identical rules:

  • The prompt text is used verbatim — no tool-specific tuning.
  • Default settings for each tool; no expert parameter tweaking.
  • One generation per prompt, with a single retry allowed only on a technical failure (crash, timeout, empty output).
  • No manual cleanup, retopology, or material edits afterward.
  • Outputs are exported to a common format (GLB) and rendered with identical lighting for comparison.
  • Code-driven assistants generate geometry programmatically from the prompt (via Blender MCP/bpy, or headless three.js runs — the exact path is noted on every output); generators produce meshes from the same prompt text. The arena compares the final output, whatever the path to it.
  • Models within a family compete as separate entries: Claude Haiku 4.5, Sonnet 5, Opus 4.8, and Fable 5 each generated their Season 1 entries independently, in separate sessions, with no shared code and no access to each other's outputs.
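
The one-generation, one-retry rule can be sketched as below. This is a minimal illustration, not the arena's harness: `run_prompt`, `RunRecord`, and `TechnicalFailure` are hypothetical names, and we assume the tool runner raises `TechnicalFailure` on a crash or timeout. An empty output counts as a technical failure, while a poor-quality output is kept as-is and never retried.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class TechnicalFailure(Exception):
    """Hypothetical: raised by a tool runner on a crash or timeout."""


@dataclass
class RunRecord:
    prompt: str
    output: Optional[bytes]  # exported GLB bytes, or None if both attempts failed
    attempts: int
    failed: bool


def run_prompt(prompt: str, generate: Callable[[str], Optional[bytes]]) -> RunRecord:
    """Run one benchmark prompt verbatim, allowing a single retry on technical failure only."""
    for attempt in (1, 2):
        try:
            output = generate(prompt)  # prompt text passed unchanged, default tool settings
        except TechnicalFailure:
            output = None
        if output:  # any non-empty output is kept, good or bad; no quality-based retry
            return RunRecord(prompt, output, attempt, failed=False)
    # Two technical failures in a row: the failure itself is recorded and published.
    return RunRecord(prompt, None, 2, failed=True)
```

Because the retry check looks only at whether an output exists, a tool cannot use the retry to reroll an ugly but valid result.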

Voting Protocol

  • Voters see two outputs generated from the same prompt, side by side, in identical viewers.
  • Tool names are hidden until after the vote. Left/right placement is randomized.
  • Four ballot options: A is better, B is better, Tie, or Both Bad.
  • One vote per matchup per session; duplicate and bot-pattern votes are discarded.
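
A minimal sketch of the ballot rules above, assuming output A is the left viewer and that a matchup has a stable ID. `make_ballot`, `VoteLog`, and the choice labels are hypothetical names; bot-pattern filtering is not modeled here.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical labels for the four ballot options.
CHOICES = {"A", "B", "TIE", "BOTH_BAD"}


@dataclass
class Ballot:
    matchup_id: str
    left_tool: str   # shown to the voter as output A, name hidden
    right_tool: str  # shown to the voter as output B, name hidden


def make_ballot(matchup_id: str, tool_x: str, tool_y: str, rng: random.Random) -> Ballot:
    """Randomize which tool's output appears on the left."""
    if rng.random() < 0.5:
        return Ballot(matchup_id, tool_x, tool_y)
    return Ballot(matchup_id, tool_y, tool_x)


@dataclass
class VoteLog:
    seen: Set[Tuple[str, str]] = field(default_factory=set)
    accepted: List[Tuple[Ballot, str]] = field(default_factory=list)

    def cast(self, session_id: str, ballot: Ballot, choice: str) -> Optional[Dict[str, str]]:
        """Accept at most one vote per (session, matchup); reveal tool names only afterwards."""
        if choice not in CHOICES:
            raise ValueError(f"unknown choice: {choice!r}")
        key = (session_id, ballot.matchup_id)
        if key in self.seen:
            return None  # duplicate vote for this matchup in this session is discarded
        self.seen.add(key)
        self.accepted.append((ballot, choice))
        return {"A": ballot.left_tool, "B": ballot.right_tool}
```

Names are returned only from `cast`, so the reveal can happen strictly after a vote is accepted.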

Rating System: Glicko-2

Votes feed a standard Glicko-2 implementation. Glicko-2 is Mark Glickman's rating system, used by online chess platforms and competitive games. It tracks three values per tool: a rating, a rating deviation (RD, the uncertainty), and volatility.

  • Initial state: rating 1500, RD 350, volatility 0.06.
  • RD shrinks toward 30 as votes accumulate — more votes, more certainty.
  • Leaderboard position uses the conservative rating: rating minus 2×RD. A tool can't rank high on a lucky handful of votes.
  • “Both Bad” ballots don't move relative ratings but are tracked and published per tool.
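
For readers who want to check the math, here is a minimal sketch of one Glicko-2 rating period following the steps in Glickman's published description, including the Illinois-method volatility search. Several details are our assumptions, not confirmed by this page: the system constant τ = 0.5, a Tie scoring 0.5 for both tools, Both Bad ballots being filtered out before the update, and the RD floor of 30 being applied as a clamp. `Rating`, `update`, and `conservative` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

SCALE = 173.7178  # Glicko-2 scale factor between the rating and internal scales
TAU = 0.5         # assumed system constant (Glickman suggests 0.3 to 1.2)
RD_FLOOR = 30.0   # assumed: "RD shrinks toward 30" implemented as a clamp
EPS = 1e-6        # convergence tolerance for the volatility search


@dataclass
class Rating:
    rating: float = 1500.0  # initial state from the methodology
    rd: float = 350.0
    vol: float = 0.06


def _g(phi: float) -> float:
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def _expected(mu: float, mu_j: float, phi_j: float) -> float:
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def update(player: Rating, results: List[Tuple[Rating, float]]) -> Rating:
    """One rating period. results: (opponent, score) with 1 win, 0.5 tie, 0 loss."""
    mu, phi, sigma = (player.rating - 1500.0) / SCALE, player.rd / SCALE, player.vol
    if not results:
        # No decisive games this period: only the uncertainty grows.
        return Rating(player.rating, max(RD_FLOOR, math.sqrt(phi**2 + sigma**2) * SCALE), sigma)

    v_inv, delta_sum = 0.0, 0.0
    for opp, score in results:
        mu_j, phi_j = (opp.rating - 1500.0) / SCALE, opp.rd / SCALE
        g, e = _g(phi_j), _expected(mu, mu_j, phi_j)
        v_inv += g * g * e * (1.0 - e)
        delta_sum += g * (score - e)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # New volatility: root of f(x) found with the Illinois algorithm.
    a = math.log(sigma * sigma)

    def f(x: float) -> float:
        ex = math.exp(x)
        return ex * (delta**2 - phi**2 - v - ex) / (2.0 * (phi**2 + v + ex) ** 2) - (x - a) / TAU**2

    A = a
    if delta**2 > phi**2 + v:
        B = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * TAU) < 0:
            k += 1
        B = a - k * TAU
    fA, fB = f(A), f(B)
    while abs(B - A) > EPS:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    sigma_new = math.exp(A / 2.0)

    phi_star = math.sqrt(phi**2 + sigma_new**2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star**2 + 1.0 / v)
    mu_new = mu + phi_new**2 * delta_sum
    return Rating(mu_new * SCALE + 1500.0, max(RD_FLOOR, phi_new * SCALE), sigma_new)


def conservative(r: Rating) -> float:
    """Leaderboard key: rating minus 2 x RD."""
    return r.rating - 2.0 * r.rd
```

As a sanity check, Glickman's worked example (a 1500-rated player with RD 200 who beats a 1400/30 opponent and loses to 1550/100 and 1700/300) should end near a rating of 1464 and an RD of about 151.5. A fresh tool at 1500 with RD 350 has a conservative rating of only 800, which is why a handful of lucky wins cannot push it up the leaderboard.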

Ranked positions appear in tiers: Provisional after 80+ decisive votes with RD ≤ 90, and Stable after 200+ decisive votes with RD ≤ 60 and near-complete prompt coverage. Until a tool reaches Provisional, it is listed unranked.
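
The tier rules and the conservative-rating ordering can be sketched together. This assumes the decisive-vote count is computed elsewhere (the page does not define exactly which ballots count as decisive) and uses a hypothetical 95% threshold for "near-complete prompt coverage", since the page gives no number. `ToolStats`, `tier`, and `leaderboard` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple

STABLE_COVERAGE = 0.95  # hypothetical stand-in for "near-complete prompt coverage"


@dataclass
class ToolStats:
    name: str
    rating: float
    rd: float
    decisive_votes: int
    prompt_coverage: float  # fraction of benchmark prompts with voted outputs


def tier(t: ToolStats) -> str:
    if t.decisive_votes >= 200 and t.rd <= 60 and t.prompt_coverage >= STABLE_COVERAGE:
        return "Stable"
    if t.decisive_votes >= 80 and t.rd <= 90:
        return "Provisional"
    return "Unranked"


def leaderboard(tools: List[ToolStats]) -> Tuple[List[Tuple[str, str, float]], List[str]]:
    """Ranked rows sorted by rating - 2*RD (highest first), plus the unranked names."""
    ranked = [t for t in tools if tier(t) != "Unranked"]
    ranked.sort(key=lambda t: t.rating - 2.0 * t.rd, reverse=True)
    rows = [(t.name, tier(t), t.rating - 2.0 * t.rd) for t in ranked]
    unranked = sorted(t.name for t in tools if tier(t) == "Unranked")
    return rows, unranked
```

Under these rules a tool with the highest raw rating but few votes stays unranked, and among ranked tools a tighter RD can outweigh a higher raw rating.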

Ownership & Disclosure

Blender AI Arena is built and funded by the team behind 3D-Agent, which is itself a tool on the roster. We believe a public benchmark is only useful if it's trustworthy, so:

  • 3D-Agent follows the exact same benchmark protocol, voting rules, and rating math as every other tool.
  • Its outputs are anonymized in the arena like all others; voters can't tell which tool they're scoring.
  • If 3D-Agent ranks poorly, that result is published unmodified — same as for any other tool.
  • This disclosure appears here, on the About page, and in the site footer.

How Tools Get Added

Any AI tool that can produce 3D output usable in Blender is eligible: mesh generators, assistants, agents, and addons. To request inclusion (or removal, or a correction to a profile), email hello@blenderai.org — inclusion is free and cannot be paid for.

Known Limitations

  • Blind voting measures perceived output quality — not workflow speed, pricing, topology quality, or editability. Read the tool profiles for those dimensions.
  • Assistants build geometry procedurally while generators produce sculpted meshes; some prompts inherently favor one approach. The prompt set is designed to balance this, and per-category results are published.
  • Tools update frequently. Each season pins tool versions where possible and re-benchmarks from scratch.

Want AI That Actually Creates in Blender?

3D-Agent is the AI agent that connects directly to Blender via MCP, generating, modifying, and rendering 3D models from natural-language instructions.
