Poetiq surpasses Google's Gemini 3 Deep Think model

Source: cedartree, 2025-12-08 12:05:14

In reply to: Poetiq tops ARC-AGI-2 with Gemini, by cedartree, 2025-12-08 08:51:30

Key Details of Poetiq's Achievement

Poetiq's system scored 54% on the official ARC-AGI-2 semi-private evaluation, making it the first system to break the 50% barrier.
The previous top score, 45%, was held by Google's Gemini 3 Deep Think model.
Poetiq's cost was approximately $30.57 per problem, versus $77.16 per problem for Gemini 3 Deep Think, or roughly 40% of the cost.
Poetiq used a "meta-system", an open-source orchestration layer (or "scaffold") that coordinates frontier models, including Gemini 3 Pro and GPT-5.1. The system solves problems iteratively: it tests ideas, collects feedback, and refines the solution.
The result shows that breakthroughs in AI reasoning can come from system design and orchestration of existing models, not only from expensive model training. The ARC-AGI-2 benchmark measures pattern recognition and fluid intelligence.
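
For readers wondering what a "test ideas, get feedback, refine" loop might look like, here is a minimal Python sketch. It is not Poetiq's code: every name in it (failures, refine_loop, make_toy_proposer) is hypothetical, and the "proposer" is a toy that cycles through three fixed grid transformations, standing in for what would be calls to a frontier model in a real scaffold.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]          # (input grid, expected output grid)
Candidate = Callable[[Grid], Grid]   # a proposed transformation


def failures(candidate: Candidate, examples: List[Example]) -> List[str]:
    """Run a candidate on every training example and collect feedback notes."""
    notes = []
    for i, (grid_in, expected) in enumerate(examples):
        try:
            got = candidate(grid_in)
        except Exception as exc:  # a crashing candidate is also feedback
            notes.append(f"example {i}: raised {exc!r}")
            continue
        if got != expected:
            notes.append(f"example {i}: expected {expected}, got {got}")
    return notes


def refine_loop(
    propose: Callable[[List[str]], Candidate],
    examples: List[Example],
    max_rounds: int = 5,
) -> Optional[Candidate]:
    """Propose, test, and feed failures back until a candidate passes."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)              # new idea, given last feedback
        feedback = failures(candidate, examples)   # test it on the examples
        if not feedback:
            return candidate                       # all examples pass
    return None                                    # budget exhausted


def make_toy_proposer() -> Callable[[List[str]], Candidate]:
    """Hypothetical stand-in for a model: tries fixed ideas in order."""
    options: List[Candidate] = [
        lambda g: g,                                # identity
        lambda g: [row[::-1] for row in g],         # mirror left-right
        lambda g: [list(col) for col in zip(*g)],   # transpose
    ]
    state = {"next": 0}

    def propose(feedback: List[str]) -> Candidate:
        # A real proposer would read the feedback; this toy only advances.
        candidate = options[state["next"] % len(options)]
        state["next"] += 1
        return candidate

    return propose


# One training pair whose hidden rule is "mirror each row".
examples = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
solution = refine_loop(make_toy_proposer(), examples)
result = solution([[5, 6, 7]]) if solution else None
print(result)
```

In this toy the first idea (identity) fails, the failure is recorded as feedback, and the second idea (mirror) passes, so the loop stops after two rounds. The max_rounds cap is what bounds the per-problem cost in a loop of this shape.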
