Poetiq surpasses Google's Gemini 3 Deep Think model

Reply to: Poetiq tops ARC-AGI-2 with Gemini
cedartree 2025-12-08 08:51:30
Key Details of Poetiq's Achievement
  • Poetiq's system scored 54% on the official ARC-AGI-2 semi-private evaluation. It is the first system to break the 50% barrier.
  • Google's Gemini 3 Deep Think model previously held the top score with 45%.
  • Poetiq's cost was approximately $30.57 per problem, versus $77.16 per problem for Gemini 3 Deep Think.
  • Poetiq used a "meta-system", or orchestration layer. This open-source "scaffold" coordinates frontier models, including Gemini 3 Pro and GPT-5.1. The system solves problems iteratively: it tests ideas, collects feedback, and refines the solution.
  • This result suggests that AI reasoning breakthroughs can be achieved through system design and orchestration of existing models, not only through expensive model training. The ARC-AGI-2 benchmark measures pattern recognition and fluid intelligence.
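
To make the "test ideas, get feedback, refine" loop described above a bit more concrete, here is a minimal Python sketch. It is not Poetiq's actual code. The names `score`, `refine_loop`, and `make_toy_proposer` are hypothetical, and the proposer is a stand-in that cycles through three fixed grid transforms where a real scaffold would presumably call a frontier model with the feedback text.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]          # (input grid, expected output grid)
Transform = Callable[[Grid], Grid]


def score(candidate: Transform, examples: List[Example]) -> float:
    """Return the fraction of examples the candidate reproduces exactly."""
    hits = 0
    for grid_in, grid_out in examples:
        try:
            if candidate(grid_in) == grid_out:
                hits += 1
        except Exception:
            # A candidate that crashes simply scores no point on this example.
            pass
    return hits / len(examples)


def refine_loop(
    propose: Callable[[str], Transform],
    examples: List[Example],
    max_rounds: int = 5,
) -> Tuple[Optional[Transform], float]:
    """Propose a candidate, test it, and feed the result back, keeping the best."""
    feedback = ""
    best: Optional[Transform] = None
    best_score = -1.0
    for _ in range(max_rounds):
        candidate = propose(feedback)
        current = score(candidate, examples)
        if current > best_score:
            best, best_score = candidate, current
        if current == 1.0:
            break  # every example is solved, so stop early
        feedback = f"previous attempt solved {current:.0%} of the examples"
    return best, best_score


def make_toy_proposer() -> Callable[[str], Transform]:
    """Hypothetical stand-in for a model call: cycles through fixed transforms."""
    candidates: List[Transform] = [
        lambda g: g,                                # identity
        lambda g: [row[::-1] for row in g],         # mirror each row
        lambda g: [list(col) for col in zip(*g)],   # transpose
    ]
    state = {"next": 0}

    def propose(feedback: str) -> Transform:
        # The toy proposer ignores the feedback; a real one would use it.
        candidate = candidates[state["next"] % len(candidates)]
        state["next"] += 1
        return candidate

    return propose


# Toy task: both examples are consistent with "transpose the grid".
EXAMPLES: List[Example] = [
    ([[1, 2], [3, 4]], [[1, 3], [2, 4]]),
    ([[0, 1], [0, 0]], [[0, 0], [1, 0]]),
]

best_transform, best_score = refine_loop(make_toy_proposer(), EXAMPLES)
```

The point of the sketch is only that the loop sits outside the models: the candidate generator can be swapped out without touching the test-and-feedback logic, which is the sense in which the gain comes from orchestration rather than training.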