Key Details of Poetiq's Achievement
- Poetiq's system scored 54% on the official ARC-AGI-2 semi-private evaluation. It is the first system to break the 50% barrier.
- Google's Gemini 3 Deep Think model previously held the top score with 45%.
- Poetiq's cost was approximately $30.57 per problem, compared with $77.16 per problem for Gemini 3 Deep Think (roughly 60% lower).
- Poetiq used a "meta-system" or orchestration layer. This open-source "scaffold" coordinates frontier models, including Gemini 3 Pro and GPT-5.1. The system solves problems iteratively: it tests ideas, collects feedback, and refines the solution.
- This result shows that AI reasoning breakthroughs can be achieved through system design and orchestration of existing models, not only through expensive model training. The ARC-AGI-2 benchmark measures pattern recognition and fluid intelligence.
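
The test, feedback, and refine loop described above can be illustrated with a small sketch. This is not Poetiq's implementation. Every name here (`CANDIDATES`, `toy_proposer`, `solve`) is hypothetical, and the "model" is a stand-in that picks the next untried transformation from a fixed list instead of calling Gemini 3 Pro or GPT-5.1. It only shows the general shape of an orchestration loop: propose a candidate, check it against the known examples, and feed the failures back into the next proposal.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)

# Hypothetical pool of candidate programs. A real system would ask a
# model to write candidates; this fixed pool is an assumption for the demo.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: g[::-1],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def toy_proposer(failed: List[str]) -> Optional[str]:
    """Stand-in for a model call: return the next candidate not yet rejected."""
    for name in CANDIDATES:
        if name not in failed:
            return name
    return None


def solve(
    examples: List[Example],
    proposer: Callable[[List[str]], Optional[str]],
    max_rounds: int = 5,
) -> Optional[str]:
    """Propose, test, and refine until a candidate fits every example."""
    failed: List[str] = []  # feedback passed to the proposer each round
    for _ in range(max_rounds):
        name = proposer(failed)
        if name is None:
            break  # the proposer has nothing left to try
        program = CANDIDATES[name]
        if all(program(x) == y for x, y in examples):
            return name  # candidate reproduces every known example
        failed.append(name)  # record the failure as feedback
    return None


if __name__ == "__main__":
    demo = [([[1, 2], [3, 4]], [[3, 4], [1, 2]])]
    print(solve(demo, toy_proposer))
```

In this sketch the feedback is just the list of rejected candidates. A richer loop could pass back which examples failed and how the output differed, which is the kind of signal a model-driven proposer would need to refine its next attempt.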