Comparison of OpenAI o3 and DeepSeek R1 scores on math benchmarks:

This post was edited on 2025-02-01 22:47:23 by regular user 大观园的贾探春

ChatGPT's answer:

Here's the comparison of math-related benchmark performance between OpenAI o3 and DeepSeek R1, excluding any benchmarks where one of the models’ numbers is unknown:

  • AIME 2024:
    • OpenAI o3: 96.7% accuracy
    • DeepSeek R1: 79.8% accuracy
  • Graduate-Level Google-Proof Q&A (GPQA) Benchmark:
    • OpenAI o3: 87.7% accuracy
    • DeepSeek R1: 71.5% accuracy

The MATH-500 benchmark data is only available for DeepSeek R1, so no comparison is made for that benchmark.
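
To make the size of the gap explicit, here is a minimal Python sketch that computes the difference in percentage points for the two benchmarks listed above. The scores are copied from this post; the dictionary layout and the helper name `gap_in_points` are illustrative assumptions, not part of any official tooling.

```python
# Scores (percent accuracy) copied from the comparison in this post.
scores = {
    "AIME 2024": {"OpenAI o3": 96.7, "DeepSeek R1": 79.8},
    "GPQA": {"OpenAI o3": 87.7, "DeepSeek R1": 71.5},
}


def gap_in_points(benchmark_scores):
    """Return o3 minus R1 in percentage points, rounded to one decimal."""
    diff = benchmark_scores["OpenAI o3"] - benchmark_scores["DeepSeek R1"]
    return round(diff, 1)


# Hypothetical helper output: one gap per benchmark.
gaps = {name: gap_in_points(s) for name, s in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: o3 leads by {gap} percentage points")
```

On these figures the lead is roughly 17 points on AIME 2024 and roughly 16 points on GPQA.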
