Nvidia GPUs are the clear leader, from raw performance to the development ecosystem. But how large is the performance gap, actually? A quick web search turned up this:
> Under the right circumstances, we found that Gaudi 2 had the highest LLM training performance vs. the same-generation NVIDIA A100 and AMD MI250 GPUs, with an average speedup of 1.22x vs. the A100-80GB, 1.34x vs. the A100-40GB, and 1.59x vs. the MI250.
>
> On each platform, we ran the same training scripts from LLM Foundry using MPT models with a sequence length of 2048, BF16 mixed precision, and the ZeRO Stage-3 distributed training algorithm. On NVIDIA or AMD systems, this algorithm is implemented via PyTorch FSDP with sharding_strategy: FULL_SHARD. On Intel systems, this is currently done via DeepSpeed ZeRO with Stage: 3, but FSDP support is expected to be added in the near future.
>
> On each system, we also used the most optimized implementation of scaled-dot-product-attention (SDPA) available:
>
> - NVIDIA: Triton FlashAttention-2
> - AMD: ROCm ComposableKernel FlashAttention-2
> - Intel: Gaudi TPC FusedSDPA
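For reference, the FULL_SHARD setup described in the excerpt looks roughly like the following in plain PyTorch. This is a minimal sketch, not LLM Foundry's actual code; the tiny model and the BF16 mixed-precision policy are my own illustrative choices.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

# Assumes launch via `torchrun`, which sets the env vars init_process_group reads.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Stand-in model; the benchmark trained MPT models from LLM Foundry.
model = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).cuda()

# FULL_SHARD shards parameters, gradients, and optimizer state across all
# ranks -- the FSDP counterpart of DeepSpeed ZeRO Stage 3.
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,   # BF16 mixed precision, as in the benchmark
        reduce_dtype=torch.bfloat16,
    ),
)

# On Gaudi, the DeepSpeed equivalent is a config along the lines of:
# {"bf16": {"enabled": true}, "zero_optimization": {"stage": 3}}
```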
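On the attention side, the excerpt lists vendor-tuned kernels, but the common entry point in model code is PyTorch's built-in SDPA, which dispatches to a fused flash-attention kernel where one is available. A minimal sketch (the `torch.nn.attention` module requires a recent PyTorch, roughly 2.3+; shapes are arbitrary except the 2048 sequence length, and this built-in kernel is not the same as the vendor-specific ones above):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# batch=2, heads=8, seq_len=2048 (as in the benchmark), head_dim=64
q = torch.randn(2, 8, 2048, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict dispatch to the fused flash-attention backend rather than the
# slower math fallback.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 8, 2048, 64])
```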
On the tooling side, PyTorch seems to be the most popular framework: the official release supports Nvidia and AMD, while Intel maintains a modified build that supports Gaudi. During the early LLM arms race, big companies understandably prioritized the best-performing hardware. But once foundation models mature and the work shifts toward fine-tuning and domain adaptation, how necessary is it really to fight over the very best GPUs? At our AI R&D company of one to two thousand people, the most commonly used GPU is the A5000. So I feel Nvidia still faces some long-term risks, and I hope Jensen Huang can lead Nvidia into even more successful AI application areas. Nvidia's rise from its early days shows his long-range vision.
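As a footnote to the PyTorch point above, user code can stay fairly backend-agnostic. A minimal sketch, assuming Intel's Gaudi bridge package (`habana_frameworks.torch`) is installed on Gaudi machines; ROCm builds of PyTorch reuse the `cuda` device namespace, so Nvidia and AMD share one code path:

```python
import torch

if torch.cuda.is_available():
    # Covers both NVIDIA (CUDA) and AMD (ROCm/HIP) builds of PyTorch.
    device = torch.device("cuda")
else:
    try:
        # Intel's Gaudi bridge registers an "hpu" device when imported.
        import habana_frameworks.torch.core  # noqa: F401
        device = torch.device("hpu")
    except ImportError:
        device = torch.device("cpu")

x = torch.ones(4, device=device)
print(x.device)
```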