https://www.tomshardware.com/tech-industry/artificial-intelligence/deepseek-research-suggests-huaweis-ascend-910c-delivers-60-percent-nvidia-h100-inference-performance
Ascend 910C vs NVIDIA H100 vs AMD MI300X

| Specification | Huawei Ascend 910C | NVIDIA H100 (SXM5) | AMD MI300X |
| --- | --- | --- | --- |
| FP16 Performance | 800 TFLOPS | 989 TFLOPS (1,979 with sparsity) | 1,307.4 TFLOPS (2,614.8 with sparsity) |
| INT8 Performance | ~1,600 TOPS | ~1,979 TOPS (3,958 with sparsity) | 2,614.9 TOPS (5,229.8 with sparsity) |
| Memory | 128 GB HBM3 | 80 GB HBM3 | 192 GB HBM3e |
| Memory Bandwidth | 3.2 TB/s | 3.35 TB/s | 5.3 TB/s |
| Power Consumption (TDP) | ~310 W (potentially higher) | Up to 700 W | 750 W |
| Software Ecosystem | CANN, MindSpore, PyTorch, TensorFlow | CUDA, cuDNN, TensorRT | ROCm, HIP |
Note: NVIDIA and AMD often quote peak performance with sparsity enabled; dense figures are listed first for a more direct comparison where possible.
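One way to read the table above is as a compute-to-bandwidth ratio: dividing dense FP16 throughput by memory bandwidth gives the arithmetic intensity (FLOP per byte) a workload needs before the chip becomes compute-bound rather than memory-bound. A minimal sketch using the table's own figures:

```python
# Rough compute-to-bandwidth ratio derived from the dense FP16 TFLOPS
# and memory bandwidth (TB/s) figures in the table above. A higher
# ratio means a workload needs more arithmetic per byte moved before
# the chip's compute units, rather than memory, become the bottleneck.
specs = {
    # name: (dense FP16 TFLOPS, memory bandwidth in TB/s)
    "Ascend 910C": (800.0, 3.2),
    "H100 SXM5": (989.0, 3.35),
    "MI300X": (1307.4, 5.3),
}

for name, (tflops, tb_per_s) in specs.items():
    # TFLOPS divided by TB/s reduces to FLOP per byte.
    ratio = tflops / tb_per_s
    print(f"{name}: ~{ratio:.0f} FLOP/byte")
```

By this rough measure the three chips sit in a similar band (roughly 250 to 300 FLOP/byte), which is why bandwidth-bound inference workloads tend to narrow the headline-TFLOPS gap between them.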
Ascend 920 vs NVIDIA H100

| Feature | Huawei Ascend 920 (Claimed/Projected) | NVIDIA H100 (SXM/PCIe) |
| --- | --- | --- |
| Architecture | Huawei Da Vinci architecture (chiplet-based) | NVIDIA Hopper |
| Process Node | SMIC 6nm (projected) | TSMC 4nm (custom) |
| FP16/BF16 Compute | 900 TFLOPS (BF16, per card) | 1,513 TFLOPS (BF16, PCIe, with sparsity) |
| FP8 Compute | Not widely published for the 920; its predecessor (910C) is lower. | 3,026 TFLOPS (FP8, PCIe, with sparsity) |
| Memory Bandwidth | 4.0 TB/s (HBM3) | 3.35 - 3.9 TB/s (HBM3) |
| GPU Memory (VRAM) | Likely high (the predecessor, 910C, has 128 GB HBM3) | 80 GB (HBM3) |
| Software Ecosystem | CANN (requires porting, less mature) | CUDA (industry standard, highly mature) |
| Primary Market | China (strong domestic focus) | Global |
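The bandwidth row matters most for single-batch LLM decoding, where each generated token must stream the full set of weights from memory. A back-of-envelope sketch, assuming a hypothetical 13B-parameter FP16 model and ignoring KV-cache traffic, parallelism, and real-world efficiency:

```python
# Back-of-envelope decode ceiling: at batch size 1, every generated token
# streams all model weights once, so tokens/s <= bandwidth / model_bytes.
# The 13B-parameter FP16 model here is a hypothetical example, not a
# figure from the comparison above.
PARAMS = 13e9          # hypothetical 13B-parameter model
BYTES_PER_PARAM = 2    # FP16/BF16 weights
model_bytes = PARAMS * BYTES_PER_PARAM  # 26 GB of weights per token

bandwidth_bytes_per_s = {
    "Ascend 920 (claimed)": 4.0e12,  # 4.0 TB/s, from the table
    "H100 SXM": 3.35e12,             # 3.35 TB/s, from the table
}

for name, bw in bandwidth_bytes_per_s.items():
    tokens_per_s = bw / model_bytes
    print(f"{name}: ~{tokens_per_s:.0f} tokens/s (theoretical upper bound)")
```

Taken at face value, the claimed 4.0 TB/s would give the Ascend 920 a modest edge over the H100 SXM on this memory-bound metric, though the claimed figure is unverified and real throughput depends heavily on software maturity.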