GPU is an intermediary solution

胡雪盐8 · 2025-11-03 12:11:48Z

GPU is an intermediary solution简介

来源: 胡雪盐8 于 2025-11-03 12:11:48 [档案] [博客] [旧帖] [给我悄悄话] 阅读数 : (9093 bytes)

You’re making a very sharp point — and it aligns with how senior architects at NVIDIA, Cerebras, and Google privately describe the situation:

GPUs are a stop-gap, not the end state.
They were the first massively parallel platform conveniently available when deep learning exploded — but their architecture is fundamentally a compromise.

Let me break down why GPUs are an interim solution and what the true “post-GPU” era will look like.

Why GPUs Are Only an Intermediary Solution

1. GPUs were designed for graphics, not AI

The core assumptions behind GPUs:

Embarrassing parallelism for pixel shading
Predictable memory access patterns
Small working sets compared to LLMs
Regular grids and textures, not trillion-parameter models

AI workloads break nearly all these assumptions.

2. GPUs still depend on the classical memory hierarchy

Even with HBM, GPUs still have:

Huge data movement overhead
Separate compute and memory
Power wasted shuttling weights

In modern LLM training:

>70% of energy is data movement, not math
Bandwidth, not FLOPs, is the limiting factor

This architecture is unsustainable as models scale to 10T–100T parameters.

3. Tensor cores are a bolt-on

Tensor cores are essentially a grafted-on matrix accelerator:

Not tightly integrated with the memory fabric
Still bottlenecked by HBM bandwidth
Still forced through CUDA, which adds overhead

They improve throughput but don’t fix the fundamental architectural mismatch.

4. GPUs scale poorly at cluster size

Large AI systems require:

Global synchronization
Fast model-parallel communication
Distributed memory structures

Even NVLink / NVSwitch clusters hit limits around the 10k–20k GPU scale:

Latency balloons
Interconnect becomes the bottleneck
Training efficiency drops massively

For trillion-scale models, GPUs are already the weak link.

What Comes After GPUs (The True Long-Term Architecture)

1. Compute-In-Memory (CIM / PIM)

Instead of moving data to compute:
move compute into memory.

This avoids the von Neumann bottleneck entirely.

Startups like Rain AI and Mythic are early proof points.

2. Wafer-scale engines (WSE)

Cerebras WSE-3 proves:

Giant monolithic silicon
All memory local
No multi-GPU communication
Full-model training on-die

This is much closer to the eventual direction than GPUs.

3. AI-native distributed memory systems

Think:

Unified global memory for the entire cluster
Hundreds of TB of accessible memory
Zero-copy weight sharing
This is where CXL and UCIe will converge.

4. Optical or analog compute

Optical neural networks promise:

Orders of magnitude lower energy per MAC
Natural support for matrix ops
Massive parallelism

This eliminates electrical resistance limits entirely.

5. Direct silicon photonics interconnect

Rather than GPU p2p networks:

Photonic mesh
Terabyte-class chip-to-chip bandwidth
Ultra-low latency

This is essential for training 100T-scale models.

更多我的博客文章>>>

您的位置：文学城 » 论坛 » 时事述评 » GPU is an intermediary solution

请您先登陆，再发跟帖！