Abstract: Emerging AI accelerators increasingly adopt wafer-scale integration, combining hundreds of thousands of cores with massive on-chip memory and ultra-high bandwidth. Yet existing LLM inference systems, designed primarily for GPUs, cannot fully exploit this architecture. In this talk, I will present WaferLLM, the first LLM inference system designed specifically for wafer-scale accelerators. WaferLLM […]
Daily archives: 2 March 2026