2 March 2026 – DIPSA: Data-Intensive Parallel Systems and Algorithms

Abstract: Emerging AI accelerators increasingly adopt wafer-scale integration, combining hundreds of thousands of cores with massive on-chip memory and ultra-high bandwidth. Yet existing LLM inference systems, designed primarily for GPUs, cannot fully exploit this architecture. In this talk, I will present WaferLLM, the first LLM inference system designed specifically for wafer-scale accelerators. WaferLLM […]

Talks

Invited Talk – Dr Luo Mai – Bringing LLM Inference to Wafer …