High-Performance Graph Processing

We design computation- and memory-optimized scalable parallel and distributed graph algorithms and processing models. This research sits at the intersection between high-performance computing, computer architecture, and computer networks. Some of this work uses approximation, or transprecise computing.

MS-BioGraphs Datasets [2022 – 2023] are a family of trillion-scale public real-world graph datasets and the largest public real-world graphs at the time of publication (Aug. 2023). The datasets have been created by matching similarity between 1.7 billion proteins and contains up to 2.5 trillion edges.

Approximate Maximum Weighted Clique [2021 – ongoing] is the search for an efficient (polynomial time) algorithm to solve the maximum weighted clique problem while finding near-optimal weighted cliques.

Graptor [2019 – ongoing] enhances GraphGrind and VEBO by providing Single Instruction Multiple Data (SIMD) vectorization for graph analytics using the SSE, AVX2 and AVX-512 instruction sets families.

LaganLighter  [2019 – 2022]: Having the experience of GraphGrind and VEBO, we understood the requirement of re-studying graph algorithms based on the implications imposed by the structure of datasets into the execution of graph analytics. LaganLighter introduces structure-aware high-performance graph algorithms.

VEBO [2017 – 2018] provides load balancing and better CPU utilization by applying a new graph relabeling and partitioning algorithm. The assumption behind VEBO is that load balance is a more important consideration for shared-memory graph partitioning than memory locality. VEBO has been implemented for GraphGrind and also for Polymer.

GraphGrind [2014 – 2017] optimizes Ligra and Polymer for systems with Non-Uniform Memory-Architecture (NUMA) by adapting the parallelisation approach (vertices and edges partitioning and scheduling algorithms) to the graph algorithm, by compressing of topology data, and by introducing a new NUMA-aware work-stealing algorithm that extends Cilk.