ParaGrapher: Graph Loading API and Library | DIPSA: Data-Intensive Parallel Systems and Algorithms

Project Statement

Many graph frameworks exist. Each one optimizes particular graph algorithms for particular optimization metrics and is implemented in its own programming language with its own parallelization libraries or techniques. Because each framework implements its own graph format, comparing the execution of graph algorithms across frameworks over a wide range of graph datasets becomes time-consuming, inaccurate, and sometimes impossible. Moreover, designing a new graph framework requires time spent implementing graph loading (to supply input datasets to the algorithms), even though the main optimization target of a graph framework is the execution of graph algorithms, not the loading of graphs.

It may be better to separate the loading of graphs from the design of high-performance graph algorithms. We present ParaGrapher, an API and a library for loading graphs. ParaGrapher supports (i) synchronous (blocking) and asynchronous (non-blocking) loading of (ii) simple, vertex-weighted, and edge-weighted graphs (iii) in different formats, including WebGraph and MatrixMarket. In this way, ParaGrapher helps advance High-Performance Graph Processing by simplifying the evaluation and refinement of new (and previous) falsifiable contributions on a wider range of datasets.

Source Code

https://github.com/DIPSA-QUB/ParaGrapher

API Documentation

Please refer to the Wiki (https://github.com/DIPSA-QUB/ParaGrapher/wiki/API-Documentation) or download the PDF file from https://github.com/DIPSA-QUB/ParaGrapher/raw/main/doc/api.pdf.

ParaGrapher in a Few Bullet Points

ParaGrapher
(i) provides an API for reading different graph formats from storage and
(ii) implements this API as a library for accessing graphs.

This allows new graph processing frameworks
(i) to read graphs immediately, without implementing their own loaders, and
(ii) to evaluate new and previous contributions on a wide range of graph datasets.

Asynchronous reading of a graph in ParaGrapher:
(i) the user requests that a range of vertices (and their edges) be read,
(ii) ParaGrapher calls the user-defined callback function each time a block of the requested vertices has been loaded, and
(iii) the user (in the callback function) informs ParaGrapher when a block (i.e., its buffer) is no longer needed.

ParaGrapher library
(i) is not responsible for allocating, releasing, or managing memory and is only responsible for loading blocks of edges,
(ii) returns the loaded data to the user either as read-only buffers or by populating memory allocated by the user, and
(iii) parallelizes reading and passes each block of loaded data to a user-defined callback function, which runs on a new thread.

For now, ParaGrapher supports
(i) the WebGraph format, by implementing a C-consumer, Java-producer algorithm that uses the shared-memory interface (/dev/shm). For more details, please refer to its post.
(ii) Support for other formats is on its way.

Publications

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version (1 May 2024)
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read) (20 April 2024)
ParaGrapher Integrated to LaganLighter (16 February 2024)
ParaGrapher Source Code For WebGraph Types (16 February 2024)

Project Members

– Mohsen Koohi Esfahani
– Hans Vandierendonck
– Syed Ibtisam Tauhidi
– Marco D’Antonio

Grants and Funding

– Horizon Europe under grant agreement 101072456 (RELAX)
– The Engineering and Physical Sciences Research Council under grant agreements EP/X01794X/1 (ASCCED), EP/X029174/1 (RELAX), EP/Z531054/1 (the Kelvin Living Lab), and EP/T022175/1 (Kelvin-2 Tier-2 HPC)
– A grant from the Ministry of Education, India, under a collaborative project between Tezpur University, India and QUB

Acknowledgements

We are grateful to
– Tony Lindsay, HPDC cluster, EEECS, QUB
– Vaughan Purnell, Head of NI-HPC and his team