ParaGrapher: Graph Loading API and Library

Project Statement

Many graph frameworks exist. Each one optimizes particular graph algorithms for different metrics, is implemented in a different programming language, and uses different parallelization libraries and techniques. Because each framework implements its own graph format, comparing the execution of graph algorithms across frameworks on a wide range of graph datasets is time-consuming, inaccurate, and sometimes impossible. Moreover, designing a new graph framework requires investing time in loading the graphs (the input datasets of the algorithms), even though the main optimization target of a graph framework is the execution of graph algorithms, not the loading of graphs.

It may be better to separate the loading of graphs from the design of high-performance graph algorithms. We present ParaGrapher, an API and a library for loading graphs. ParaGrapher supports (i) synchronous (blocking) and asynchronous (non-blocking) loading of (ii) simple, vertex-weighted, and edge-weighted graphs (iii) in different formats including WebGraph and MatrixMarket. In this way, ParaGrapher helps advance High-Performance Graph Processing by simplifying the evaluation and refinement of new (and previous) falsifiable contributions on a wider range of datasets.

Source Code

https://github.com/DIPSA-QUB/ParaGrapher

API Documentation

Please refer to the Wiki at https://github.com/DIPSA-QUB/ParaGrapher/wiki/API-Documentation, or download the PDF file from https://github.com/DIPSA-QUB/ParaGrapher/raw/main/doc/api.pdf .

ParaGrapher in a Few Bullet Points

ParaGrapher
(i) provides an API for reading different graph formats from storage and
(ii) implements a library for this API for accessing graphs.

This allows new graph processing frameworks
(i) to have immediate access for reading graphs and
(ii) to evaluate the new/previous contributions on a wide range of graph datasets.

Asynchronous reading of a graph in ParaGrapher:
(i) the user requests a range of vertices (and their edges),
(ii) ParaGrapher calls the user-defined callback function upon completing each block of the requested vertices, and
(iii) the user (in the callback function) informs ParaGrapher when no further access to a block (i.e., its buffer) is required.
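
The three steps above can be sketched in C as follows. This is only an illustration of the protocol, not the ParaGrapher API: every name (demo_request, demo_block, demo_release_block, demo_run) is hypothetical, the "graph" is a synthetic ring in which vertex v has a single edge to (v + 1) mod N, and the loader reuses one buffer that it owns, which is a simplification made for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* All names are hypothetical; this is NOT the ParaGrapher API. */
typedef struct {
    uint64_t first_vertex;    /* first vertex covered by this block */
    uint64_t vertex_count;    /* number of vertices (and edges) in the block */
    const uint64_t *dests;    /* read-only view: one destination per vertex */
    int released;             /* set once the user no longer needs the buffer */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} demo_block;

typedef void (*demo_callback)(demo_block *block, void *user_data);

typedef struct {
    uint64_t start, end;      /* step (i): requested vertex range [start, end) */
    uint64_t total_vertices;
    uint64_t block_size;
    demo_callback callback;
    void *user_data;
    pthread_t thread;
} demo_request;

/* Step (iii): the user tells the loader that the block's buffer may be reused. */
void demo_release_block(demo_block *block) {
    pthread_mutex_lock(&block->lock);
    block->released = 1;
    pthread_cond_signal(&block->cond);
    pthread_mutex_unlock(&block->lock);
}

/* Loader thread: fills one block at a time and hands it to the callback. */
static void *demo_loader(void *arg) {
    demo_request *req = arg;
    uint64_t *buffer = malloc(req->block_size * sizeof *buffer);
    if (buffer == NULL) return NULL;
    demo_block block = { .dests = buffer };
    pthread_mutex_init(&block.lock, NULL);
    pthread_cond_init(&block.cond, NULL);

    for (uint64_t v = req->start; v < req->end; v += req->block_size) {
        uint64_t left = req->end - v;
        uint64_t n = left < req->block_size ? left : req->block_size;
        for (uint64_t i = 0; i < n; i++)      /* stand-in for decoding from storage */
            buffer[i] = (v + i + 1) % req->total_vertices;
        block.first_vertex = v;
        block.vertex_count = n;
        block.released = 0;
        req->callback(&block, req->user_data);            /* step (ii) */

        pthread_mutex_lock(&block.lock);                  /* wait for step (iii) */
        while (!block.released) pthread_cond_wait(&block.cond, &block.lock);
        pthread_mutex_unlock(&block.lock);
    }
    pthread_cond_destroy(&block.cond);
    pthread_mutex_destroy(&block.lock);
    free(buffer);
    return NULL;
}

typedef struct { uint64_t blocks, edges_seen, dest_sum; } demo_stats;

/* Example user callback: consumes the block, then releases it. */
static void demo_count_edges(demo_block *block, void *user_data) {
    demo_stats *s = user_data;
    for (uint64_t i = 0; i < block->vertex_count; i++) s->dest_sum += block->dests[i];
    s->edges_seen += block->vertex_count;
    s->blocks++;
    demo_release_block(block);
}

/* Issues a request, waits for it to finish, and returns what the callback saw. */
demo_stats demo_run(uint64_t total_vertices, uint64_t start, uint64_t end,
                    uint64_t block_size) {
    demo_stats stats = {0, 0, 0};
    demo_request req = { .start = start, .end = end, .total_vertices = total_vertices,
                         .block_size = block_size, .callback = demo_count_edges,
                         .user_data = &stats };
    assert(block_size > 0 && start <= end && end <= total_vertices);
    if (pthread_create(&req.thread, NULL, demo_loader, &req) == 0)
        pthread_join(req.thread, NULL);  /* a real caller could do other work first */
    return stats;
}
```

In this sketch the callback releases the block immediately, but because the release is a separate call, a user could equally hand the block to another thread and release it later; the loader simply waits until the release arrives before overwriting the buffer.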

ParaGrapher library
(i) is not responsible for allocating, releasing, or managing memory and is just responsible for loading block(s) of edges,
(ii) returns the read data to the user either as read-only buffers or by populating memory allocated by the user, and
(iii) parallelizes reading and passes blocks of read data to a user-defined callback function on a new thread.
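
The "memory allocated by the user" option can be illustrated with a minimal C sketch. It assumes, purely for illustration, a graph stored in CSR form (an offsets array plus an edges array); the names demo_graph, demo_edges_in_range, and demo_load_range are hypothetical and are not part of the ParaGrapher API. The point is the ownership rule: the caller sizes and owns the buffer, and the loading function only fills it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a graph on storage, in CSR form:
   the edges of vertex v are edges[offsets[v]] .. edges[offsets[v + 1] - 1]. */
typedef struct {
    uint64_t vertex_count;
    const uint64_t *offsets;   /* vertex_count + 1 entries */
    const uint32_t *edges;     /* read-only edge destinations */
} demo_graph;

/* A tiny 4-vertex example graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {}, 3 -> {0, 1, 2}. */
demo_graph demo_small_graph(void) {
    static const uint64_t offsets[] = {0, 2, 3, 3, 6};
    static const uint32_t edges[] = {1, 2, 2, 0, 1, 2};
    demo_graph g = { 4, offsets, edges };
    return g;
}

/* Number of edges the caller must provide room for, for vertices [first, last). */
uint64_t demo_edges_in_range(const demo_graph *g, uint64_t first, uint64_t last) {
    assert(first <= last && last <= g->vertex_count);
    return g->offsets[last] - g->offsets[first];
}

/* Fills caller-owned memory and allocates nothing itself.
   Returns the number of edges written, or -1 if `capacity` is too small. */
int64_t demo_load_range(const demo_graph *g, uint64_t first, uint64_t last,
                        uint32_t *out, uint64_t capacity) {
    uint64_t n = demo_edges_in_range(g, first, last);
    if (n > capacity) return -1;
    memcpy(out, g->edges + g->offsets[first], n * sizeof *out);
    return (int64_t)n;
}
```

A caller would first ask demo_edges_in_range how many edges the vertex range holds, allocate that many entries in whatever way suits its framework (heap, huge pages, NUMA-aware allocation), and then pass the buffer to demo_load_range.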

For now, ParaGrapher supports
(i) the WebGraph format, by implementing a C-consumer Java-producer algorithm using the shared-memory interface (/dev/shm). For more details please refer to its post.
(ii) Other formats: on their way.
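
The general idea of exchanging data between a producer and a C consumer through a file in shared memory can be sketched as below. This is not ParaGrapher's actual protocol, which is not described here: the file layout (a 32-bit count followed by that many 32-bit values) and the names demo_produce and demo_consume are assumptions made for illustration, and the producer is written in C only to keep the example self-contained, whereas the library's producer is written in Java.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Producer stand-in (hypothetical layout): a 32-bit count, then `count` values.
   Returns 0 on success, -1 on failure. */
int demo_produce(const char *path, const uint32_t *values, uint32_t count) {
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    int ok = fwrite(&count, sizeof count, 1, f) == 1 &&
             fwrite(values, sizeof *values, count, f) == count;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Consumer: maps the file read-only and copies the values into caller-owned
   memory. Returns the number of values read, or -1 on any error. */
int64_t demo_consume(const char *path, uint32_t *out, uint32_t capacity) {
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < (off_t)sizeof(uint32_t)) {
        close(fd);
        return -1;
    }
    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (map == MAP_FAILED) return -1;

    const uint32_t *words = map;
    uint32_t count = words[0];
    int64_t result = -1;
    /* Accept the block only if it fits the caller's buffer and the file
       really contains `count` values after the header. */
    if (count <= capacity &&
        (uint64_t)st.st_size >= ((uint64_t)count + 1) * sizeof(uint32_t)) {
        memcpy(out, words + 1, count * sizeof *out);
        result = count;
    }
    munmap(map, (size_t)st.st_size);
    return result;
}
```

On Linux, /dev/shm is typically a tmpfs mount, so passing a path such as /dev/shm/some_block to these functions keeps the data in memory rather than on disk. The sketch also leaves out synchronization: the consumer is assumed to run only after the producer has finished writing, whereas a concurrent design needs an explicit signal that a block is complete.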

Publications


Project Members

Mohsen Koohi Esfahani
Hans Vandierendonck
Syed Ibtisam Tauhidi
Marco D’Antonio

Grants and Funding

– Horizon Europe under grant agreement 101072456 (RELAX)
– The Engineering and Physical Sciences Research Council under grant agreements EP/X01794X/1 (ASCCED), EP/X029174/1 (RELAX), EP/Z531054/1 (the Kelvin Living Lab), and EP/T022175/1 (Kelvin-2 Tier-2 HPC)
– A grant from the Ministry of Education, India, under a collaborative project between Tezpur University, India and QUB
– PhD scholarship from The Department for the Economy, Northern Ireland and QUB

Acknowledgements

We are grateful to
– Tony Lindsay, HPDC cluster, EEECS, QUB
– Vaughan Purnell, Head of NI-HPC and his team