ParaGrapher: Graph Loading API and Library

Project Statement

Many graph frameworks exist. Each one optimizes particular graph algorithms for different metrics and is implemented in a different programming language with different parallelization libraries and techniques. Because each framework implements its own graph format, comparing the execution of graph algorithms across frameworks on a wide range of graph datasets is time-consuming, inaccurate, and sometimes impossible. Moreover, designing a new graph framework requires investing time in implementing graph loading (graphs being the input datasets of the algorithms), even though the main optimization target of a graph framework is the execution of graph algorithms, not the loading of graphs.

It therefore seems necessary to separate loading graphs from designing high-performance graph algorithms. We present ParaGrapher, an API and a library for loading graphs. ParaGrapher supports (i) synchronous (blocking) and asynchronous (non-blocking) loading of (ii) simple, vertex-weighted, and edge-weighted graphs (iii) in different formats, including WebGraph and MatrixMarket. In this way, ParaGrapher leaves the design of graph formats to graph dataset publishers and lets graph processing frameworks concentrate on processing data. We also hope ParaGrapher will help advance High-Performance Graph Processing by simplifying the evaluation and refinement of new (and previous) falsifiable contributions on a wider range of datasets.

ParaGrapher in a Few Bullet Points

ParaGrapher
(i) provides an API for reading different graph formats from storage and
(ii) implements a library for this API for accessing graphs.

This allows new graph processing frameworks
(i) to read graphs immediately, without implementing their own loaders, and
(ii) to evaluate new and previous contributions on a wide range of graph datasets.

Asynchronous reading of a graph in ParaGrapher:
(i) the user requests a range of vertices (and their edges),
(ii) ParaGrapher calls the callback function defined by the user upon completing each block of requested vertices, and
(iii) the user (in the callback function) informs ParaGrapher when no further access to a block (i.e., its buffer) is required.
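
The three steps above can be illustrated with a small toy model. This is only a sketch of the request, callback, and release pattern in Python, not ParaGrapher's actual API: the class `ToyGraphReader`, its methods `read_vertices_async` and `release_block`, and the in-memory CSR arrays are all hypothetical names and data introduced for illustration.

```python
import threading


class ToyGraphReader:
    """Hypothetical stand-in for an asynchronous loader over an in-memory CSR graph."""

    def __init__(self, offsets, edges, block_size):
        self.offsets = offsets        # offsets[v] .. offsets[v + 1] index the edges of vertex v
        self.edges = edges
        self.block_size = block_size  # number of vertices per block
        self._in_use = set()          # blocks handed to the user and not yet released
        self._lock = threading.Lock()

    def read_vertices_async(self, start, end, callback):
        """Step (i): request vertices [start, end); returns without waiting for the data."""
        def worker():
            for first in range(start, end, self.block_size):
                last = min(first + self.block_size, end)
                block_edges = tuple(self.edges[self.offsets[first]:self.offsets[last]])
                block = (first, last)
                with self._lock:
                    self._in_use.add(block)
                # Step (ii): the user's callback is called once the block is complete.
                callback(self, block, block_edges)

        thread = threading.Thread(target=worker)
        thread.start()
        return thread

    def release_block(self, block):
        """Step (iii): the user reports that the block's buffer is no longer needed."""
        with self._lock:
            self._in_use.discard(block)

    def blocks_in_use(self):
        with self._lock:
            return len(self._in_use)


# A 4-vertex graph in CSR form (illustrative data only).
offsets = [0, 2, 3, 5, 6]
edges = [1, 2, 2, 0, 3, 0]
edges_per_block = {}


def on_block(reader, block, block_edges):
    edges_per_block[block] = len(block_edges)  # process the block
    reader.release_block(block)                # then hand the buffer back


reader = ToyGraphReader(offsets, edges, block_size=2)
request = reader.read_vertices_async(0, 4, on_block)
request.join()  # a real caller could do other work before waiting
```

In this sketch the caller regains control as soon as the request is issued, and the loader can recycle a block only after the callback has released it.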

The ParaGrapher library
(i) is not responsible for allocating, releasing, or managing memory and is just responsible for loading block(s) of edges,
(ii) returns the read data to the user either as read-only buffers or by populating memory allocated by the user, and
(iii) parallelizes reading and passes blocks of read data to a user-defined callback function on a new thread.
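
As a rough illustration of these responsibilities, the following sketch fills a caller-allocated buffer block by block, reads the blocks in parallel, and hands each block to a user callback as a read-only view. It is a toy model under stated assumptions, not the library's implementation: `load_blocks`, `on_block`, and the `source` array standing in for storage are hypothetical.

```python
import threading
from array import array


def load_blocks(source, out_buffer, block_size, callback):
    """Populate caller-allocated `out_buffer` from `source`, one thread per block.

    The loader allocates no edge memory itself; it only writes into `out_buffer`
    and passes each finished block to `callback` as a read-only memoryview.
    """
    def read_block(first, last):
        for i in range(first, last):
            out_buffer[i] = source[i]
        view = memoryview(out_buffer)[first:last].toreadonly()
        callback(first, view)  # runs on the block's own thread

    threads = []
    for first in range(0, len(source), block_size):
        last = min(first + block_size, len(source))
        thread = threading.Thread(target=read_block, args=(first, last))
        thread.start()
        threads.append(thread)
    return threads


source = array('I', [1, 2, 2, 0, 3, 0])     # stands in for edges on storage
user_buffer = array('I', [0]) * len(source)  # memory allocated and owned by the user
results = {}
results_lock = threading.Lock()


def on_block(first, view):
    # Callbacks may run concurrently, so shared state is protected by a lock.
    with results_lock:
        results[first] = sum(view)


for thread in load_blocks(source, user_buffer, 4, on_block):
    thread.join()
```

Keeping allocation on the user's side means the framework decides where the edges live, while the loader only fills that memory and reports when each block is ready.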

For now, ParaGrapher supports
(i) the WebGraph format, by implementing a C-consumer, Java-producer algorithm that uses the shared-memory interface (/dev/shm). For more details, please refer to its post.
(ii) Other formats are on their way.

Source Code


Project Members

Mohsen Koohi Esfahani
Hans Vandierendonck
– Syed Ibtisam Tauhidi
– Marco D’Antonio
Mai Thai Son