The ParaGrapher source code for accessing WebGraphs has been published. The supported graph types are:
- PARAGRAPHER_CSX_WG_400_AP: graphs compressed in WebGraph format with a 4-byte ID per vertex. Graphs in this category: LAW web graphs (https://law.di.unimi.it/datasets.php).
- PARAGRAPHER_CSX_WG_404_AP: graphs compressed in WebGraph format with a 4-byte ID per vertex and a 4-byte integer weight per edge. Graphs in this category: MS-BioGraphs (https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs/).
- PARAGRAPHER_CSX_WG_800_AP: graphs compressed in Big WebGraph format with an 8-byte ID per vertex. Graphs in this category: (i) WDC Hyper Link 2012 (https://webdatacommons.org/hyperlinkgraph/) and (ii) SWH graphs (https://docs.softwareheritage.org/devel/swh-dataset/graph/dataset.html).
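To make the element sizes above concrete, here is a minimal C sketch that estimates how many bytes a buffer of decompressed edges would need for each type. The struct `graph_type_layout`, the table `layouts`, and the function `edge_buffer_bytes` are hypothetical names invented for this illustration and are not part of the ParaGrapher headers. The sketch also assumes that each decompressed edge costs one neighbour ID plus, for the weighted type, one weight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for this illustration only;
   it is NOT taken from the ParaGrapher headers. */
typedef struct {
    const char *name;          /* graph type name as listed in the post */
    size_t vertex_id_bytes;    /* bytes per vertex ID */
    size_t edge_weight_bytes;  /* bytes per edge weight, 0 if unweighted */
} graph_type_layout;

const graph_type_layout layouts[] = {
    { "PARAGRAPHER_CSX_WG_400_AP", 4, 0 },
    { "PARAGRAPHER_CSX_WG_404_AP", 4, 4 },
    { "PARAGRAPHER_CSX_WG_800_AP", 8, 0 },
};

/* Bytes needed to hold edge_count decompressed edges, assuming one
   neighbour ID plus an optional weight per edge.
   Returns 0 if the multiplication would overflow. */
uint64_t edge_buffer_bytes(const graph_type_layout *layout, uint64_t edge_count)
{
    assert(layout != NULL);
    uint64_t per_edge = layout->vertex_id_bytes + layout->edge_weight_bytes;
    if (per_edge != 0 && edge_count > UINT64_MAX / per_edge)
        return 0;
    return edge_count * per_edge;
}
```

Under these assumptions, a block of 1000 edges would occupy 4000 bytes for the 400 type and 8000 bytes for both the 404 and 800 types.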
ParaGrapher uses its asynchronous and parallel API to implement these graph types. The user implements a callback function, which the API calls each time it finishes reading a block of edges. ParaGrapher uses shared memory for the interaction between its C library and the Java library that deploys the WebGraph framework.
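The callback pattern described above can be sketched in plain C with POSIX threads. This is only an illustration of the idea, not the ParaGrapher API: the types `edge_block`, `block_callback`, and `load_stats` and the functions `read_edges_in_blocks` and `count_edges` are hypothetical. Decompression is replaced by a stand-in that fabricates neighbour IDs, and one thread is started per block purely for simplicity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration; NOT the ParaGrapher API. */
typedef struct {
    uint64_t start_edge;   /* index of the first edge in this block */
    uint64_t edge_count;   /* number of edges in this block */
    uint32_t *neighbours;  /* 4-byte destination vertex IDs */
} edge_block;

typedef void (*block_callback)(const edge_block *block, void *user_data);

typedef struct {
    uint64_t start_edge, edge_count;
    block_callback callback;
    void *user_data;
} read_job;

/* Worker: "reads" one block, then hands it to the user's callback. */
static void *reader_thread(void *arg)
{
    read_job *job = arg;
    edge_block block = { job->start_edge, job->edge_count, NULL };
    block.neighbours = malloc(job->edge_count * sizeof *block.neighbours);
    if (block.neighbours == NULL)
        return NULL;  /* the block is skipped if allocation fails */
    /* Stand-in for decompression: fabricate neighbour IDs. */
    for (uint64_t i = 0; i < job->edge_count; i++)
        block.neighbours[i] = (uint32_t)(job->start_edge + i);
    job->callback(&block, job->user_data);
    free(block.neighbours);  /* the buffer is only valid during the callback */
    return NULL;
}

/* Splits total_edges into blocks, reads them in parallel (one thread
   per block), and waits for all of them. Returns 0 on success. */
int read_edges_in_blocks(uint64_t total_edges, uint64_t block_size,
                         block_callback callback, void *user_data)
{
    if (block_size == 0 || callback == NULL)
        return -1;
    size_t n = (size_t)((total_edges + block_size - 1) / block_size);
    if (n == 0)
        return 0;
    pthread_t *threads = malloc(n * sizeof *threads);
    read_job *jobs = malloc(n * sizeof *jobs);
    if (threads == NULL || jobs == NULL) {
        free(threads);
        free(jobs);
        return -1;
    }
    size_t started = 0;
    int status = 0;
    for (; started < n; started++) {
        uint64_t start = (uint64_t)started * block_size;
        uint64_t left = total_edges - start;
        jobs[started] = (read_job){ start, left < block_size ? left : block_size,
                                    callback, user_data };
        if (pthread_create(&threads[started], NULL, reader_thread,
                           &jobs[started]) != 0) {
            status = -1;
            break;
        }
    }
    for (size_t i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    free(threads);
    free(jobs);
    return status;
}

/* Example user state and callback: count the edges and blocks delivered. */
typedef struct {
    pthread_mutex_t lock;
    uint64_t edges_seen, blocks_seen;
} load_stats;

void count_edges(const edge_block *block, void *user_data)
{
    load_stats *stats = user_data;
    assert(block->neighbours != NULL);
    pthread_mutex_lock(&stats->lock);  /* callbacks may run concurrently */
    stats->edges_seen += block->edge_count;
    stats->blocks_seen += 1;
    pthread_mutex_unlock(&stats->lock);
}
```

A caller would initialise a `load_stats` value with `PTHREAD_MUTEX_INITIALIZER` and zero counters, then call `read_edges_in_blocks(total, block_size, count_edges, &stats)`. Because several callbacks can run at the same time, any shared state they touch needs its own synchronisation, which is why the sketch guards the counters with a mutex.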
For further details, please refer to the ParaGrapher source code repository: https://github.com/DIPSA-QUB/ParaGrapher, particularly the src/webgraph.c and src/WG*.java files.