sequence similarity graphs – DIPSA: Data-Intensive Parallel Systems and Algorithms

Although the literature describes an increasing number of graph algorithms, loading graphs remains a time-consuming component of the end-to-end execution time. Graph frameworks often rely on custom graph storage formats that are not optimized for efficient loading of large-scale graph datasets. Furthermore, graph loading is often not optimized as it […]

ParaGrapher

ParaGrapher: A Parallel and Distributed Graph Loading Library for Large-Scale …

PDF version. DOI: 10.48550/arXiv.2507.00716. ParaGrapher is a graph loading API and library that enables graph processing frameworks to load large-scale compressed graphs with minimal overhead. This capability accelerates the design and implementation of new high-performance graph algorithms and their evaluation on a wide range of graphs and across different frameworks. However, […]

ParaGrapher

Accelerating Loading WebGraphs in ParaGrapher

PDF version. DOI: 10.48550/arXiv.2404.19735. Comprehensive evaluation is one of the bases of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable by supporting common input formats across different frameworks. However, each framework creates its own format, which may not support reading large-scale real-world graph datasets. This […]

ParaGrapher

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – …

MS-BioGraphs sequence similarity graph datasets are now publicly available on IEEE DataPort: https://doi.org/10.21227/gmd9-1534. To access the files, you need to register or log in to IEEE DataPort and then visit the MS-BioGraphs page. By saving the page as an HTML file such as dp.html, you may download the datasets (as an example […]

MS-BioGraphs

MS-BioGraphs on IEEE DataPort

ParaGrapher source code has been integrated into LaganLighter, and access to different WebGraph formats is now available in LaganLighter. For further details, please refer to: – LaganLighter source code repository: https://github.com/DIPSA-QUB/LaganLighter, particularly the graph.c file. – ParaGrapher source code repository: https://github.com/DIPSA-QUB/ParaGrapher, particularly the src/webgraph.c and src/WG*.java files. Read more about ParaGrapher and […]

LaganLighter ParaGrapher

ParaGrapher Integrated to LaganLighter

ParaGrapher source code for accessing WebGraphs has been published. The supported graph types are: ParaGrapher uses its asynchronous and parallel API to implement these graph types. The user needs to implement a callback function that the API calls upon completion of reading a block of edges. Poplar uses […]
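The callback-based loading pattern described above can be illustrated with a minimal Python sketch. This is not ParaGrapher's actual API (the library itself is written in C and Java); the names `load_edges_async`, `read_block`, and `on_block` are hypothetical, and an in-memory edge list stands in for a compressed graph on storage. The sketch only shows the general idea: blocks of edges are read in parallel, and a user-supplied callback is invoked as each block completes.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def read_block(edges, start, end):
    """Hypothetical stand-in for reading/decompressing one block of edges."""
    return edges[start:end]


def load_edges_async(edges, block_size, on_block, workers=4):
    """Read `edges` in blocks on a thread pool.

    `on_block(start, block)` is called from a worker thread as soon as
    the block starting at edge index `start` has been read.
    Returns the list of futures so the caller can wait for completion.
    """
    def task(start):
        end = min(start + block_size, len(edges))
        block = read_block(edges, start, end)
        on_block(start, block)

    pool = ThreadPoolExecutor(max_workers=workers)
    futures = [pool.submit(task, s) for s in range(0, len(edges), block_size)]
    pool.shutdown(wait=False)  # returns immediately; submitted tasks still run
    return futures


def demo():
    # Ten toy edges (source, destination); each source appears exactly once.
    edges = [(u, (u * 7 + 3) % 10) for u in range(10)]
    degree = [0] * 10
    lock = threading.Lock()

    def on_block(start, block):
        # Callbacks may run concurrently, so shared state is guarded by a lock.
        with lock:
            for src, _dst in block:
                degree[src] += 1

    futures = load_edges_async(edges, block_size=3, on_block=on_block)
    for f in futures:
        f.result()  # wait, and surface any exception raised in a callback
    return degree
```

Because the callback runs on worker threads, any state it shares with the rest of the program needs synchronization; here a lock is used, although per-block output buffers would avoid contention in a real loader.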

MS-BioGraphs ParaGrapher

ParaGrapher Source Code For WebGraph Types

2023 IEEE International Conference on Big Data (BigData’23), December 15-18, 2023, Sorrento, Italy. DOI: 10.1109/BigData59044.2023.10386309. PDF (Authors' Copy). Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly accessible, relevant, and realistic data sets. To ensure continuation of this progress, we (i) investigate […]

MS-BioGraphs

On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets …

2023 IEEE International Symposium on Workload Characterization (IISWC’23), October 1-3, 2023, Ghent, Belgium. DOI: 10.1109/IISWC59245.2023.00029. PDF Version. Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly-accessible, relevant, and realistic data sets. In this paper, we announce publication of MS-BioGraphs, a new […]

MS-BioGraphs

Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs – …

DOI: 10.48550/arXiv.2308.16744. PDF Version. arXiv Link. Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly-accessible, relevant, and realistic data sets. To ensure continuation of this progress, we (i) investigate and optimize the process of generating large sequence similarity graphs as […]

MS-BioGraphs

MS-BioGraphs: Sequence Similarity Graph Datasets

Name: MS-BioGraphs – MS
URL: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MS
Download Link: https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validating and Sample Code: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation: Vertices represent proteins and each edge represents the sequence similarity between its two endpoints
Edge Weighted: Yes
Directed: No
Number of Vertices: 1,757,323,526
Number of Edges: 2,488,069,027,875
Maximum […]
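To give a feel for the scale implied by the vertex and edge counts listed above, the following back-of-the-envelope Python sketch computes the average number of edges per vertex and a rough uncompressed size. The layout is an assumption made purely for illustration (a CSR-style array with 4-byte neighbor IDs and 8-byte offsets, weights excluded); it is not the format in which the dataset is distributed.

```python
# Counts taken from the MS-BioGraphs MS entry above.
NUM_VERTICES = 1_757_323_526
NUM_EDGES = 2_488_069_027_875

# Assumed sizes for an uncompressed CSR-style layout (illustrative only).
BYTES_PER_NEIGHBOR_ID = 4   # vertex IDs fit in an unsigned 32-bit integer
BYTES_PER_OFFSET = 8        # edge offsets exceed 32 bits


def average_degree(num_vertices, num_edges):
    """Average number of listed edges per vertex."""
    return num_edges / num_vertices


def csr_size_tib(num_vertices, num_edges):
    """Size in TiB of neighbor IDs plus offsets, without edge weights."""
    total_bytes = (num_edges * BYTES_PER_NEIGHBOR_ID
                   + (num_vertices + 1) * BYTES_PER_OFFSET)
    return total_bytes / 2**40


if __name__ == "__main__":
    print(f"average degree: {average_degree(NUM_VERTICES, NUM_EDGES):.1f}")
    print(f"uncompressed topology: {csr_size_tib(NUM_VERTICES, NUM_EDGES):.2f} TiB")
```

Under these assumptions the topology alone occupies several terabytes uncompressed, which helps explain why compressed formats and parallel loading matter for graphs of this size.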

MS-BioGraphs