2023 IEEE International Symposium on Workload Characterization (IISWC’23)
October 1-3, 2023, Ghent, Belgium
DOI: 10.1109/IISWC59245.2023.00029
Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly-accessible, relevant, and realistic data sets.
In this paper, we announce the publication of MS-BioGraphs, a new family of publicly available, real-world, edge-weighted graph datasets with up to 2.5 trillion edges, that is, 6.6 times larger than the largest graph published recently.
We briefly review the two main challenges we faced in generating large graph datasets and our solutions, namely (i) optimizing data structures and algorithms for this multi-step process and (ii) the WebGraph parallel compression technique. We also study some characteristics of MS-BioGraphs.
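To give a rough sense of why compression matters at this scale, the minimal sketch below estimates the uncompressed size of the edge arrays alone for a graph with 2.5 trillion weighted edges. The function name `edge_array_bytes` is hypothetical, and the 4-byte neighbour IDs and 4-byte weights are illustrative assumptions, not figures from the paper.

```python
def edge_array_bytes(num_edges: int, id_bytes: int = 4, weight_bytes: int = 4) -> int:
    """Bytes needed to store one neighbour ID and one weight per edge, uncompressed.

    The default field widths are assumptions made for illustration only.
    """
    return num_edges * (id_bytes + weight_bytes)


# "Up to 2.5 trillion edges", as stated in the abstract.
NUM_EDGES = 2_500_000_000_000

size = edge_array_bytes(NUM_EDGES)
print(f"Uncompressed edge arrays: {size / 1e12:.1f} TB")
```

Under these assumptions the edge arrays alone come to about 20 TB, before counting any per-vertex index.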
The datasets are available at https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs .
Please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Sequence-Similarity-Graph-Datasets/ for a complete version of this paper.
BibTeX
@INPROCEEDINGS{10.1109/IISWC59245.2023.00029,
author = {Koohi Esfahani, Mohsen and Boldi, Paolo and Vandierendonck, Hans and Kilpatrick, Peter and Vigna, Sebastiano},
booktitle={2023 IEEE International Symposium on Workload Characterization (IISWC'23)},
title={Dataset Announcement: {MS-BioGraphs}, Trillion-Scale Public Real-World Sequence Similarity Graphs},
year={2023},
volume={},
number={},
pages={},
location={Ghent, Belgium},
publisher={IEEE Computer Society},
doi={10.1109/IISWC59245.2023.00029}
}