QClique: Optimizing Performance and Accuracy in Maximum Weighted Clique – Euro-Par 2024

Posted on 15 May 2024 by Mohsen Koohi Esfahani

30th International European Conference on Parallel and Distributed Computing (Euro-Par 2024)

Abstract

The Maximum Weighted Clique (MWC) problem remains challenging due to its unfavourable time complexity. In this paper, we analyze the execution of exact search-based MWC algorithms and show that high-accuracy weighted cliques can be discovered in the early stages of execution if the combinatorial space is searched systematically.

Based on this observation, we introduce QClique, an approximate MWC algorithm that processes the search space only as long as better cliques are expected. QClique uses a tunable parameter to trade off accuracy against execution time. It delivers a 4.7-82.3 times speedup in comparison to previous state-of-the-art MWC algorithms while providing 91.4% accuracy, and it achieves a parallel speedup of up to 56x on 128 threads.

Additionally, QClique accelerates exact MWC computation by replacing the initial clique of the exact algorithm. For WLMC, an exact state-of-the-art MWC algorithm, this results in an average speedup of 3.3x.
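The idea of continuing the search only while better cliques are expected can be illustrated with a small branch-and-bound search. The C sketch below is not QClique's algorithm. It assumes graphs with at most 64 vertices (each adjacency row stored as a bitmask), non-negative vertex weights, and a hypothetical `budget` knob that stops the search after that many branches without an improvement; `budget = 0` gives an exact search. The hypothetical `initial_best` argument seeds the lower bound, loosely mirroring how a good initial clique helps an exact solver prune.

```c
#include <assert.h>
#include <stdint.h>

#define MAXN 64 /* sketch limit: each adjacency row is one 64-bit mask */

typedef struct {
    int n;              /* number of vertices (at most MAXN) */
    uint64_t adj[MAXN]; /* adj[v]: bitmask of the neighbours of v */
    long weight[MAXN];  /* non-negative vertex weights */
    long best;          /* weight of the best clique found so far */
    uint64_t best_set;  /* vertex set of that clique (0 if only seeded) */
    long budget;        /* hypothetical knob: 0 = exact search */
    long since_improve; /* branches explored since the last improvement */
} mwc_state;

static void add_edge(mwc_state *s, int u, int v) {
    assert(u != v && u >= 0 && v >= 0 && u < s->n && v < s->n);
    s->adj[u] |= 1ULL << v;
    s->adj[v] |= 1ULL << u;
}

static long set_weight(const mwc_state *s, uint64_t set) {
    long total = 0;
    for (int v = 0; v < s->n; v++)
        if ((set >> v) & 1) total += s->weight[v];
    return total;
}

/* Branch and bound over cliques; returns 0 when the budget runs out. */
static int expand(mwc_state *s, uint64_t clique, long w, uint64_t cand) {
    if (w > s->best) {
        s->best = w;
        s->best_set = clique;
        s->since_improve = 0;
    }
    for (int v = 0; v < s->n && cand; v++) {
        if (!((cand >> v) & 1)) continue;
        /* Bound: adding every remaining candidate cannot beat the incumbent. */
        if (w + set_weight(s, cand) <= s->best) return 1;
        /* Early stop: too many branches without finding a heavier clique. */
        if (s->budget > 0 && ++s->since_improve > s->budget) return 0;
        cand &= ~(1ULL << v); /* later branches exclude v */
        if (!expand(s, clique | (1ULL << v), w + s->weight[v], cand & s->adj[v]))
            return 0;
    }
    return 1;
}

/* initial_best seeds the lower bound, e.g. with the weight of a known clique. */
static long max_weight_clique(mwc_state *s, long budget, long initial_best) {
    assert(s->n > 0 && s->n <= MAXN);
    s->budget = budget;
    s->best = initial_best;
    s->best_set = 0;
    s->since_improve = 0;
    uint64_t all = s->n == 64 ? ~0ULL : (1ULL << s->n) - 1;
    expand(s, 0, 0, all);
    return s->best;
}
```

For example, with a weight-1 triangle {0, 1, 2} plus edges {2, 3} and {3, 4} where vertices 3 and 4 weigh 5, the exact run (`budget = 0`) returns 10, while `budget = 1` stops early after finding only the triangle (weight 3). A larger budget moves the result toward the exact answer at the cost of more branches.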

Code

https://github.com/DIPSA-QUB/QClique

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version

Posted on 1 May 2024 by Mohsen Koohi Esfahani

PDF version
DOI: 10.48550/arXiv.2404.19735

Comprehensive evaluation is one of the cornerstones of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable when common input formats are supported across different frameworks. However, each framework creates its own format, which may not support reading large-scale real-world graph datasets. This creates a demand for high-performance graph-loading libraries that (i) accelerate the design of new graph algorithms, (ii) allow contributions to be evaluated on a wide range of graph algorithms, and (iii) facilitate easy and fast comparison across different graph frameworks.

To that end, we present ParaGrapher, a high-performance API and library for loading large-scale and compressed graphs. ParaGrapher supports different types of requests for accessing graphs in shared-memory, distributed-memory, and out-of-core graph processing. We explain the design of ParaGrapher and present a performance model of graph decompression, which we use to evaluate ParaGrapher on three storage types.

Our evaluation shows that by decompressing compressed graphs in WebGraph format, ParaGrapher delivers up to 3.2 times speedup in loading and up to 5.2 times speedup in end-to-end execution in comparison to the binary and textual formats.

ParaGrapher is available online at https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.

BibTex

@misc{paragrapher-arxiv,
  title = { Selective Parallel Loading of Large-Scale 
            Compressed Graphs with ParaGrapher}, 
  author = { {Mohsen} {Koohi Esfahani} and Marco D'Antonio and 
             Syed Ibtisam Tauhidi and Thai Son Mai and 
             Hans Vandierendonck},
  year = {2024},
  eprint = {2404.19735},
  archivePrefix = {arXiv},
  primaryClass = {cs.AR},
  doi = {10.48550/arXiv.2404.19735}
}

Related Posts & Source Code

ParaGrapher Web Page

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version, 1 May 2024
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read), 20 April 2024
ParaGrapher Integrated to LaganLighter, 16 February 2024
ParaGrapher Source Code For WebGraph Types, 16 February 2024

An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read)

Posted on 20 April 2024 by Mohsen Koohi Esfahani

Short URL of this post: https://blogs.qub.ac.uk/DIPSA/HDD-vs-SSD-vs-LustreFS-2024

We evaluate the read bandwidth of three storage types:

HDD: A 6TB Hitachi HUS726060AL 7200RPM SATA v3.1
SSD: A 4TB Samsung MZQL23T8HCLS-00A07 PCIe4 NVMe v1.4
LustreFS: A parallel file system with a total capacity of 2 PB and an SSD pool

and for three parallel read methods:

mmap: https://man7.org/linux/man-pages/man2/mmap.2.html
pread: https://man7.org/linux/man-pages/man2/pread.2.html
read: https://man7.org/linux/man-pages/man2/read.2.html

and for two block sizes:

4 KB blocks
4 MB blocks

The source code is available in the ParaGrapher repository:

https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/read_bandwidth.c
https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/read_bandwidth_mam.c : this file is similar to the previous one, but repeats each evaluation for a user-defined number of rounds and reports the minimum, average, and maximum values.
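A parallel pread measurement of this kind can be sketched as follows. This is not the code in read_bandwidth.c; it is a minimal illustration that assumes a static split of the file into one contiguous byte range per thread, POSIX threads, and MB/s computed with 1 MB = 2^20 bytes. The names `pread_bandwidth` and `pread_job` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define MAX_THREADS 256

typedef struct {
    int fd;
    off_t start, end; /* byte range [start, end) assigned to one thread */
    size_t block;     /* request size, e.g. 4 KB or 4 MB */
    long long bytes;  /* bytes actually read by this thread */
} pread_job;

static void *pread_worker(void *arg) {
    pread_job *job = arg;
    char *buf = malloc(job->block);
    job->bytes = 0;
    if (buf == NULL) return NULL;
    for (off_t off = job->start; off < job->end;) {
        size_t want = job->block;
        if ((off_t)want > job->end - off) want = (size_t)(job->end - off);
        ssize_t got = pread(job->fd, buf, want, off);
        if (got <= 0) break; /* error or unexpected end of file */
        job->bytes += got;
        off += got;
    }
    free(buf);
    return NULL;
}

/* Returns MB/s (1 MB = 2^20 bytes) or -1 on error; *bytes_out gets the bytes read. */
static double pread_bandwidth(const char *path, int nthreads, size_t block,
                              long long *bytes_out) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS && block > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1.0;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1.0; }

    pthread_t tids[MAX_THREADS];
    pread_job jobs[MAX_THREADS];
    int created[MAX_THREADS];
    off_t chunk = (st.st_size + nthreads - 1) / nthreads;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < nthreads; t++) {
        off_t s = (off_t)t * chunk;
        if (s > st.st_size) s = st.st_size;
        off_t e = s + chunk > st.st_size ? st.st_size : s + chunk;
        jobs[t] = (pread_job){fd, s, e, block, 0};
        created[t] = pthread_create(&tids[t], NULL, pread_worker, &jobs[t]) == 0;
        if (!created[t]) pread_worker(&jobs[t]); /* fall back to running inline */
    }
    long long total = 0;
    for (int t = 0; t < nthreads; t++) {
        if (created[t]) pthread_join(tids[t], NULL);
        total += jobs[t].bytes;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    double secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    *bytes_out = total;
    return secs > 0.0 ? (double)total / (1 << 20) / secs : 0.0;
}
```

Calling `pread_bandwidth(path, 16, 4096, &bytes)` and `pread_bandwidth(path, 16, 4 << 20, &bytes)` compares 4 KB and 4 MB requests; the OS cache should be dropped between the two calls, otherwise the second run mostly measures memory bandwidth.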

The OS page cache was dropped after each evaluation
(sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches').
Users without sudo access can use the flushcache.c file (https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/flushcache.c), which provides the same functionality but usually takes longer to finish.
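A different, per-file way to evict cached pages without sudo is `posix_fadvise` with `POSIX_FADV_DONTNEED`. This is not what flushcache.c does (its internals are not described here); it is an alternative technique with caveats: it only affects the named file, the advice is only a hint, and on Linux pages that are still dirty may not be dropped. `evict_file_pages` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Asks the kernel to drop the cached pages of one file.
   Returns 0 on success, -1 if the file cannot be opened,
   or the error number returned by posix_fadvise. */
static int evict_file_pages(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    /* Offset 0 and length 0 cover the whole file. The advice is only a hint,
       and dirty pages may stay cached until they are written back. */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc;
}
```

Before each measurement, the benchmark would call `evict_file_pages` on the input file only, so unrelated cached data stays in memory.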

For LustreFS, we repeated the evaluation of read and pread using the O_DIRECT flag, as this flag prevents client-side caching.
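O_DIRECT reads generally require the buffer, file offset, and request size to be aligned. The sketch below assumes a 4 KB alignment (the hypothetical `DIO_ALIGN` constant; the real requirement depends on the device and file system) and falls back to buffered reads if the file system rejects O_DIRECT with EINVAL or EOPNOTSUPP. The function names are hypothetical and this is not the benchmark's code.

```c
#define _GNU_SOURCE /* O_DIRECT is a Linux-specific flag */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define DIO_ALIGN 4096 /* assumed alignment for buffer, offsets, and sizes */

/* Reads fd sequentially with block-sized preads; returns bytes read or -1. */
static long long read_fd(int fd, void *buf, size_t block) {
    long long total = 0;
    for (off_t off = 0;;) {
        ssize_t got = pread(fd, buf, block, off);
        if (got < 0) return -1;
        total += got;
        off += got;
        if ((size_t)got < block) return total; /* short read: end of file */
    }
}

/* Reads a whole file with O_DIRECT, falling back to buffered I/O if the
   file system rejects O_DIRECT. *used_direct tells which path succeeded. */
static long long read_all_direct(const char *path, size_t block, int *used_direct) {
    assert(block > 0 && block % DIO_ALIGN == 0);
    void *buf = NULL;
    if (posix_memalign(&buf, DIO_ALIGN, block) != 0) return -1;
    long long total = -1;
    int err = 0;
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        err = errno;
    } else {
        total = read_fd(fd, buf, block);
        if (total < 0) err = errno;
        close(fd);
    }
    *used_direct = total >= 0;
    if (total < 0 && (err == EINVAL || err == EOPNOTSUPP)) {
        fd = open(path, O_RDONLY); /* buffered retry */
        if (fd >= 0) {
            total = read_fd(fd, buf, block);
            close(fd);
        }
    }
    free(buf);
    return total;
}
```

Keeping the request size a multiple of the alignment (for example 4 KB or 4 MB) means every offset stays aligned until the final short read at the end of the file.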

For the HDD and SSD experiments, we used a machine with an Intel W-2295 3.00 GHz CPU (18 cores, 36 hyper-threads, 24 MB L3 cache) and 256 GB of DDR4 2933 MHz memory, running Debian 12 (Linux 6.1). For LustreFS, we used a machine with 2 TB of 3.2 GHz DDR4 memory and two AMD EPYC 7702 CPUs (128 cores and 256 threads in total).

The results of the evaluation using read_bandwidth.c are shown in the following table. Values are bandwidth in MB/s. The one- or two-digit number with a white background next to each value is the percentage of load imbalance between parallel threads.
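The post does not state how the load-imbalance percentage is computed. One common definition, used here purely as an assumption, is how much slower the slowest thread is than the average thread, as a percentage of the average. The sketch also shows a min/average/max summary over rounds, similar in spirit to what read_bandwidth_mam.c reports; all names are hypothetical.

```c
#include <assert.h>

/* Assumed definition: 100 * (max - avg) / avg over per-thread times. */
static double load_imbalance_pct(const double *thread_secs, int n) {
    assert(n > 0);
    double max = thread_secs[0], sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += thread_secs[i];
        if (thread_secs[i] > max) max = thread_secs[i];
    }
    double avg = sum / n;
    return avg > 0.0 ? 100.0 * (max - avg) / avg : 0.0;
}

typedef struct { double min, avg, max; } round_summary;

/* Summarizes the bandwidth (MB/s) measured in several rounds. */
static round_summary summarize_rounds(const double *mbps, int rounds) {
    assert(rounds > 0);
    round_summary r = {mbps[0], 0.0, mbps[0]};
    for (int i = 0; i < rounds; i++) {
        if (mbps[i] < r.min) r.min = mbps[i];
        if (mbps[i] > r.max) r.max = mbps[i];
        r.avg += mbps[i];
    }
    r.avg /= rounds;
    return r;
}
```

Under this definition, four threads taking 1, 1, 1, and 2 seconds have an average of 1.25 seconds and an imbalance of 60%.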


C vs. Java

We measure the bandwidth of SSD and HDD in C (mmap and pread) vs. Java (mmap and read). We use a machine with an Intel W-2295 3.00 GHz CPU (18 cores, 36 hyper-threads, 24 MB L3 cache) and 256 GB of DDR4 2933 MHz memory, running Debian 12 (Linux 6.1), and the following code:

https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/read_bandwidth.c
https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/ReadBandwidth.java : this is a Java-based evaluation of read bandwidth; the script (https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/java-read-bandwidth.sh) can be used to change the evaluation parameters.
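On the C side, the mmap method maps the file and lets page faults bring the data in from storage. The following single-threaded sketch assumes that touching one byte per page is enough to force each page to be read (a parallel version would split the page range across threads). `mmap_touch_pages` is a hypothetical name, and this is not the code in read_bandwidth.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Touches every page of a file through a read-only mapping.
   Returns the sum of the touched bytes (so the loads cannot be optimized
   away) or -1 on error; *size_out receives the file size. */
static long long mmap_touch_pages(const char *path, long long *size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return -1; }
    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;
    posix_madvise(p, len, POSIX_MADV_SEQUENTIAL); /* hint: read ahead */
    assert(sysconf(_SC_PAGESIZE) > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    long long sum = 0;
    for (size_t off = 0; off < len; off += page)
        sum += p[off]; /* first access to each page triggers a page fault */
    munmap(p, len);
    *size_out = (long long)len;
    return sum;
}
```

Timing this function around a cold cache gives an mmap bandwidth figure comparable in spirit to the pread measurement.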

The results are as follows.

Technical Posts

An (Incomplete) List of Publicly Available Graph Datasets/Generators, 21 June 2024
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read), 20 April 2024
SIMD Bit Twiddling Hacks, 25 November 2023
LaganLighter Source Code, 14 November 2022

Brian Dandurand is offered a Marie Curie Individual Fellowship

Posted on 17 February 2024 by Hans Vandierendonck

Congratulations to Brian Dandurand who has received notification that his Individual Fellowship proposal entitled “Scaling Parallelism and Convexity Hurdles in Bi-Level Machine Learning” has been proposed for funding.

More details will follow in due time.

ParaGrapher Integrated to LaganLighter

Posted on 16 February 2024 by Mohsen Koohi Esfahani

ParaGrapher source code has been integrated into LaganLighter, and access to different WebGraph formats is available in LaganLighter:

PARAGRAPHER_CSX_WG_400_AP
PARAGRAPHER_CSX_WG_404_AP
PARAGRAPHER_CSX_WG_800_AP

For further details, please refer to:
– LaganLighter source code repository: https://github.com/DIPSA-QUB/LaganLighter, particularly the graph.c file.
– ParaGrapher source code repository: https://github.com/DIPSA-QUB/ParaGrapher, particularly the src/webgraph.c and src/WG*.java files.

Read more about ParaGrapher and LaganLighter.

Related Posts

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version, 1 May 2024
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read), 20 April 2024
ParaGrapher Integrated to LaganLighter, 16 February 2024
ParaGrapher Source Code For WebGraph Types, 16 February 2024
On Designing Structure-Aware High-Performance Graph Algorithms (PhD Thesis), 8 December 2022
LaganLighter Source Code, 14 November 2022
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest – ICS’22, 28 June 2022
SAPCo Sort: Optimizing Degree-Ordering for Power-Law Graphs – ISPASS’22 (Poster), 23 May 2022
LOTUS: Locality Optimizing Triangle Counting – PPOPP’22, 5 April 2022
Locality Analysis of Graph Reordering Algorithms – IISWC’21, 8 November 2021
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs – IEEE CLUSTER’21, 9 September 2021
Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing – ICPP’21, 9 August 2021
How Do Graph Relabeling Algorithms Improve Memory Locality? ISPASS’21 (Poster), 28 March 2021

ParaGrapher Source Code For WebGraph Types

Posted on 16 February 2024 by Mohsen Koohi Esfahani

The ParaGrapher source code for accessing WebGraphs has been published. The supported graph types are:

PARAGRAPHER_CSX_WG_400_AP: graphs compressed in WebGraph format with 4-byte vertex IDs. Graphs in this category: LAW web graphs (https://law.di.unimi.it/datasets.php).
PARAGRAPHER_CSX_WG_404_AP: graphs compressed in WebGraph format with 4-byte vertex IDs and 4-byte integer edge weights. Graphs in this category: MS-BioGraphs (https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs/).
PARAGRAPHER_CSX_WG_800_AP: graphs compressed in Big WebGraph format with 8-byte vertex IDs. Graphs in this category: (i) WDC Hyper Link 2012 (https://webdatacommons.org/hyperlinkgraph/) and (ii) SWH graphs (https://docs.softwareheritage.org/devel/swh-dataset/graph/dataset.html).

ParaGrapher uses its asynchronous, parallel API to implement these graph types. The user implements a callback function that the API calls upon completing the read of a block of edges. ParaGrapher uses shared memory for the interaction between its C library and the Java library that deploys the WebGraph framework.
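The callback-driven design can be illustrated with a synchronous mock. The types and functions below (`edge_block_callback`, `deliver_in_blocks`, `count_edges`) are hypothetical and do not reproduce ParaGrapher's API; the real signatures are in the repository. In an asynchronous, parallel loader the callback may run concurrently on several threads, so shared state like the counters below would need atomics or per-thread accumulators.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback type, NOT ParaGrapher's real signature:
   receives the vertex range [start_vertex, end_vertex) of a CSX block. */
typedef void (*edge_block_callback)(void *user_data, uint64_t start_vertex,
                                    uint64_t end_vertex, const uint64_t *offsets,
                                    const uint32_t *edges);

/* Hypothetical stand-in for a loader: hands an in-memory CSX graph to the
   callback in blocks of vertices. A real library would decompress blocks and
   invoke the callback from worker threads as each block completes. */
static void deliver_in_blocks(uint64_t n_vertices, const uint64_t *offsets,
                              const uint32_t *edges, uint64_t vertices_per_block,
                              edge_block_callback cb, void *user_data) {
    assert(vertices_per_block > 0);
    for (uint64_t s = 0; s < n_vertices; s += vertices_per_block) {
        uint64_t e = s + vertices_per_block;
        if (e > n_vertices) e = n_vertices;
        cb(user_data, s, e, offsets, edges);
    }
}

/* Example user callback: counts blocks and edges and tracks the max degree. */
typedef struct { uint64_t edges_seen, blocks_seen, max_degree; } edge_stats;

static void count_edges(void *user_data, uint64_t start, uint64_t end,
                        const uint64_t *offsets, const uint32_t *edges) {
    edge_stats *st = user_data;
    (void)edges; /* neighbours are offsets[v] .. offsets[v + 1] - 1 in edges */
    st->blocks_seen++;
    for (uint64_t v = start; v < end; v++) {
        uint64_t deg = offsets[v + 1] - offsets[v];
        st->edges_seen += deg;
        if (deg > st->max_degree) st->max_degree = deg;
    }
}
```

With offsets {0, 2, 3, 3, 6} and blocks of three vertices, the callback runs twice and sees all six edges.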

For further details, please refer to the ParaGrapher source code repository: https://github.com/DIPSA-QUB/ParaGrapher, particularly the src/webgraph.c and src/WG*.java files.

On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets – BigData’23 (Short Paper)

Posted on 15 December 2023 by Mohsen Koohi Esfahani

2023 IEEE International Conference on Big Data (BigData’23)
December 15-18, 2023, Sorrento, Italy

DOI: 10.1109/BigData59044.2023.10386309
PDF (Authors Copy)

To ensure continued progress in High-Performance Graph Processing, we (i) investigate and optimize the process of generating large sequence similarity graphs as an HPC challenge and (ii) demonstrate this process by creating MS-BioGraphs, a new family of publicly available real-world edge-weighted graph datasets with up to 2.5 trillion edges, that is, 6.6 times larger than the largest graph published recently. The largest graph is created by matching (i.e., all-to-all similarity aligning) 1.7 billion protein sequences. The MS-BioGraphs family also includes seven subgraphs with different sizes and direction types.

We describe the two main challenges we faced in generating large graph datasets and our solutions, namely (i) optimizing data structures and algorithms for this multi-step process and (ii) the WebGraph parallel compression technique.

The datasets are available online at https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs.

BibTex

@INPROCEEDINGS{10.1109/BigData59044.2023.10386309,
   author = {Koohi Esfahani, Mohsen and Boldi, Paolo and Vandierendonck, Hans and Kilpatrick,  Peter and  Vigna, Sebastiano},  
  booktitle={2023 IEEE International Conference on Big Data (BigData'23)},  
  title={On Overcoming {HPC} Challenges of  Trillion-Scale Real-World Graph Datasets}, 
  year={2023},
  volume={},
  number={},
  pages={},
  location={Italia, Sorrento},
  publisher={IEEE Computer Society},
  doi={10.1109/BigData59044.2023.10386309}
}

MS-BioGraphs

Related Posts

MS-BioGraphs on IEEE DataPort, 17 April 2024
ParaGrapher Source Code For WebGraph Types, 16 February 2024
On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets – BigData’23 (Short Paper), 15 December 2023
Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs – IISWC’23 (Poster), 2 October 2023
MS-BioGraphs: Sequence Similarity Graph Datasets, 30 August 2023
MS-BioGraphs MS, 10 August 2023
MS-BioGraphs MSA500, 10 August 2023
MS-BioGraphs MS200, 10 August 2023
MS-BioGraphs MSA200, 10 August 2023
MS-BioGraphs MS50, 10 August 2023
MS-BioGraphs MSA50, 10 August 2023
MS-BioGraphs MSA10, 10 August 2023
MS-BioGraphs MS1, 10 August 2023
MS-BioGraphs Validation, 10 August 2023

Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs – IISWC’23 (Poster)

Posted on 2 October 2023 by Mohsen Koohi Esfahani

2023 IEEE International Symposium on Workload Characterization (IISWC’23)
October 1-3, 2023, Ghent, Belgium

DOI: 10.1109/IISWC59245.2023.00029
PDF Version

Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly-accessible, relevant, and realistic data sets.

In this paper, we announce the publication of MS-BioGraphs, a new family of publicly available real-world edge-weighted graph datasets with up to 2.5 trillion edges, that is, 6.6 times larger than the largest graph published recently.

We briefly review the two main challenges we faced in generating large graph datasets and our solutions, namely (i) optimizing data structures and algorithms for this multi-step process and (ii) the WebGraph parallel compression technique. We also study some characteristics of MS-BioGraphs.

The datasets are available on https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs .

Please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Sequence-Similarity-Graph-Datasets/ for a complete version of this paper.

Bibtex

@INPROCEEDINGS{10.1109/IISWC59245.2023.00029,
   author = {Koohi Esfahani, Mohsen and Boldi, Paolo and Vandierendonck, Hans and Kilpatrick,  Peter and  Vigna, Sebastiano},  
  booktitle={2023 IEEE International Symposium on Workload Characterization (IISWC'23)},  
  title={Dataset Announcement: {MS-BioGraphs}, Trillion-Scale Public Real-World Sequence Similarity Graphs}, 
  year={2023},
  volume={},
  number={},
  pages={},
  location={Belgium, Ghent},
  publisher={IEEE Computer Society},
  doi={10.1109/IISWC59245.2023.00029}
}

MS-BioGraphs: Sequence Similarity Graph Datasets

Posted on 30 August 2023 by Mohsen Koohi Esfahani

DOI: 10.48550/arXiv.2308.16744

PDF Version
arXiv Link

Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly-accessible, relevant, and realistic data sets.

To ensure the continuation of this progress, we (i) investigate and optimize the process of generating large sequence similarity graphs as an HPC challenge and (ii) demonstrate this process by creating MS-BioGraphs, a new family of publicly available real-world edge-weighted graph datasets with up to 2.5 trillion edges, that is, 6.6 times larger than the largest graph published recently. The largest graph is created by matching (i.e., all-to-all similarity aligning) 1.7 billion protein sequences. The MS-BioGraphs family also includes seven subgraphs with different sizes and direction types.

We describe the two main challenges we faced in generating large graph datasets and our solutions, namely (i) optimizing data structures and algorithms for this multi-step process and (ii) the WebGraph parallel compression technique. We present a comparative study of the structural characteristics of MS-BioGraphs.

The datasets are available online at https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs.

BibTex

@article{MS-BioGraphs-arxiv,
    title = {{MS-BioGraphs}: Sequence Similarity Graph Datasets},
    author = {Koohi Esfahani, Mohsen and Boldi, Paolo and Vandierendonck, Hans and Kilpatrick, Peter and Vigna, Sebastiano},
    year = 2023,
    journal = {CoRR},
    volume = {abs/2308.16744},
    doi = {10.48550/arXiv.2308.16744},
    url = {https://doi.org/10.48550/arXiv.2308.16744},
    archiveprefix = {arXiv},
    eprint = {2308.16744}
}

MS-BioGraphs MS

Posted on 10 August 2023 by Mohsen Koohi Esfahani

Name	MS-BioGraphs – MS
URL	https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MS
Download Link	https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files	https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validating and Sample Code	https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation	Vertices represent proteins and each edge represents the sequence similarity between its two endpoints
Edge Weighted	Yes
Directed	No
Number of Vertices	1,757,323,526
Number of Edges	2,488,069,027,875
Maximum Degree	814,957
Minimum Weight	98
Maximum Weight	634,925
Number of Zero-Degree Vertices	6,437,984
Average Degree	1,415.8
Size of The Largest WCC	2,486,890,448,664
Number of WCC	148,861,367
Creation Details	MS-BioGraphs: Sequence Similarity Graph Datasets
Format	WebGraph
License	CC BY-NC-SA
QUB IDF	2223-052
DOI	10.5281/zenodo.7820808
Citation	Mohsen Koohi Esfahani, Sebastiano Vigna, Paolo Boldi, Hans Vandierendonck, Peter Kilpatrick, March 13, 2024, "MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets", IEEE Dataport, doi: https://doi.org/10.21227/gmd9-1534.
Bibtex	@data{gmd9-1534-24, doi = {10.21227/gmd9-1534}, url = {https://doi.org/10.21227/gmd9-1534}, author = {Koohi Esfahani, Mohsen and Vigna, Sebastiano and Boldi, Paolo and Vandierendonck, Hans and Kilpatrick, Peter}, publisher = {IEEE Dataport}, title = {MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets}, year = {2024} }
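As a quick consistency check on the table above, the listed average degree equals the number of edges divided by the number of vertices:

$$
\bar{d} = \frac{|E|}{|V|} = \frac{2{,}488{,}069{,}027{,}875}{1{,}757{,}323{,}526} \approx 1{,}415.8
$$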

Files

Underlying Graph	The underlying graph in WebGraph format: MS-underlying.graph (7,342,853,446,646 bytes), MS-underlying.offsets (5,341,385,503 bytes), and MS-underlying.properties (1,560 bytes); total size: 7,348,194,833,709 bytes. These files are validated using the ‘Edge Blocks SHAs File’ described below.
Weights (Labels)	The weights of the graph in WebGraph format: MS-weights.labels (5,037,171,681,279 bytes), MS-weights.labeloffsets (5,070,752,590 bytes), and MS-weights.properties (183 bytes); total size: 5,042,242,434,052 bytes. These files are validated using the ‘Edge Blocks SHAs File’ described below.
Edge Blocks SHAs File (Text)	This file contains the shasums of edge blocks, where each block contains 64 million consecutive edges and has one shasum for its 64M endpoints and one for its 64M edge weights. The file is used to validate the underlying graph and the weights. For further explanation of the validation process, please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation. Name: MS_edges_shas.txt; Size: 4,449,360 bytes; SHASUM: 85d5b0896f8fa8a2b490ec6560937c45ced8b0d9
Offsets (Binary)	The offsets array of the CSX (Compressed Sparse Rows/Columns) graph in binary format and little-endian order. It consists of |V|+1 8-byte elements. The first and last values are 0 and |E|, respectively. This array helps convert the graph (or parts of it) from WebGraph format to binary format in one pass over the (related) edges. Name: MS_offsets.bin; Size: 14,058,588,216 bytes; SHASUM: 15c3defdbb92f7b1fe48a3fb20530d99fa30c616
WCC (Binary)	The Weakly-Connected Component (WCC) array in binary format and little-endian order. This array consists of |V| 4-byte elements. Vertices in the same component have the same value in the WCC array. Name: MS-wcc.bin; Size: 7,029,294,104 bytes; SHASUM: 30f12b738dde8f62aecb94239796b169512e6710
Names (tar.gz)	This compressed file contains 120 files in CSV format using ‘;’ as the separator. Each row has two columns: the vertex ID and the name of the sequence. Note: If the graph has an ‘N2O Reordering’ file, the n2o array should be used to convert a vertex ID to the old vertex ID, which identifies the name of the protein in the `names.tar.gz` file. Name: names.tar.gz; Size: 27,130,045,933 bytes; SHASUM: ba00b58bbb2795445554058a681b573c751ef315
OJSON	The characteristics of the graph and the shasums of the files. It is in open JSON format and needs a closing brace (}) to be appended before being passed to a JSON parser. Name: MS.ojson; Size: 700 bytes; SHASUM: e2eb3fcdd0c22838971ed2edea8e1ed081a77282
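The binary offsets and WCC files can be read with plain C. This sketch relies only on the layout described in the table (|V|+1 little-endian 8-byte offsets that start at 0 and end at |E|; |V| little-endian 4-byte component labels); the function names are hypothetical. It loads whole arrays into memory, which for MS means roughly 14 GB for the offsets and 7 GB for the labels. The sizes agree with the layout: (|V| + 1) × 8 = 1,757,323,527 × 8 = 14,058,588,216 bytes and |V| × 4 = 7,029,294,104 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Decode little-endian integers byte by byte, independent of host endianness. */
static uint64_t le64(const unsigned char *b) {
    uint64_t x = 0;
    for (int i = 7; i >= 0; i--) x = (x << 8) | b[i];
    return x;
}

static uint32_t le32(const unsigned char *b) {
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Opens path for binary reading and stores its size in bytes in *size. */
static FILE *open_sized(const char *path, long *size) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0 || (*size = ftell(f)) < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}

/* Loads |V|+1 offsets, checking they start at 0 and never decrease.
   On success sets *nv = |V|; the last element is |E|. */
static uint64_t *load_offsets(const char *path, uint64_t *nv) {
    long size;
    FILE *f = open_sized(path, &size);
    if (!f) return NULL;
    uint64_t n = (uint64_t)size / 8;
    uint64_t *off = (size % 8 == 0 && n >= 1) ? malloc(n * sizeof(uint64_t)) : NULL;
    unsigned char b[8];
    for (uint64_t i = 0; off && i < n; i++) {
        if (fread(b, 1, 8, f) != 8) { free(off); off = NULL; break; }
        off[i] = le64(b);
        if ((i == 0 && off[i] != 0) || (i > 0 && off[i] < off[i - 1])) { free(off); off = NULL; }
    }
    fclose(f);
    if (off) *nv = n - 1;
    return off;
}

/* Loads |V| component labels; the file size must be exactly 4 * nv bytes. */
static uint32_t *load_wcc(const char *path, uint64_t nv) {
    long size;
    FILE *f = open_sized(path, &size);
    if (!f) return NULL;
    uint32_t *wcc = ((uint64_t)size == nv * 4 && nv > 0) ? malloc(nv * sizeof(uint32_t)) : NULL;
    unsigned char b[4];
    for (uint64_t i = 0; wcc && i < nv; i++) {
        if (fread(b, 1, 4, f) != 4) { free(wcc); wcc = NULL; break; }
        wcc[i] = le32(b);
    }
    fclose(f);
    return wcc;
}

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Counts distinct component labels; note that it sorts labels in place. */
static uint64_t count_components(uint32_t *labels, uint64_t n) {
    if (n == 0) return 0;
    qsort(labels, n, sizeof *labels, cmp_u32);
    uint64_t count = 1;
    for (uint64_t i = 1; i < n; i++) count += labels[i] != labels[i - 1];
    return count;
}
```

The degree of vertex v is `off[v + 1] - off[v]`, and the result of `count_components` can be compared with the ‘Number of WCC’ row of the table above.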

Plots

For an explanation of the plots, please refer to the MS-BioGraphs paper.

Degree Distribution
Weight Distribution
Vertex-Relative Weight Distribution
Degree Decomposition
Cell-Binned Average Weight Degree Distribution
Weakly-Connected Components Size Distribution

DIPSA: Data-Intensive Parallel Systems and Algorithms

Tag Archives: high performance computing

QClique: Optimizing Performance and Accuracy in Maximum Weighted Clique – Euro-Par 2024

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version

An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read)

C vs. Java

Brian Dandurand is offered a Marie Curie Individual Fellowship

ParaGrapher Integrated to LaganLighter

ParaGrapher Source Code For WebGraph Types

On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets – BigData’23 (Short Paper)

Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs – IISWC’23 (Poster)

MS-BioGraphs: Sequence Similarity Graph Datasets

MS-BioGraphs MS

Files

Plots