QClique: Optimizing Performance and Accuracy in Maximum Weighted Clique – Euro-Par 2024

Posted on 15 May 2024 by Mohsen Koohi Esfahani

30th International European Conference on Parallel and Distributed Computing (Euro-Par 2024)

Abstract

The Maximum Weighted Clique (MWC) problem remains challenging due to its unfavourable time complexity. In this paper, we analyze the execution of exact search-based MWC algorithms and show that high-accuracy weighted cliques can be discovered in the early stages of the execution if the combinatorial space is searched systematically.

Based on this observation, we introduce QClique, an approximate MWC algorithm that processes the search space only as long as better cliques are expected. QClique uses a tunable parameter to trade off accuracy against execution time. It delivers a 4.7-82.3x speedup in comparison to previous state-of-the-art MWC algorithms while providing 91.4% accuracy, and it achieves a parallel speedup of up to 56x on 128 threads.

Additionally, QClique accelerates exact MWC computation when its result replaces the initial clique of an exact algorithm. For WLMC, an exact state-of-the-art MWC algorithm, this results in a 3.3x speedup on average.
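The abstract describes a trade-off between accuracy and execution time, and the benefit of giving an exact solver a better initial clique. To make both ideas concrete, the C sketch below is a generic branch-and-bound search for a maximum weighted clique on a small adjacency-matrix graph. It is not QClique or WLMC: the hypothetical `patience` parameter (stop after that many expansions without finding a better clique) stands in for QClique's tunable parameter, and `initial_weight` stands in for the initial clique handed to an exact algorithm. Vertex weights are assumed to be positive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_V 64

/* Search state for a small graph given as an adjacency matrix. */
struct mwc_search {
    bool (*adj)[MAX_V];
    const long *w;          /* positive vertex weights */
    long best;              /* weight of the heaviest clique found so far */
    long patience;          /* expansions allowed without improvement; < 0 = exact */
    long since_improvement; /* expansions since the last improvement */
    bool stopped;
};

/* Extends a clique of weight `cur` with vertices from `cand`; every vertex in
   `cand` is adjacent to all vertices of the current clique. */
static void expand(struct mwc_search *s, const size_t *cand, size_t n_cand, long cur)
{
    if (cur > s->best) {
        s->best = cur;
        s->since_improvement = 0;
    }

    long bound = cur;
    for (size_t i = 0; i < n_cand; i++)
        bound += s->w[cand[i]];
    if (bound <= s->best)
        return; /* prune: even adding every candidate cannot beat the best */

    for (size_t i = 0; i < n_cand; i++) {
        /* give up once `patience` expansions brought no better clique */
        if (s->patience >= 0 && s->since_improvement++ >= s->patience) {
            s->stopped = true;
            return;
        }
        size_t v = cand[i], next[MAX_V], n_next = 0;
        /* keep only later candidates adjacent to v, so each clique is visited once */
        for (size_t j = i + 1; j < n_cand; j++)
            if (s->adj[v][cand[j]])
                next[n_next++] = cand[j];
        expand(s, next, n_next, cur + s->w[v]);
        if (s->stopped)
            return;
    }
}

/* Returns the weight of the heaviest clique found. `initial_weight` is the
   weight of an already known clique and is used as the first pruning bound. */
long max_weighted_clique(size_t n, bool adj[][MAX_V], const long *w,
                         long initial_weight, long patience)
{
    assert(n <= MAX_V);
    struct mwc_search s = { adj, w, initial_weight, patience, 0, false };
    size_t cand[MAX_V];
    for (size_t v = 0; v < n; v++)
        cand[v] = v;
    expand(&s, cand, n, 0);
    return s.best;
}
```

With `patience < 0` the search is exact; a heavier `initial_weight` only prunes more of the search tree, which is the intuition behind seeding an exact solver with a good clique.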

Code

https://github.com/DIPSA-QUB/QClique

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version

Posted on 1 May 2024 by Mohsen Koohi Esfahani

PDF version
DOI: 10.48550/arXiv.2404.19735

Comprehensive evaluation is one of the foundations of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable when common input formats are supported across different frameworks. However, each framework creates its own format, which may not support reading large-scale real-world graph datasets. This creates a demand for high-performance graph-loading libraries that (i) accelerate the design of new graph algorithms, (ii) allow contributions to be evaluated on a wide range of graph algorithms, and (iii) facilitate easy and fast comparison across different graph frameworks.

To that end, we present ParaGrapher, a high-performance API and library for loading large-scale and compressed graphs. ParaGrapher supports different types of requests for accessing graphs in shared-memory, distributed-memory, and out-of-core graph processing. We explain the design of ParaGrapher and present a performance model of graph decompression, which is used to evaluate ParaGrapher on three storage types.

Our evaluation shows that by decompressing compressed graphs in WebGraph format, ParaGrapher delivers up to 3.2 times speedup in loading and up to 5.2 times speedup in end-to-end execution in comparison to the binary and textual formats.

ParaGrapher is available online on https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.

BibTex

@misc{paragrapher-arxiv,
  title = { Selective Parallel Loading of Large-Scale 
            Compressed Graphs with ParaGrapher}, 
  author = { {Mohsen} {Koohi Esfahani} and Marco D'Antonio and 
             Syed Ibtisam Tauhidi and Thai Son Mai and 
             Hans Vandierendonck},
  year = {2024},
  eprint = {2404.19735},
  archivePrefix = {arXiv},
  primaryClass = {cs.AR},
  doi = {10.48550/arXiv.2404.19735}
}

Related Posts & Source Code

ParaGrapher Web Page

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version (1 May 2024)
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read) (20 April 2024)
ParaGrapher Integrated to LaganLighter (16 February 2024)
ParaGrapher Source Code For WebGraph Types (16 February 2024)

An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read)

Posted on 20 April 2024 by Mohsen Koohi Esfahani

Short URL of this post: https://blogs.qub.ac.uk/DIPSA/HDD-vs-SSD-vs-LustreFS-2024

We evaluate the read bandwidth of three storage types:

HDD: A 6TB Hitachi HUS726060AL 7200RPM SATA v3.1
SSD: A 4TB Samsung MZQL23T8HCLS-00A07 PCIe4 NVMe v1.4
LustreFS: A parallel file system with a total capacity of 2PB and an SSD pool

and for three parallel read methods:

mmap: https://man7.org/linux/man-pages/man2/mmap.2.html
pread: https://man7.org/linux/man-pages/man2/pread.2.html
read: https://man7.org/linux/man-pages/man2/read.2.html

and for two block sizes:

4 KB blocks
4 MB blocks
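A minimal sketch of the pread method, assuming a POSIX system with pthreads: the file is split into one contiguous range per thread, and each thread reads its range with pread() requests of a fixed block size (such as 4 KB or 4 MB). This is an illustration, not the code of read_bandwidth.c; it omits mmap, read, O_DIRECT, and the load-imbalance measurement, and the names `pread_bandwidth` and `MAX_READERS` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define MAX_READERS 256

/* One reader thread: reads the byte range [begin, end) of the file. */
struct reader {
    int fd;
    off_t begin, end;
    size_t block_size;
    long long bytes_read; /* -1 on error */
};

static void *read_range(void *arg)
{
    struct reader *r = arg;
    char *buf = malloc(r->block_size);
    r->bytes_read = buf ? 0 : -1;
    for (off_t off = r->begin; buf && off < r->end; ) {
        size_t want = r->block_size;
        if ((off_t)want > r->end - off)
            want = (size_t)(r->end - off);
        ssize_t got = pread(r->fd, buf, want, off); /* one request per block */
        if (got <= 0) {
            r->bytes_read = -1;
            break;
        }
        off += got;
        r->bytes_read += got;
    }
    free(buf);
    return NULL;
}

/* Reads the whole file with `threads` threads, each covering one contiguous
   range with pread() calls of `block_size` bytes. Stores the bytes read in
   *total and returns MB/s (MB = 2^20 bytes here), or -1.0 on error. */
double pread_bandwidth(const char *path, int threads, size_t block_size, long long *total)
{
    assert(threads > 0 && threads <= MAX_READERS && block_size > 0);
    int fd = open(path, O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0) {
        if (fd >= 0)
            close(fd);
        return -1.0;
    }
    struct reader rs[MAX_READERS];
    pthread_t tids[MAX_READERS];
    bool started[MAX_READERS];
    off_t chunk = (st.st_size + threads - 1) / threads;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < threads; t++) {
        off_t b = t * chunk < st.st_size ? t * chunk : st.st_size;
        off_t e = b + chunk < st.st_size ? b + chunk : st.st_size;
        rs[t] = (struct reader){ fd, b, e, block_size, 0 };
        started[t] = pthread_create(&tids[t], NULL, read_range, &rs[t]) == 0;
        if (!started[t])
            read_range(&rs[t]); /* fall back to reading in the calling thread */
    }
    *total = 0;
    bool failed = false;
    for (int t = 0; t < threads; t++) {
        if (started[t])
            pthread_join(tids[t], NULL);
        failed |= rs[t].bytes_read < 0;
        *total += rs[t].bytes_read;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    double secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return failed ? -1.0 : (double)*total / (1024.0 * 1024.0) / secs;
}
```

For meaningful numbers, the OS cache has to be dropped between runs, as described below; otherwise the second run mostly measures memory bandwidth.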

The source code is available on ParaGrapher repository:

https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/read_bandwidth.c
https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/read_bandwidth_mam.c: this file is similar to the previous one, but it repeats each evaluation for a user-defined number of rounds and reports the minimum, average, and maximum values.

The OS cache of the storage contents was dropped after each evaluation (sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches').
The flushcache.c file (https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/flushcache.c) provides the same functionality for users without sudo access, although it usually takes longer to finish.
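For a single file, cached pages can also be released without sudo by advising the kernel with posix_fadvise(). This is a different technique from both drop_caches and flushcache.c: the advice is only a hint, pages that are mapped or in use elsewhere may stay cached, and its behaviour on a LustreFS client is not covered here. `evict_file_from_page_cache` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Asks the kernel to drop the page-cache pages of one file.
   Returns 0 on success and a nonzero value on failure. */
int evict_file_from_page_cache(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* DONTNEED may not discard dirty pages, so write them back first */
    fdatasync(fd);
    /* offset 0 with length 0 means "up to the end of the file" */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc;
}
```

Calling this on every input file before each round would give a per-file alternative to dropping the whole cache.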

For LustreFS, we repeated the read and pread evaluations with the O_DIRECT flag, as this flag prevents client-side caching.

For the HDD and SSD experiments, we used a machine with an Intel W-2295 3.00GHz CPU (18 cores, 36 hyper-threads, 24MB L3 cache) and 256 GB of DDR4 2933MHz memory, running Debian 12 with Linux 6.1. For LustreFS, we used a machine with 2TB of 3.2GHz DDR4 memory and 2 AMD 7702 CPUs (128 cores and 256 threads in total).

The results of the evaluation using read_bandwidth.c are in the following table. The values are bandwidth in MB/s. The one- or two-digit numbers with a white background next to each value are the percentage of load imbalance between parallel threads.


C vs. Java

We measure the bandwidth of SSD and HDD in C (mmap and pread) vs. Java (mmap and read). We use a machine with an Intel W-2295 3.00GHz CPU (18 cores, 36 hyper-threads, 24MB L3 cache) and 256 GB of DDR4 2933MHz memory, running Debian 12 with Linux 6.1, and the following programs:

https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/read_bandwidth.c
https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/ReadBandwidth.java: a Java-based evaluation of read bandwidth; the script (https://github.com/DIPSA-QUB/ParaGrapher/blob/main/test/java-read-bandwidth.sh) can be used to change the evaluation parameters.

The results are as follows.

Technical Posts

An (Incomplete) List of Publicly Available Graph Datasets/Generators (21 June 2024)
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read) (20 April 2024)
SIMD Bit Twiddling Hacks (25 November 2023)
LaganLighter Source Code (14 November 2022)


MS-BioGraphs on IEEE DataPort

Posted on 17 April 2024 by Mohsen Koohi Esfahani

MS-BioGraph sequence similarity graph datasets are now publicly available on IEEE DataPort: https://doi.org/10.21227/gmd9-1534.

To access the files, you need to register or log in to IEEE DataPort and then visit the MS-BioGraphs page. After saving the page as an HTML file (e.g., dp.html), you can download a dataset (MS1 in this example) using the following script:

dsname="MS1"
html_file="dp.html"

# extracting the dataset's download links from the saved page (&amp; is unescaped first)
urls=$(sed -e 's/&amp;/\&/g' "$html_file" | grep -Eo "(http|https)://[a-zA-Z0-9./?&=_%:-]*" | grep amazonaws | sort -u | grep -E "${dsname}[-_.]")

for u in $urls; do
    wget "$u" || break
done

# removing query strings from the names of the downloaded files
find . -maxdepth 1 -type f -name '*\?*' | while IFS= read -r f; do
    mv "$f" "${f%%\?*}"
done

# linking offsets.bin to be found by ParaGrapher
ln -s "${dsname}_offsets.bin" "${dsname}-underlying_offsets.bin"

Instead of wget, you may use axel -n 10 to download each file over multiple connections (here, 10); see https://manpages.ubuntu.com/manpages/noble/en/man1/axel.1.html.

MS-BioGraphs

Related Posts

MS-BioGraphs on IEEE DataPort (17 April 2024)
ParaGrapher Source Code For WebGraph Types (16 February 2024)
On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets – BigData’23 (Short Paper) (15 December 2023)
Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs – IISWC’23 (Poster) (2 October 2023)
MS-BioGraphs: Sequence Similarity Graph Datasets (30 August 2023)
MS-BioGraphs MS (10 August 2023)
MS-BioGraphs MSA500 (10 August 2023)
MS-BioGraphs MS200 (10 August 2023)
MS-BioGraphs MSA200 (10 August 2023)
MS-BioGraphs MS50 (10 August 2023)
MS-BioGraphs MSA50 (10 August 2023)
MS-BioGraphs MSA10 (10 August 2023)
MS-BioGraphs MS1 (10 August 2023)
MS-BioGraphs Validation (10 August 2023)

ParaGrapher Integrated to LaganLighter

Posted on 16 February 2024 by Mohsen Koohi Esfahani

The Poplar source code has been integrated into LaganLighter, and the following WebGraph formats are now accessible in LaganLighter:

PARAGRAPHER_CSX_WG_400_AP
PARAGRAPHER_CSX_WG_404_AP
PARAGRAPHER_CSX_WG_800_AP

For further details, please refer to
– LaganLighter source code repository: https://github.com/DIPSA-QUB/LaganLighter, particularly the graph.c file.
– ParaGrapher source code repository: https://github.com/DIPSA-QUB/ParaGrapher, particularly the src/webgraph.c and src/WG*.java files.

Read more about ParaGrapher and LaganLighter.

Related Posts

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version (1 May 2024)
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read) (20 April 2024)
ParaGrapher Integrated to LaganLighter (16 February 2024)
ParaGrapher Source Code For WebGraph Types (16 February 2024)
On Designing Structure-Aware High-Performance Graph Algorithms (PhD Thesis) (8 December 2022)
LaganLighter Source Code (14 November 2022)
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest – ICS’22 (28 June 2022)
SAPCo Sort: Optimizing Degree-Ordering for Power-Law Graphs – ISPASS’22 (Poster) (23 May 2022)
LOTUS: Locality Optimizing Triangle Counting – PPOPP’22 (5 April 2022)
Locality Analysis of Graph Reordering Algorithms – IISWC’21 (8 November 2021)
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs – IEEE CLUSTER’21 (9 September 2021)
Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing – ICPP’21 (9 August 2021)
How Do Graph Relabeling Algorithms Improve Memory Locality? ISPASS’21 (Poster) (28 March 2021)

ParaGrapher Source Code For WebGraph Types

Posted on 16 February 2024 by Mohsen Koohi Esfahani

The ParaGrapher source code for accessing WebGraphs has been published. The supported graph types are:

PARAGRAPHER_CSX_WG_400_AP: graphs compressed in WebGraph format with 4-byte vertex IDs. Graphs in this category: LAW web graphs (https://law.di.unimi.it/datasets.php).
PARAGRAPHER_CSX_WG_404_AP: graphs compressed in WebGraph format with 4-byte vertex IDs and 4-byte integer weights per edge. Graphs in this category: MS-BioGraphs (https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs/).
PARAGRAPHER_CSX_WG_800_AP: graphs compressed in Big WebGraph format with 8-byte vertex IDs. Graphs in this category: (i) WDC Hyper Link 2012 (https://webdatacommons.org/hyperlinkgraph/) and (ii) SWH graphs (https://docs.softwareheritage.org/devel/swh-dataset/graph/dataset.html).

ParaGrapher uses its asynchronous and parallel API to implement these graph types. The user implements a callback function that the API calls each time a block of edges has been read. Poplar uses shared memory for the interaction between its C library and the Java library that uses the WebGraph framework.
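The callback pattern described above can be sketched in C as follows. Every name here (`struct loader`, `loader_start`, `loader_wait`, `count_edges`) is hypothetical and is not ParaGrapher's API, which is defined in its repository; the sketch also keeps the graph in memory instead of decompressing WebGraph files through the shared-memory C/Java bridge. A background thread hands each block of consecutive vertices to a user callback, and the caller waits for completion.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* A block of consecutive vertices [first_vertex, last_vertex) and their edges. */
struct edge_block {
    uint64_t first_vertex, last_vertex;
    const uint64_t *offsets; /* CSR offsets (vertices + 1 entries) */
    const uint32_t *edges;   /* CSR neighbour IDs */
};

typedef void (*block_callback)(const struct edge_block *block, void *user_data);

/* Hypothetical loader: the graph is already in memory here, whereas a real
   loader would read and decompress each block before calling back. */
struct loader {
    uint64_t vertices, vertices_per_block;
    const uint64_t *offsets;
    const uint32_t *edges;
    block_callback callback;
    void *user_data;
    pthread_t thread;
};

static void *loader_main(void *arg)
{
    struct loader *l = arg;
    for (uint64_t v = 0; v < l->vertices; v += l->vertices_per_block) {
        uint64_t end = v + l->vertices_per_block;
        struct edge_block b = { v, end < l->vertices ? end : l->vertices,
                                l->offsets, l->edges };
        l->callback(&b, l->user_data); /* "this block of edges is ready" */
    }
    return NULL;
}

/* Starts delivering blocks in the background; returns 0 on success. */
int loader_start(struct loader *l)
{
    assert(l->vertices_per_block > 0 && l->callback != NULL);
    return pthread_create(&l->thread, NULL, loader_main, l);
}

/* Blocks the caller until every block has been passed to the callback. */
void loader_wait(struct loader *l)
{
    pthread_join(l->thread, NULL);
}

/* Example callback: counts delivered edges. Only the loader thread calls it
   here, so no synchronisation is needed; a parallel loader would need atomics. */
void count_edges(const struct edge_block *b, void *user_data)
{
    uint64_t *total = user_data;
    *total += b->offsets[b->last_vertex] - b->offsets[b->first_vertex];
}
```

Because the callback runs as soon as a block is available, the application can start processing early blocks while later ones are still being read.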

For further details, please refer to the Poplar source code repository: https://github.com/DIPSA-QUB/ParaGrapher, particularly the src/webgraph.c and src/WG*.java files.


MS-BioGraphs Validation

Posted on 10 August 2023 by Mohsen Koohi Esfahani

Repository

https://github.com/DIPSA-QUB/MS-BioGraphs-Validation

Explanation

We provide a shell script, validation.sh, and a Java program, EdgeBlockSHA.java, to verify the correctness of the graphs. Each graph has a .ojson file whose shasum is verified against the value retrieved from our server. Files such as offsets.bin, wcc.bin, n2o.bin, trans_offsets.bin, and edges_shas.txt have shasum records in the .ojson file, which are used to validate these files.

The graph is compressed in WebGraph format in the MS??-underlying.* and MS??-weights.* files. To validate the compressed graph, EdgeBlockSHA.java is used. It is a parallel Java program that uses the WebGraph library to traverse the graph and calculate the shasum of blocks of edges (endpoints and weights). The calculated results are then matched against the edges_shas.txt file of the graph.

It is also possible to validate particular blocks by matching the calculated shasum with the relevant row of the edges_shas.txt file. This file has the following format. Each block contains 64 million consecutive edges. The start of each block is identified by a vertex ID and an edge index. The endpoint_sha column is the shasum of the 64 million endpoints when stored as an array of 4-byte elements in binary format and little-endian order. Similarly, the weights_sha column is the shasum of the weights (labels). Weights are separated from endpoints because some applications do not need the weights, so it is not necessary to read and validate them.

64MB blk#;     vertex; edge index;                             endpoint_sha;                              weights_sha;
         0;          0;          0; 509784b158cb9404241afb21d0ceaf590b88d2f2; 57da4ad7bb89c5922e436b0535d791fa8f40dffd;
         1;    2315113;        705; fafc118563c1d7b5fbff64af56edd6a56524f479; 13b7a9ca60bfb0715d563218d0a1cd787b00a07c;
         2;    4521625;        597; 4ed65aa07c8062a151166ef2e9bdb93e41d19357; 8158276bec426ee46eca9912759eb9bd57fcc957;
         3;    6347361;        112; d02e8913c807c3f4ecde9c638e0ded5ab80ba819; 26bc3296de65cba6ac539cd96b79ae6f7a4d37be;
         4;    8447869;         15; 61513c84db40124496cdf769516118b63598914f; 781b9f4372ac614e94d097017c756d015234deb6;
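To check a single block by hand, one needs to know where the block starts and which bytes are hashed. The C sketch below assumes (i) a CSR offsets array with |V| + 1 entries, (ii) that "64 million" means 64 × 2^20 edges, as the "64MB" header suggests (the block size is a parameter either way), and (iii) that the edge index column is the position of the block's first edge within its vertex's neighbour list. It also serialises endpoints as 4-byte little-endian integers; writing these bytes to a file and running shasum (SHA-1 by default, matching the 40-hex-digit values above) should then give a value comparable with endpoint_sha. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Locates the first edge of block `block`: *vertex is the vertex owning that
   edge and *edge_index is the edge's position in the vertex's neighbour list.
   `offsets` has `vertices + 1` entries. */
void block_start(const uint64_t *offsets, uint64_t vertices, uint64_t block,
                 uint64_t block_edges, uint64_t *vertex, uint64_t *edge_index)
{
    uint64_t e = block * block_edges;
    assert(e < offsets[vertices]);
    /* binary search keeping offsets[lo] <= e < offsets[hi] */
    uint64_t lo = 0, hi = vertices;
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (offsets[mid] <= e)
            lo = mid;
        else
            hi = mid;
    }
    *vertex = lo; /* zero-degree vertices are skipped automatically */
    *edge_index = e - offsets[lo];
}

/* Serialises endpoints as 4-byte little-endian integers regardless of the
   host byte order; `out` must hold 4 * count bytes. */
void encode_le32(const uint32_t *values, size_t count, unsigned char *out)
{
    for (size_t i = 0; i < count; i++) {
        out[4 * i + 0] = (unsigned char)(values[i] & 0xff);
        out[4 * i + 1] = (unsigned char)((values[i] >> 8) & 0xff);
        out[4 * i + 2] = (unsigned char)((values[i] >> 16) & 0xff);
        out[4 * i + 3] = (unsigned char)((values[i] >> 24) & 0xff);
    }
}
```

Encoding byte by byte keeps the hashed bytes identical on little- and big-endian hosts.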

Requirements

JDK with version > 15
jq
wget

WebGraph Framework

Please visit https://webgraph.di.unimi.it .

ParaGrapher Graph Loading API and Library

The WebGraph formats can also be read using the ParaGrapher library: https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.

License

Licensed under the GNU v3 General Public License, as published by the Free Software Foundation. You must not use this Software except in compliance with the terms of the License. Unless required by applicable law or agreed upon in writing, this Software is distributed on an “as is” basis, without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose, either express or implied.


LaganLighter Source Code

Posted on 14 November 2022 by Mohsen Koohi Esfahani

Repository

https://github.com/DIPSA-QUB/LaganLighter

Algorithms in This Repo

SAPCo Sort: alg1_sapco_sort
Thrifty Label Propagation Connected Components: alg2_thrifty
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest: alg3_mastiff
iHTL: in-Hub Temporal Locality in SpMV (Sparse-Matrix Vector Multiplication) based Graph Processing: (to be added)
LOTUS: Locality Optimizing Triangle Counting: (to be added)

Cloning

git clone https://github.com/MohsenKoohi/LaganLighter.git --recursive

Graph Types

LaganLighter supports the following graph formats:

CSR/CSC graph in text format, for testing. This format has 4 lines: (i) number of vertices (|V|), (ii) number of edges (|E|), (iii) |V| space-separated numbers showing offsets of the vertices, and (iv) |E| space-separated numbers indicating edges.
CSR/CSC WebGraph format: supported by the Poplar Graph Loading Library (external git repository)
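The text format above can be read with a few fscanf() calls. This sketch assumes that the |V| offsets give the index of each vertex's first edge (the end of the last vertex's list is taken to be |E|) and that vertex IDs are 0-based; `struct text_csr` and `read_text_csr` are hypothetical names, not LaganLighter's reader.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/* CSR graph: the neighbours of v are edges[offsets[v]] ... edges[offsets[v + 1] - 1]. */
struct text_csr {
    uint64_t vertices, edges_count;
    uint64_t *offsets; /* vertices + 1 entries; the last one (= |E|) is added here */
    uint64_t *edges;
};

/* Parses the 4-line text format: |V|, |E|, |V| offsets, then |E| edge endpoints.
   Returns 0 on success and -1 on malformed input or allocation failure. */
int read_text_csr(FILE *f, struct text_csr *g)
{
    g->offsets = NULL;
    g->edges = NULL;
    if (fscanf(f, "%" SCNu64 " %" SCNu64, &g->vertices, &g->edges_count) != 2)
        return -1;
    g->offsets = malloc((g->vertices + 1) * sizeof *g->offsets);
    g->edges = malloc((g->edges_count + 1) * sizeof *g->edges);
    if (g->offsets == NULL || g->edges == NULL)
        goto fail;
    for (uint64_t v = 0; v < g->vertices; v++) {
        if (fscanf(f, "%" SCNu64, &g->offsets[v]) != 1 || g->offsets[v] > g->edges_count
            || (v > 0 && g->offsets[v] < g->offsets[v - 1]))
            goto fail; /* offsets must be non-decreasing and within |E| */
    }
    g->offsets[g->vertices] = g->edges_count;
    for (uint64_t e = 0; e < g->edges_count; e++) {
        if (fscanf(f, "%" SCNu64, &g->edges[e]) != 1 || g->edges[e] >= g->vertices)
            goto fail; /* every endpoint must be a valid vertex ID */
    }
    return 0;
fail:
    free(g->offsets);
    free(g->edges);
    g->offsets = NULL;
    g->edges = NULL;
    return -1;
}
```

For example, the input "3 / 4 / 0 1 3 / 1 2 0 2" (one item per line) describes vertex 0 with neighbour 1, vertex 1 with neighbours 2 and 0, and vertex 2 with neighbour 2.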

Measurements

In addition to execution time, we use the PAPI library to measure hardware counters such as L3 cache misses, hardware instructions, DTLB misses, and load and store memory instructions (see the papi_(init/start/reset/stop) and (print/reset)_hw_events functions defined in omp.c).

To measure load balance, we record the total time of executing a loop and the time each thread spends in that loop (mt and ttimes in the following sample code). From these values, the PTIP macro (defined in omp.c) calculates the average idle time as a percentage (an indicator of load imbalance) and prints it together with the total time (mt).

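mt = - get_nano_time();                // negated start time: adding the end time yields the duration
#pragma omp parallel
{
   unsigned tid = omp_get_thread_num();
   ttimes[tid] = - get_nano_time();    // this thread's start time (negated)

   #pragma omp for nowait
   for(unsigned int v = 0; v < g->vertices_count; v++)
   {
      // .....
   }
   ttimes[tid] += get_nano_time();     // time this thread spent in the loop
}
mt += get_nano_time();                 // total time of the parallel loop
PTIP("Step ... ");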
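The exact formula of PTIP is defined in omp.c and is not reproduced here. One plausible definition, assumed for this sketch, treats each of the T threads as idle for mt - ttimes[t] and expresses the average idle time as a percentage of mt: idle% = 100 × Σ_t (mt - ttimes[t]) / (T × mt). `average_idle_percent` is a hypothetical name.

```c
#include <assert.h>

/* Average idle time of `threads` threads as a percentage of the loop time `mt`
   (same time unit for all values); thread t is assumed idle for mt - ttimes[t]. */
double average_idle_percent(unsigned long mt, const unsigned long *ttimes,
                            unsigned int threads)
{
    assert(mt > 0 && threads > 0);
    double idle = 0.0;
    for (unsigned int t = 0; t < threads; t++)
        idle += (double)mt - (double)ttimes[t];
    return 100.0 * idle / ((double)threads * (double)mt);
}
```

Under this definition, two threads where one works for the whole loop and the other for half of it give 25% average idle time.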

As an example, the following execution of Thrifty shows that the “Zero Planting” step was performed in 8.98 milliseconds with an 8.22% load imbalance, while in the “Initial Push” step the processors were idle for 72.22% of the execution time, on average.

NUMA-Aware and Locality-Preserving Partitioning and Scheduling

To assign consecutive partitions (vertices and/or their edges) to each parallel processor, we initially divide the work into partitions and assign a number of consecutive partitions to each thread. Then, we specify the order of victim threads in the work-stealing process. During the initialization of the LaganLighter parallel processing environment (in the initialize_omp_par_env() function defined in omp.c), we create, for each thread, an ordered list of victim threads to steal from.

A thread first steals jobs (i.e., partitions) from the subsequent threads in the same NUMA node and then from the threads in the subsequent NUMA nodes. As an example, the following image shows the stealing order on a 24-core machine with 2 NUMA nodes: thread 1 steals from threads 2, 3, …, 11, and 0, running on the same NUMA socket, and then from threads 13, 14, …, 23, and 12, running on the next NUMA socket.
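The victim order in this example can be generated with modular arithmetic. This sketch assumes that threads are numbered contiguously per NUMA node (node k owns threads k × per_node to (k + 1) × per_node - 1) and that the subsequent nodes are visited in increasing order with wrap-around; it is not the code of initialize_omp_par_env(), and `stealing_order` is a hypothetical name.

```c
#include <assert.h>

/* Fills `victims` with the stealing order of thread `tid` on a machine with
   `nodes` NUMA nodes of `per_node` threads each; returns the number of victims
   (all other threads, i.e., nodes * per_node - 1). */
unsigned int stealing_order(unsigned int tid, unsigned int nodes, unsigned int per_node,
                            unsigned int *victims)
{
    assert(per_node > 0 && tid < nodes * per_node);
    unsigned int node = tid / per_node, local = tid % per_node, count = 0;
    for (unsigned int k = 0; k < nodes; k++) {
        unsigned int base = ((node + k) % nodes) * per_node;
        /* own node (k == 0): start after the thread itself and skip it;
           other nodes: start at the same local position and wrap around */
        for (unsigned int i = (k == 0 ? 1 : 0); i < per_node; i++)
            victims[count++] = base + (local + i) % per_node;
    }
    return count;
}
```

For thread 1 on 2 nodes of 12 threads, it yields 2, 3, …, 11, 0 and then 13, 14, …, 23, 12, matching the example above.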

We use the dynamic_partitioning_...() functions (in partitioning.c) to process partitions by threads in the specified order. A sample code is as follows:

struct dynamic_partitioning* dp = dynamic_partitioning_initialize(pe, partitions_count);

#pragma omp parallel
{
   unsigned int tid = omp_get_thread_num();
   unsigned int partition = -1U;   // -1U: no partition processed yet

   while(1)
   {
      // next partition: from this thread's own partitions first, then by stealing
      partition = dynamic_partitioning_get_next_partition(dp, tid, partition);
      if(partition == -1U)
         break;   // no partitions left

      for(unsigned int v = start_vertex[partition]; v < start_vertex[partition + 1]; v++)
      {
         // ....
      }
   }
}

dynamic_partitioning_reset(dp);
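A simplified, hypothetical partitioner of this kind can be built with C11 atomics: each thread starts with a range of consecutive partitions and claims them through an atomic counter, and once its range is exhausted it claims partitions from other threads' ranges. Unlike dynamic_partitioning_get_next_partition(), this sketch ignores the previously processed partition and visits victims simply as tid + 1, tid + 2, … (modulo the thread count) instead of the NUMA-aware order; all names are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Thread t initially owns the consecutive partitions [first[t], first[t + 1]). */
struct stealing_partitioner {
    unsigned int threads;
    unsigned int *first; /* threads + 1 entries */
    atomic_uint *next;   /* next unclaimed partition in each thread's range */
};

struct stealing_partitioner *partitioner_create(unsigned int threads, unsigned int partitions)
{
    assert(threads > 0);
    struct stealing_partitioner *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->threads = threads;
    p->first = malloc((threads + 1) * sizeof *p->first);
    p->next = malloc(threads * sizeof *p->next);
    if (p->first == NULL || p->next == NULL) {
        free(p->first);
        free((void *)p->next);
        free(p);
        return NULL;
    }
    for (unsigned int t = 0; t <= threads; t++)
        p->first[t] = (unsigned int)((unsigned long long)partitions * t / threads);
    for (unsigned int t = 0; t < threads; t++)
        atomic_init(&p->next[t], p->first[t]);
    return p;
}

/* Returns the next partition for thread `tid`, or -1U when no work is left:
   the thread drains its own range first, then steals from tid+1, tid+2, ... */
unsigned int partitioner_next(struct stealing_partitioner *p, unsigned int tid)
{
    for (unsigned int k = 0; k < p->threads; k++) {
        unsigned int owner = (tid + k) % p->threads;
        if (atomic_load(&p->next[owner]) >= p->first[owner + 1])
            continue; /* this range is already exhausted */
        unsigned int claimed = atomic_fetch_add(&p->next[owner], 1u);
        /* re-check: another thread may have claimed the last partition meanwhile */
        if (claimed < p->first[owner + 1])
            return claimed;
    }
    return -1U;
}

void partitioner_free(struct stealing_partitioner *p)
{
    free(p->first);
    free((void *)p->next);
    free(p);
}
```

Because each owner hands out its partitions in increasing order, a thread that is not stolen from processes one contiguous range, which preserves locality.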

Bugs & Support

As “we write bugs that in particular cases have been tested to work correctly”, we try to evaluate and validate the algorithms and their implementations. If you receive wrong results or you are suspicious about parts of the code, please contact us or submit an issue.

License


Graptor Sources Published

Posted on 30 October 2021 by Hans Vandierendonck

Finally got around to this: publishing the Graptor source code. With time passing, the code has changed quite a bit compared to that used in the paper: Graptor: efficient pull and push style vectorized graph processing. The evolution of the code has advantages: it’s faster. There are also disadvantages: not all versions and variations of the code that were experimented with can still be compiled.

The source code can be found here: https://github.com/hvdieren/graptor

There will likely be issues (errors, lack of documentation, …) as this is experimental research code. Drop me a line if you need a hand h {a dot} vandierendonck {an at} qub {another dot} ac {the last dot} uk .

GraphGrind Source Code

Posted on 12 August 2017 by Hans Vandierendonck

The following git repository contains the source code of GraphGrind:

https://github.com/DIPSA-QUB/GraphGrind

DIPSA: Data-Intensive Parallel Systems and Algorithms

Tag Archives: source-code


Copyright 2022-2023 The Queen’s University of Belfast, Northern Ireland, UK
