DIPSA at IPDPS’25

Posted on 23 January 2025 by Hans Vandierendonck

Two of our papers were accepted at IPDPS’25.

Brian will present his work on improving the scalability of parallel molecular dynamics simulation. He has developed a novel way to reduce the scalability bottleneck that exists in the communication between those processes computing short-range forces vs those computing long-range forces. His technique discards data dependences when long-range processes are “too slow” and uses interpolation of the (slowly-varying) long-range forces to progress the computation. Stay tuned for the camera-ready copy of the paper! This work was supported by the EPSRC New Horizons project ASCCED (EP/X01794X/1).

Hans will present a parallel algorithm for the maximum clique problem. The key ideas relate to reducing the amount of work where possible, which includes delaying or avoiding the construction of fast representations of neighbour lists, early-exiting set intersection operations and algorithmic choice between maximum clique search and the complementary minimum vertex cover problem.

Addtionally, Marco will attend IPDPS’25 by virtue of a travel grant from the TCHPC/TCPP HPC student cohort programme.

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version

Posted on 1 May 2024 by Mohsen Koohi Esfahani

PDF version
DOI: 10.48550/arXiv.2404.19735

Comprehensive evaluation is one of the basis of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable by supporting common input formats over different frameworks. However, each framework creates its specific format, which may not support reading large-scale real-world graph datasets. This shows a demand for high-performance libraries capable of loading graphs to (i) accelerate designing new graph algorithms, (ii) to evaluate the contributions on a wide range of graph algorithms, and (iii) to facilitate easy and fast comparison over different graph frameworks.

To that end, we present ParaGrapher, a high-performance API and library for loading large-scale and compressed graphs. ParaGrapher supports different types of requests for accessing graphs in shared- and distributed-memory and out-of-core graph processing. We explain the design of ParaGrapher and present a performance model of graph decompression, which is used for evaluation of ParaGrapher over three storage types.

Our evaluation shows that by decompressing compressed graphs in WebGraph format, ParaGrapher delivers up to 3.2 times speedup in loading and up to 5.2 times speedup in end-to-end execution in comparison to the binary and textual formats.

ParaGrapher is available online on https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.

BibTex

@misc{paragrapher-arxiv,
  title = { Selective Parallel Loading of Large-Scale 
            Compressed Graphs with ParaGrapher}, 
  author = { {Mohsen} {Koohi Esfahani} and Marco D'Antonio and 
             Syed Ibtisam Tauhidi and Thai Son Mai and 
             Hans Vandierendonck},
  year = {2024},
  eprint = {2404.19735},
  archivePrefix = {arXiv},
  primaryClass = {cs.AR},
  doi = {10.48550/arXiv.2404.19735}
}

Related Posts & Source Code

ParaGrapher Web Page

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version1 May 2024
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read)20 April 2024
ParaGrapher Integrated to LaganLighter16 February 2024
ParaGrapher Source Code For WebGraph Types16 February 2024

ParaGrapher Integrated to LaganLighter

Posted on 16 February 2024 by Mohsen Koohi Esfahani

Poplar source code has been integrated to LaganLighter and access to different WebGraph formats are available in LaganLighter:

PARAGRAPHER_CSX_WG_400_AP
PARAGRAPHER_CSX_WG_404_AP
PARAGRAPHER_CSX_WG_800_AP

For further details, please refer to
– LaganLighter source coder Repository: https://github.com/DIPSA-QUB/LaganLighter, particularly, the graph.c file.
– ParaGrapher source code repository: https://github.com/DIPSA-QUB/ParaGrapher particularly, the src/webgraph.c and src/WG*.java files.

Read more about ParaGrapher and LaganLighter.

Related Posts

On Optimizing Locality of Graph Transposition on Modern Architectures15 January 2025
Random Vertex Relabelling in LaganLighter21 August 2024
Minimum Spanning Forest of MS-BioGraphs9 August 2024
Topology-Based Thread Affinity Setting (Thread Pinning) in OpenMP3 August 2024
Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version1 May 2024
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read)20 April 2024
ParaGrapher Integrated to LaganLighter16 February 2024
ParaGrapher Source Code For WebGraph Types16 February 2024
On Designing Structure-Aware High-Performance Graph Algorithms (PhD Thesis)8 December 2022
LaganLighter Source Code14 November 2022
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest – ICS’2228 June 2022
SAPCo Sort: Optimizing Degree-Ordering for Power-Law Graphs – ISPASS’22 (Poster)23 May 2022
LOTUS: Locality Optimizing Triangle Counting – PPOPP’225 April 2022
Locality Analysis of Graph Reordering Algorithms – IISWC’218 November 2021
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs – IEEE CLUSTER’219 September 2021
Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing – ICPP’219 August 2021
How Do Graph Relabeling Algorithms Improve Memory Locality? ISPASS’21 (Poster)28 March 2021

ParaGrapher Source Code For WebGraph Types

Posted on 16 February 2024 by Mohsen Koohi Esfahani

ParaGrapher source code for accessing WebGraphs have been published. The supported graph types are:

PARAGRAPHER_CSX_WG_400_AP: graphs compressed in WebGraph format with 4 Bytes ID per vertex. Graphs in this category: LAW web graphs (https://law.di.unimi.it/datasets.php) .
PARAGRAPHER_CSX_WG_404_AP: graphs compressed in WebGraph format with 4 Bytes ID per vertex and 4 Bytes integer weights per edge. Graphs in this category: MS-BioGraphs (https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs/).
PARAGRAPHER_CSX_WG_800_AP: graphs compressed in Big WebGraph format with 8 Bytes ID per vertex. Graphs in this category: (i) WDC Hyper Link 2012 (https://webdatacommons.org/hyperlinkgraph/) and (ii) SWH graphs (https://docs.softwareheritage.org/devel/swh-dataset/graph/dataset.html)

ParaGrapher uses its asynchronous and parallel API to implement these graph types. The user needs to implement a callback function that is called by the API upon completion of reading a block of edges. Poplar uses a shared memory for interaction between its C library and the Java library that deploys the WebGraph framework.

For further details, please refer to Poplar source code repository: https://github.com/DIPSA-QUB/ParaGrapher, particularly, src/webgraph.c and src/WG*.java files.

ParaGrapher

Related Posts

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version1 May 2024
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read)20 April 2024
ParaGrapher Integrated to LaganLighter16 February 2024
ParaGrapher Source Code For WebGraph Types16 February 2024

On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets – BigData’23 (Short Paper)

Posted on 15 December 2023 by Mohsen Koohi Esfahani

2023 IEEE International Conference on Big Data (BigData’23)
December 15-18, 2023, Sorrento, Italia

DOI: 10.1109/BigData59044.2023.10386309
PDF (Authors Copy)

Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly-accessible, relevant, and realistic data sets.

To ensure continuation of this progress, we (i) investigate and optimize the process of generating large sequence similarity graphs as an HPC challenge and (ii) demonstrate this process in creating MS-BioGraphs, a new family of publicly available real-world edge-weighted graph datasets with up to 2.5 trillion edges, that is, 6.6 times greater than the largest graph published recently. The largest graph is created by matching (i.e., all-to-all similarity aligning) 1.7 billion protein sequences. The MS-BioGraphs family includes also seven subgraphs with different sizes and direction types.

We describe two main challenges we faced in generating large graph datasets and our solutions, that are, (i) optimizing data structures and algorithms for this multi-step process and (ii) WebGraph parallel compression technique.

The datasets are available online on https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs.

BibTex

@INPROCEEDINGS{10.1109/BigData59044.2023.10386309,
   author = {Koohi Esfahani, Mohsen and Boldi, Paolo and Vandierendonck, Hans and Kilpatrick,  Peter and  Vigna, Sebastiano},  
  booktitle={2023 IEEE International Conference on Big Data (BigData'23)},  
  title={On Overcoming {HPC} Challenges of  Trillion-Scale Real-World Graph Datasets}, 
  year={2023},
  volume={},
  number={},
  pages={},
  location={Italia, Sorrento},
  publisher={IEEE Computer Society},
  doi={10.1109/BigData59044.2023.10386309}
}

MS-BioGraphs

Related Posts

Minimum Spanning Forest of MS-BioGraphs9 August 2024
MS-BioGraphs on IEEE DataPort17 April 2024
ParaGrapher Source Code For WebGraph Types16 February 2024
On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets – BigData’23 (Short Paper)15 December 2023
Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs – IISWC’23 (Poster)2 October 2023
MS-BioGraphs: Sequence Similarity Graph Datasets30 August 2023
MS-BioGraphs MS10 August 2023
MS-BioGraphs MSA50010 August 2023
MS-BioGraphs MS20010 August 2023
MS-BioGraphs MSA20010 August 2023
MS-BioGraphs MS5010 August 2023
MS-BioGraphs MSA5010 August 2023
MS-BioGraphs MSA1010 August 2023
MS-BioGraphs MS110 August 2023
MS-BioGraphs Validation10 August 2023

On Designing Structure-Aware High-Performance Graph Algorithms (PhD Thesis)

Posted on 8 December 2022 by Mohsen Koohi Esfahani

Mohsen Koohi Esfahani
Supervisors: Hans Vandierendonck and Peter Kilpatrick

Thesis in PDF format
Thesis on QUB Pure Portal

Graph algorithms find several usages in industry, science, humanities, and technology. The fast-growing size of graph datasets in the context of the processing model of the current hardware has resulted in different bottlenecks such as memory locality, work-efficiency, and load-balance that degrade the performance. To tackle these limitations, high-performance computing considers different aspects of the execution in order to design optimized algorithms through efficient usage of hardware resources.

The main idea in this thesis is to analyze the structure of graphs to exploit special features that are key to introduce new graph algorithms with optimized performance.

First, we study the structure of real-world graph datasets with skewed degree distribution and the applicability of graph relabeling algorithms as the main restructuring tools to improve performance and memory locality. To that end, we introduce novel locality metrics including Cache Miss Rate Degree Distribution, Effective Cache Size, Push Locality and Pull Locality, and Degree Range Decomposition.

Based on this structural analysis, we introduce the Uniform Memory Demands strategy that (i) recognizes diverse memory demands and behaviours as a source of performance inefficiency, (ii) separates contrasting memory demands into groups with uniform behaviours across each group, and (iii) designs bespoke data structures and algorithms for each group in order to satisfy memory demands with the lowest overhead.

We apply the Uniform Memory Demands strategy to design three graph algorithms with optimized performance: (i) the SAPCo Sort algorithm as a parallel counting sort algorithm that is faster than comparison-based sorting algorithms in degree-ordering of power-law graphs, (ii) the iHTL algorithm that optimizes locality in Sparse Matrix-Vector (SpMV) Multiplication graph algorithms by extracting dense subgraphs containing incoming edges to in-hubs and processing them in the push direction, and (iii) the LOTUS algorithm that optimizes locality in Triangle Counting by separating different caching demands and deploying specific data structure and algorithm for each of them.

Bibtex

@phdthesis{ODSAGA-ethos.874822,
  title  = {On Designing Structure-Aware High-Performance Graph Algorithms},
  author = {Mohsen Koohi Esfahani},
  year   = 2022,
  url    = {https://blogs.qub.ac.uk/DIPSA/On-Designing-Structure-Aware-High-Performance-Graph-Algorithms-PhD-Thesis/},
  school = {Queen's University Belfast},
  EThOSID = {uk.bl.ethos.874822}
}

On Optimizing Locality of Graph Transposition on Modern Architectures15 January 2025
Random Vertex Relabelling in LaganLighter21 August 2024
Minimum Spanning Forest of MS-BioGraphs9 August 2024
Topology-Based Thread Affinity Setting (Thread Pinning) in OpenMP3 August 2024
ParaGrapher Integrated to LaganLighter16 February 2024
On Designing Structure-Aware High-Performance Graph Algorithms (PhD Thesis)8 December 2022
LaganLighter Source Code14 November 2022
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest – ICS’2228 June 2022
SAPCo Sort: Optimizing Degree-Ordering for Power-Law Graphs – ISPASS’22 (Poster)23 May 2022
LOTUS: Locality Optimizing Triangle Counting – PPOPP’225 April 2022
Locality Analysis of Graph Reordering Algorithms – IISWC’218 November 2021
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs – IEEE CLUSTER’219 September 2021
Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing – ICPP’219 August 2021
How Do Graph Relabeling Algorithms Improve Memory Locality? ISPASS’21 (Poster)28 March 2021

LaganLighter

LaganLighter Source Code

Posted on 14 November 2022 by Mohsen Koohi Esfahani

Repository

https://github.com/DIPSA-QUB/LaganLighter

Documentation

https://github.com/DIPS-QUB/LaganLighter/tree/main/docs

Algorithms in This Repo

SAPCo Sort: alg1_sapco_sort
Thrifty Label Propagation Connected Components: alg2_thrifty
MASTIFF: Structure-Aware Mimum Spanning Tree/Forest: alg3_mastiff
iHTL: in-Hub Temporal Locality in SpMV (Sparse-Matrix Vector Multiplication) based Graph Processing: (to be added)
LOTUS: Locality Optimizing Trinagle Counting: (to be added)

Cloning

git clone https://github.com/MohsenKoohi/LaganLighter.git --recursive

Graph Types

LaganLighter supports the following graph formats:

CSR/CSC graph in text format, for testing. This format has 4 lines: (i) number of vertices (|V|), (ii) number of edges (|E|), (iii) |V| space-separated numbers showing offsets of the vertices, and (iv) |E| space-separated numbers indicating edges.
CSR/CSC WebGraph format: supported by the Poplar Graph Loading Library
external git repository

Measurements

In addition to execution time, we use the PAPI library to measure hardware counters such as L3 cache misses, hardware instructions, DTLB misses, and load and store memory instructions. ( papi_(init/start/reset/stop) and (print/reset)_hw_events functions defined in omp.c).

To measure load balance, we measure the total time of executing a loop and the time each thread spends in this loop (mt and ttimes in the following sample code). Using these values, PTIP macro (defined in omp.c) calculates the percentage of average idle time (as an indicator of load imbalance) and prints it with the total time (mt).

mt = - get_nano_time()
#pragma omp parallel  
{
   unsigned tid = omp_get_thread_num();
   ttimes[tid] = - get_nano_time();
	
   #pragma omp for nowait
   for(unsigned int v = 0; v < g->vertices_count; v++)
   {
      // .....
   }
   ttimes[tid] += get_nano_time();
}
mt += get_nano_time();
PTIP("Step ... ");

As an example, the following execution of Thrifty, shows that the “Zero Planting” step has been performed in 8.98 milliseconds and with a 8.22% load imbalance, while processors have been idle for 72.22% of the execution time, on average, in the “Initial Push” step.

NUMA-Aware and Locality-Preserving Partitioning and Scheduling

In order to assign consecutive partitions (vertices and/or their edges) to each parallel processor, we initially divide partitions and assign a number of consecutive partitions to each thread. Then, we specify the order of victim threads in the work-stealing process. During the initialization of LaganLighter parallel processing environment (in initialize_omp_par_env() function defined in file omp.c), for each thread, we create a list of threads as consequent victims of stealing.

A thread, first, steals jobs (i.e., partitions) from consequent threads in the same NUMA node and then from the threads in consequent NUMA nodes. As an example, the following image shows the stealing order of a 24-core machine with 2 NUMA nodes. This shows that thread 1 steals from threads 2, 3, …,11, and ,0 running on the same NUMA socket and then from threads 13, 14, …, 23, and 12 running on the next NUMA socket.

We use dynamic_partitioning_...() functions (in file partitioning.c) to process partitions by threads in the specified order. A sample code is in the following:

struct dynamic_partitioning* dp = dynamic_partitioning_initialize(pe, partitions_count);

#pragma omp parallel  
{
   unsigned int tid = omp_get_thread_num();
   unsigned int partition = -1U;		

   while(1)
   {
      partition = dynamic_partitioning_get_next_partition(dp, tid, partition);
      if(partition == -1U)
	 break; 

      for(v = start_vertex[partition]; v < start_vertex[partition + 1]; v++)
      {
	// ....
       }
   }
}

dynamic_partitioning_reset(dp);

Bugs & Support

As “we write bugs that in particular cases have been tested to work correctly”, we try to evaluate and validate the algorithms and their implementations. If you receive wrong results or you are suspicious about parts of the code, please contact us or submit an issue.

License

Licensed under the GNU v3 General Public License, as published by the Free Software Foundation. You must not use this Software except in compliance with the terms of the License. Unless required by applicable law or agreed upon in writing, this Software is distributed on an “as is” basis, without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose, neither express nor implied. For details see terms of the License.

Copyright 2022 The Queen’s University of Belfast, Northern Ireland, UK

LaganLighter

On Optimizing Locality of Graph Transposition on Modern Architectures15 January 2025
Random Vertex Relabelling in LaganLighter21 August 2024
Minimum Spanning Forest of MS-BioGraphs9 August 2024
Topology-Based Thread Affinity Setting (Thread Pinning) in OpenMP3 August 2024
ParaGrapher Integrated to LaganLighter16 February 2024
On Designing Structure-Aware High-Performance Graph Algorithms (PhD Thesis)8 December 2022
LaganLighter Source Code14 November 2022
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest – ICS’2228 June 2022
SAPCo Sort: Optimizing Degree-Ordering for Power-Law Graphs – ISPASS’22 (Poster)23 May 2022
LOTUS: Locality Optimizing Triangle Counting – PPOPP’225 April 2022
Locality Analysis of Graph Reordering Algorithms – IISWC’218 November 2021
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs – IEEE CLUSTER’219 September 2021
Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing – ICPP’219 August 2021
How Do Graph Relabeling Algorithms Improve Memory Locality? ISPASS’21 (Poster)28 March 2021

Technical Posts

Random Vertex Relabelling in LaganLighter21 August 2024
Minimum Spanning Forest of MS-BioGraphs9 August 2024
Topology-Based Thread Affinity Setting (Thread Pinning) in OpenMP3 August 2024
An (Incomplete) List of Publicly Available Graph Datasets/Generators21 June 2024
An Evaluation of Bandwidth of Different Storage Types (HDD vs. SSD vs. LustreFS) for Different Block Sizes and Different Parallel Read Methods (mmap vs pread vs read)20 April 2024
SIMD Bit Twiddling Hacks25 November 2023
LaganLighter Source Code14 November 2022

Approximate Maximum Weighted Clique

Posted on 25 August 2022 by Qasim Abbas

This project aims to develop novel algorithms for the maximum weighted clique (MWC) problem, which appears in various data analysis pipelines in precision medicine. The MWC problem is NP-hard in nature, which makes it particularly challenging given the exponentially increasing amount of data it is applied to.

Although several attempts have been made to solve the maximum weighted clique problem in large graphs, there is still much opportunity for lowering the execution time necessary to find a satisfactory solution. In this project in particular we are investigating approximate algorithms for the MWC problem. We are working towards an algorithm that achieves a very high quality solution (i.e., finding a clique with weight very close to the MWC) in polynomial time.

IBM will provide industrially relevant context on knowledge extraction from graph-structured data. They have extensive experience in this area by building scalable software systems for the analysis of massive-scale graph data. They will moreover provide access to relevant datasets.

Project Members

Publications

QClique: Optimizing Performance and Accuracy in Maximum Weighted Clique, EURO-PAR 2024, link.

Funding

This PhD project is funded by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Actions.

CITI-GENS MSCA CO-FUND

MASTIFF: Structure-Aware Minimum Spanning Tree/Forest – ICS’22

Posted on 28 June 2022 by Mohsen Koohi Esfahani

36th ACM International Conference on Supercomputing 2022
June 27-30, 2022
Acceptance Rate: 25%

DOI: 10.1145/3524059.3532365
Authors’ Copy (PDF Format)

The Minimum Spanning Forest (MSF) problem finds usage in many different applications. While theoretical analysis shows that linear-time solutions exist, in practice, parallel MSF algorithms remain computationally demanding due to the continuously increasing size of data sets.

In this paper, we study the MSF algorithm from the perspective of graph structure and investigate the implications of the power-law degree distribution of real-world graphs
on this algorithm.

We introduce the MASTIFF algorithm as a structure-aware MSF algorithm that optimizes work efficiency by (1) dynamically tracking the largest forest component of each graph component and exempting them from processing, and (2) by avoiding topology-related operations such as relabeling and merging neighbour lists.

The evaluations on 2 different processor architectures with up to 128 cores and on graphs of up to 124 billion edges, shows that Mastiff is 3.4–5.9× faster than previous works.

Code Availability
The source-code of MASTIFF is available on LaganLighter Repository (alg3_mastiff.c and msf.c files). A sample execution of this source code for “Twitter-MPI” graph is shown in the following:

BibTex

@INPROCEEDINGS{10.1145/3524059.3532365,
author = {Koohi Esfahani, Mohsen and Kilpatrick, Peter and Vandierendonck, Hans},
title = {{MASTIFF}: Structure-Aware Minimum Spanning Tree/Forest},
year = {2022},
isbn = {},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3524059.3532365},
doi = {10.1145/3524059.3532365},
booktitle = {Proceedings of the 36th ACM International Conference on Supercomputing},
numpages = {13}
}

LaganLighter

Related Posts

On Optimizing Locality of Graph Transposition on Modern Architectures15 January 2025
Random Vertex Relabelling in LaganLighter21 August 2024
Minimum Spanning Forest of MS-BioGraphs9 August 2024
Topology-Based Thread Affinity Setting (Thread Pinning) in OpenMP3 August 2024
ParaGrapher Integrated to LaganLighter16 February 2024
On Designing Structure-Aware High-Performance Graph Algorithms (PhD Thesis)8 December 2022
LaganLighter Source Code14 November 2022
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest – ICS’2228 June 2022
SAPCo Sort: Optimizing Degree-Ordering for Power-Law Graphs – ISPASS’22 (Poster)23 May 2022
LOTUS: Locality Optimizing Triangle Counting – PPOPP’225 April 2022
Locality Analysis of Graph Reordering Algorithms – IISWC’218 November 2021
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs – IEEE CLUSTER’219 September 2021
Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing – ICPP’219 August 2021
How Do Graph Relabeling Algorithms Improve Memory Locality? ISPASS’21 (Poster)28 March 2021

Software-Defined Floating-Point Number Formats and Their Application to Graph Processing – ICS’22

Posted on 15 June 2022 by Hans Vandierendonck

DOI: 10.1145/3524059.3532360

This paper proposes software-defined floating-point number formats for graph processing workloads, which can improve performance in irregular workloads by reducing cache misses. Efficient arithmetic on software-defined number formats is challenging, even when based on conversion to wider, hardware-supported formats. We derive efficient conversion schemes that are tuned to the IA64 and AVX512 instruction sets.

We demonstrate that: (i) reduced-precision number formats can be applied to graph processing without loss of accuracy; (ii) conversion of floating-point values is possible
with minimal instructions; (iii) conversions are most efficient when utilizing vectorized instruction sets, specifically on IA64 processors.

Experiments on twelve real-world graph data sets demonstrate that our techniques result in speedups up to 89% for PageRank and Accelerated PageRank, and up to 35% for Single-Source Shortest Paths. The same techniques help to accelerate the integer-based maximal independent set problem by up to 262%.

DIPSA: Data-Intensive Parallel Systems and Algorithms

Tag Archives: algorithm design and engineering

DIPSA at IPDPS’25

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher – arXiv Version

ParaGrapher Integrated to LaganLighter

ParaGrapher Source Code For WebGraph Types

On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets – BigData’23 (Short Paper)

On Designing Structure-Aware High-Performance Graph Algorithms (PhD Thesis)

Related Posts

LaganLighter Source Code

Repository

Algorithms in This Repo

Cloning

Graph Types

Measurements

NUMA-Aware and Locality-Preserving Partitioning and Scheduling

Bugs & Support

License

Related Posts

Approximate Maximum Weighted Clique

Publications

Funding

MASTIFF: Structure-Aware Minimum Spanning Tree/Forest – ICS’22

Software-Defined Floating-Point Number Formats and Their Application to Graph Processing – ICS’22