Poster Presentation | Graphlet-based Filtering for High-Performance Subgraph Isomorphism Search

I presented a poster on “Graphlet-based Filtering for High-Performance Subgraph Isomorphism Search” at the 19th International Summer School on Advanced Computer Architecture and Compilation for High-performance Embedded Systems organised by HiPEAC at Fiuggi, Italy between July 9 – 15, 2023.


Summary

Introduction

Graphs are powerful tools for representing complex relationships in various domains such as social networks, bioinformatics, and computer networks. The Subgraph Isomorphism (SI) search problem is crucial in graph analytics, involving finding instances of a pattern graph within a larger data graph. This problem has applications in pattern discovery and graph database queries.

However, the SI problem is NP-complete, meaning that it becomes computationally intensive for larger graphs. To tackle this challenge, researchers have proposed heuristic algorithms that aim to speed up the SI search by pruning the search space through various techniques.

Three Stages of Heuristic Algorithms

These heuristic algorithms can be divided into three stages:

  1. Filtering: In this stage, a candidate set of data graph vertices is generated for each vertex of the pattern graph. The goal is to select a subset of data graph vertices that are potential matches for each pattern vertex. Two common filtering techniques are Label and Degree Filtering (LDF) and Neighbourhood Label Filtering (NLF). LDF ensures that only vertices with matching labels and sufficient degrees are considered, while NLF adds further constraints based on the labels of neighbours.
  2. Ordering: The vertices of the pattern graph are ordered for mapping with the candidate data vertices. The Highest Degree First (HDF) strategy is commonly used.
  3. Backtracking: A backtracking search is performed based on the ordered vertices to find subgraph isomorphisms.

Proposed Approach: Graphlet Filtering (GLF)

The proposed approach, Graphlet Filtering (GLF), introduces an additional stage to the heuristic algorithms. Graphlets are recurring small subgraphs, and orbits represent groups of vertices within these graphlets. By counting the occurrences of orbits in both pattern and data graphs, we propose that the count of an orbit in the pattern graph must be less than or equal to the count of the same orbit in the candidate data vertices’ graphs. This idea enhances the filtering stage by providing more stringent filtering criteria.

Preliminary Results

We conducted preliminary experiments on various datasets and found that GLF filtering provides additional filtering enhancements ranging from 4.2% to 15.38% depending on the dataset and pattern graph density. Across all datasets, the GLF technique improved filtering by 9.93% for sparse pattern graphs and 8.49% for dense pattern graphs.

Conclusion

The study suggests that incorporating graphlet-based filtering (GLF) into the existing heuristic algorithms for the Subgraph Isomorphism problem can lead to more effective filtering and pruning of the search space. We plan to explore the impact of GLF on the runtime of different heuristic algorithms. If successful, this technique could significantly reduce the execution time of algorithms used for tasks such as pattern discovery and graph database queries.


Download the poster and the abstract below.