Poster Presentation | Graphlet-based Filtering for High-Performance Subgraph Isomorphism Search

I presented a poster on “Graphlet-based Filtering for High-Performance Subgraph Isomorphism Search” at the 19th International Summer School on Advanced Computer Architecture and Compilation for High-performance Embedded Systems organised by HiPEAC at Fiuggi, Italy between July 9 – 15, 2023.


Summary

Introduction

Graphs are powerful tools for representing complex relationships in various domains such as social networks, bioinformatics, and computer networks. The Subgraph Isomorphism (SI) search problem is crucial in graph analytics, involving finding instances of a pattern graph within a larger data graph. This problem has applications in pattern discovery and graph database queries.

However, the SI problem is NP-complete, meaning that it becomes computationally intensive for larger graphs. To tackle this challenge, researchers have proposed heuristic algorithms that aim to speed up the SI search by pruning the search space through various techniques.

Three Stages of Heuristic Algorithms

These heuristic algorithms can be divided into three stages:

  1. Filtering: In this stage, a candidate set of data graph vertices is generated for each vertex of the pattern graph. The goal is to select a subset of data graph vertices that are potential matches for each pattern vertex. Two common filtering techniques are Label and Degree Filtering (LDF) and Neighbourhood Label Filtering (NLF). LDF ensures that only vertices with matching labels and sufficient degrees are considered, while NLF adds further constraints based on the labels of neighbours.
  2. Ordering: The vertices of the pattern graph are ordered for mapping with the candidate data vertices. The Highest Degree First (HDF) strategy is commonly used.
  3. Backtracking: A backtracking search is performed based on the ordered vertices to find subgraph isomorphisms.

Proposed Approach: Graphlet Filtering (GLF)

The proposed approach, Graphlet Filtering (GLF), introduces an additional stage to the heuristic algorithms. Graphlets are recurring small subgraphs, and orbits represent groups of vertices within these graphlets. By counting the occurrences of orbits in both pattern and data graphs, we propose that the count of an orbit in the pattern graph must be less than or equal to the count of the same orbit in the candidate data vertices’ graphs. This idea enhances the filtering stage by providing more stringent filtering criteria.

Preliminary Results

We conducted preliminary experiments on various datasets and found that GLF filtering provides additional filtering enhancements ranging from 4.2% to 15.38% depending on the dataset and pattern graph density. Across all datasets, the GLF technique improved filtering by 9.93% for sparse pattern graphs and 8.49% for dense pattern graphs.

Conclusion

The study suggests that incorporating graphlet-based filtering (GLF) into the existing heuristic algorithms for the Subgraph Isomorphism problem can lead to more effective filtering and pruning of the search space. We plan to explore the impact of GLF on the runtime of different heuristic algorithms. If successful, this technique could significantly reduce the execution time of algorithms used for tasks such as pattern discovery and graph database queries.


Download the poster and the abstract below.

Enhancing Biomedical Research and Road Planning with Machine Learning-Powered Heuristics

This submission won the first prize in the ‘Faculty of Engineering and Physical Sciences’ category at the ‘Research Culture Poster Competition’ organised by the Graduate School, Queen’s University Belfast in 2021.


Two seemingly unrelated fields – biomedical research and road planning – have found common ground through subgraph isomorphism search. This unlikely connection has given rise to a fascinating application: the use of machine learning-powered heuristics to optimize the search for specific patterns within complex data graphs.

Unveiling the Challenge

Imagine a biomedical researcher searching to identify the presence of a specific amino acid within a particular protein molecule. Or, picture a town planner identifying hazardous road interconnections within an extensive road network. Despite the apparent differences, both challenges share a fundamental characteristic – the data, in each case, can be represented as graphs in computer science and requires graph pattern finding.

When represented as graphs, these problems boil down to what’s known as the ‘subgraph isomorphism problem’. In essence, it’s the challenge of finding a small pattern graph within a much larger data graph. Unfortunately, this problem falls into the NP-complete category, indicating that as the pattern and data graphs grow, the search process can become a daunting task, taking weeks, months, or even years to complete.

The Quest for Efficient Solutions

Historically, researchers have grappled with this computational bottleneck, but the absence of a universal algorithm capable of tackling the subgraph isomorphism problem left them with limited options. However, they have proposed heuristics that try to reduce the runtime by pruning the search space.

Heuristics are shortcuts or rules of thumb that aim to find approximate solutions to complex problems. While they provide a way to navigate the subgraph isomorphism problem more efficiently, it’s essential to understand that there’s no one-size-fits-all solution. Different pattern graphs demand different heuristics to achieve optimal performance.

Bridging the Gap with Machine Learning

This brings us to our research question: Can we leverage machine learning to predict the best heuristic for a given pattern graph?

Machine learning techniques allows us to train models that can analyse the properties of a pattern graph and identify which heuristic is most likely to yield the best performance.

Understanding the Machine Learning Magic

The real wonder behind this innovation lies in understanding how machine learning models make these decisions. The details can be complex, but at a high level, they rely on a wealth of training data. This data includes a diverse range of pattern graphs and corresponding heuristic-performance data.

By using machine learning-powered heuristics to solve the subgraph isomorphism problem, we can accelerate the search process.

As we look to the future, it’s exciting to think about the further applications of machine learning and heuristics, as they continue to transform the way we approach complex problems and find solutions that were once thought to be out of reach.

High-Performance Distributed Graph Analytics | Sub-graph Isomorphism | Initial Review

Alpha draft of the PowerPoint presentation

In this post, I’ll be introducing the core aspects of my research, discussing the significance of subgraph isomorphism search in graphs, and outlining the objectives and milestones that lay ahead for me in my PhD journey.

As you may know, graphs are a fundamental structure in computer science, finding applications in diverse real-life scenarios like computer networks, social networks, bioinformatics, astrophysics, chemistry and road systems. Graph analytics, in turn, focuses on deciphering the complexities of these interconnected structures to extract valuable insights. Graph analytics is pivotal in addressing various challenges and opportunities that arise in graph datasets across industries. Applications of graph analytics include classification, clustering and pattern recognition. In my PhD, I will focus on one particular challenge, i.e., the pattern recognition problem, also called the subgraph isomorphism search problem.

This subgraph isomorphism search problem involves identifying instances of a smaller graph within a larger graph – finding a needle in a haystack. While this problem holds immense potential for diverse applications, its complexity arises out of it being NP-complete, making it incredibly challenging to find efficient (i.e., polynomial-time) solutions. To address the challenge, exact heuristic solutions and inexact approximate solutions have been proposed in the literature. In my research, I am focusing on exact heuristic solutions for the subgraph isomorphism problem.


There are three objectives in my PhD research.

Objective 1: Dynamic Heuristic Selection

One of my core objectives revolves around the dynamic selection of heuristics based on the characteristics of the pattern and data pair. I aim to develop an intelligent system that can determine the most suitable heuristic before the initiation of the search process itself. This adaptive approach could significantly enhance the efficiency of the search process and potentially outperform existing solutions.

Objective 2: Vector-Based Embedding

To enhance efficiency, I aim to propose a novel heuristic that relies on vector-based embeddings of graph nodes. I hypothesise that this departure from conventional scalar-based heuristics would enable the encapsulation of richer information (i.e., context) within the embeddings. By leveraging this additional context, I hope to enable accurate search space reduction (filtering) and early-stage matches (ordering), thereby reducing the need for backtracking and accelerating the matching process.

Objective 3: Parallelism and Optimization

My final objective focuses on the application of parallelism to both the previous objectives. While parallelism holds immense potential for speedup, it also introduces challenges such as load balancing and synchronization. My aim is to focus on shared memory parallelism while addressing concurrent potential pitfalls and, if necessary, exploring the realm of distributed memory systems.


The presentation outlines a comprehensive work plan for my research journey, marked by milestones and deadlines. My initial focus lies on completing experimentation and crafting a conference paper that showcases the outcomes of my pursuit of objectives.

My PhD project aims to enhance the efficiency and effectiveness of graph analytics, specifically in the context of subgraph isomorphism search. By adopting heuristics dynamically, embracing vector-based embeddings, and harnessing parallelism, I aim to contribute to the understanding of subgraph isomorphism search.