High-Performance Distributed Graph Analytics | Sub-graph Isomorphism | Initial Review

Alpha draft of the PowerPoint presentation

In this post, I’ll be introducing the core aspects of my research, discussing the significance of subgraph isomorphism search in graphs, and outlining the objectives and milestones that lay ahead for me in my PhD journey.

As you may know, graphs are a fundamental structure in computer science, finding applications in diverse real-life scenarios like computer networks, social networks, bioinformatics, astrophysics, chemistry and road systems. Graph analytics, in turn, focuses on deciphering the complexities of these interconnected structures to extract valuable insights. Graph analytics is pivotal in addressing various challenges and opportunities that arise in graph datasets across industries. Applications of graph analytics include classification, clustering and pattern recognition. In my PhD, I will focus on one particular challenge, i.e., the pattern recognition problem, also called the subgraph isomorphism search problem.

This subgraph isomorphism search problem involves identifying instances of a smaller graph within a larger graph – finding a needle in a haystack. While this problem holds immense potential for diverse applications, its complexity arises out of it being NP-complete, making it incredibly challenging to find efficient (i.e., polynomial-time) solutions. To address the challenge, exact heuristic solutions and inexact approximate solutions have been proposed in the literature. In my research, I am focusing on exact heuristic solutions for the subgraph isomorphism problem.

There are three objectives in my PhD research.

Objective 1: Dynamic Heuristic Selection

One of my core objectives revolves around the dynamic selection of heuristics based on the characteristics of the pattern and data pair. I aim to develop an intelligent system that can determine the most suitable heuristic before the initiation of the search process itself. This adaptive approach could significantly enhance the efficiency of the search process and potentially outperform existing solutions.

Objective 2: Vector-Based Embedding

To enhance efficiency, I aim to propose a novel heuristic that relies on vector-based embeddings of graph nodes. I hypothesise that this departure from conventional scalar-based heuristics would enable the encapsulation of richer information (i.e., context) within the embeddings. By leveraging this additional context, I hope to enable accurate search space reduction (filtering) and early-stage matches (ordering), thereby reducing the need for backtracking and accelerating the matching process.

Objective 3: Parallelism and Optimization

My final objective focuses on the application of parallelism to both the previous objectives. While parallelism holds immense potential for speedup, it also introduces challenges such as load balancing and synchronization. My aim is to focus on shared memory parallelism while addressing concurrent potential pitfalls and, if necessary, exploring the realm of distributed memory systems.

The presentation outlines a comprehensive work plan for my research journey, marked by milestones and deadlines. My initial focus lies on completing experimentation and crafting a conference paper that showcases the outcomes of my pursuit of objectives.

My PhD project aims to enhance the efficiency and effectiveness of graph analytics, specifically in the context of subgraph isomorphism search. By adopting heuristics dynamically, embracing vector-based embeddings, and harnessing parallelism, I aim to contribute to the understanding of subgraph isomorphism search.

Leave a Reply

Your email address will not be published.