Application-Aware Relaxed Synchronisation for Distributed Graph Processing

By Marco D’Antonio

Graph analytics are algorithms used to extract information from large bodies of data organised in a graph structure. But what is a graph?
Imagine Facebook: people create their own profiles and can then add other people as friends. These friendship relationships can be seen as links that connect people. The same principle is applicable to Instagram, Twitter (or X, as you prefer). On Twitter, we can not only represent the follower relationship as links between profiles but also retweets as links between posts. Other examples of relationships are hyperlinks between web pages, roads connecting intersections in the real world, and bonds connecting atoms at a molecular level.

All these examples share a common ground: they can be represented as entities linked by some kind of relationship, physical or otherwise. This kind of representation is a graph. Once a graph is built, we can apply algorithms to it to extract information. For instance, algorithms can be applied to detect the amount of influence a person has over the flow of information in a social network, to calculate the importance of a web page in a search engine depending on the number of links that point to it, or to compute the shortest route to get to a destination in a live map application. These algorithms are called graph analytics.
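To make this concrete, here is a minimal sketch (purely illustrative, not from the project itself) of how a tiny social network can be stored as a graph and queried with a very simple analytic: ranking profiles by follower count. The names and numbers are made up.

```python
# A tiny "social network" stored as an adjacency list:
# each profile maps to the list of profiles that follow it.
followers = {
    "alice": ["bob", "carol"],  # alice is followed by bob and carol
    "bob": ["carol"],           # bob is followed by carol
    "carol": [],                # carol has no followers
}

# A very simple graph analytic: rank profiles by number of followers.
ranking = sorted(followers, key=lambda p: len(followers[p]), reverse=True)
print(ranking)  # ['alice', 'bob', 'carol']
```

Real analytics such as PageRank or shortest paths are far more involved, but they operate on exactly this kind of structure: entities plus links.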

In the era of Big Data, companies gather large volumes of data and want to extract meaningful information from it. However, these datasets can contain trillions of relationships, which prevents traditional algorithms from providing timely results. For this reason, large-scale distributed systems or high-performance servers are used to speed up the process of answering graph queries. To provide scalable performance, however, better hardware is not enough: the algorithms themselves must be designed to scale and to use all available resources efficiently. Algorithms must therefore exploit the multiple processing elements available in modern CPUs and GPUs, or multiple servers connected by a network. This kind of computation is called parallel computation, and it becomes concurrent computation whenever shared resources are involved.

Think about two people ordering a bookcase alphabetically. They have a big heap of books to shelve, and together they will usually be faster than a single person doing the same amount of work. At the same time, there will be moments when both want to put a book in the same position; then they have to coordinate to preserve the correct order.

This coordination is also found in parallel algorithms, and it is the price to pay for parallelism. In fact, while some problems are easier to run on multiple processing elements, namely embarrassingly parallel problems, others require much more coordination. Parallel graph analytics fall into this last category.

Coordination, which is technically called synchronisation, does not generally help to get faster performance but is needed for algorithms to be correct. However, synchronisation can be applied at different levels, in a fine- or coarse-grained manner.

Imagine a road intersection: if the road is always busy, traffic lights will keep a large number of cars flowing through the intersection, making them an efficient method of coordination. However, if the intersection is on a quiet countryside road, traffic lights are inefficient; it is much better to let drivers handle the right of way themselves.

My project aims to move parallel graph analytics from coarse-grained synchronisation to more fine-grained synchronisation in order to improve performance.

This is not an easy task, because it increases the complexity of graph analytics implementations. Most implementations employ the Bulk Synchronous Parallel (BSP) model, in which the computation is organised in steps interleaved with synchronisation barriers.

Imagine that a group of friends is trying to solve a puzzle together. The puzzle works in stages and each friend is assigned a section of the puzzle. Friends cannot proceed to the next step until all others have finished their stage. Once they have all finished the stage, they share their information with each other, because it will be needed in the next stages. Then, the process repeats, and each friend starts working on the next stage.

This is how BSP works: each friend in the example is a processor working on a piece of a large computational task, the puzzle. They all work independently, but they must synchronise to share results and to ensure everything is correct.
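The stage-then-wait pattern of BSP can be sketched with a barrier. In this toy example (my own illustration, with made-up work), four workers each run three supersteps, doing a trivial local computation per step and then waiting at a barrier until everyone has finished that step.

```python
import threading

# Hypothetical BSP sketch: NUM_WORKERS processors, NUM_STEPS supersteps.
NUM_WORKERS = 4
NUM_STEPS = 3
barrier = threading.Barrier(NUM_WORKERS)
results = [[0] * NUM_STEPS for _ in range(NUM_WORKERS)]

def worker(rank):
    for step in range(NUM_STEPS):
        results[rank][step] = rank + step  # local computation for this stage
        barrier.wait()  # synchronisation: nobody starts the next stage early

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The barrier is the "everyone waits for everyone" rule from the puzzle analogy; in a real distributed system it would also be the point where processors exchange their partial results over the network.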

My current work focuses on developing a different way to coordinate the friends' efforts to solve the puzzle.

We can change the rules of the game a little. Now, once a friend finishes a stage, they can either help a friend struggling with their part or proceed to the next stage of the puzzle. Information is exchanged only when needed; however, this means that friends who have gone too far ahead might receive "old" information from friends who are still at an earlier stage.

In technical terms, this new coordination mechanism introduces two concepts: work stealing and speculative execution. Work stealing happens when a friend helps another friend, which can occur because one part of the puzzle is more complicated than the others. In terms of graphs, this usually means that one processor has more work than the others, often due to entities with a high number of links; for example, a celebrity in a social network has many more followers than an average person. Speculative execution happens when a friend proceeds to the next stage of the puzzle without waiting for the others. The risk is that the information from the other friends is needed but might not be the most up-to-date. The key insight is that even if a processor sometimes does work that does not contribute towards the solution, being busy is usually better than being idle.
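Work stealing itself can be sketched in a few lines. In this illustrative toy (my own simplification, with a single coarse lock and made-up task counts, not the project's actual implementation), worker 0 starts with far more tasks than the others, like a processor that owns a celebrity vertex; idle workers take tasks from whoever still has some.

```python
import threading
from collections import deque

NUM_WORKERS = 3
deques = [deque() for _ in range(NUM_WORKERS)]  # one task deque per worker
lock = threading.Lock()  # a single coarse lock keeps the sketch simple
done = [0] * NUM_WORKERS  # tasks completed by each worker

# Worker 0 is overloaded, like a processor owning a high-degree vertex.
for i in range(30):
    deques[0].append(i)
deques[1].append(100)
deques[2].append(200)

def worker(rank):
    while True:
        task = None
        with lock:
            if deques[rank]:
                task = deques[rank].popleft()  # take our own work first
            else:
                for other in range(NUM_WORKERS):  # otherwise, steal
                    if deques[other]:
                        task = deques[other].pop()  # steal from the far end
                        break
        if task is None:
            return  # no work left anywhere: stop
        done[rank] += 1  # "process" the stolen or local task

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(done))  # all 32 tasks are processed
```

Production work-stealing schedulers avoid the single lock by using lock-free deques, but the load-balancing idea is the same: nobody sits idle while a neighbour is overloaded.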

This proposed mechanism moves from the synchrony of BSP to asynchronous execution: each processor proceeds with its own computation and communicates with the others only when necessary. In my approach, I also use work stealing to share work, to help the so-called straggler processors, and to automatically balance the load between processors.
