Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs

Abstract—Various concurrent algorithms have been proposed in the literature in recent years that mostly focus on the disjoint-set approach to the Connected Components (CC) algorithm. However, these CC algorithms do not take the skewed structure of real-world graphs into account and, as a result, do not benefit from common features of graph datasets to accelerate processing. We investigate the implications of the skewed degree distribution of real-world graphs on their connectivity and use these features to introduce Thrifty Label Propagation, a structure-aware CC algorithm obtained by incorporating 4 fundamental optimization techniques in the Label Propagation CC algorithm. Our evaluation on 15 real-world graphs and 2 different processor architectures shows that Thrifty accelerates the flow of labels and processes only 1.4% of the edges of the graph. In this way, Thrifty is up to 16× faster than state-of-the-art CC algorithms such as Afforest, Jayanti-Tarjan, and Breadth-First Search CC. In particular, Thrifty delivers 1.5× to 19.9× speedup for graph datasets larger than one billion edges.

I. INTRODUCTION
Algorithms for finding connected components in a graph can be placed in one of three classes: 1) Flood Filling CC [14], [15], [16], [17] performs breadth-first search (BFS) or depth-first search (DFS) to identify all vertices that are reachable from a chosen starting point. A BFS/DFS search is required for each component. 2) Label Propagation CC (LP-CC) [18] iteratively updates the label of each vertex by calculating the minimum value among the labels of its neighbours. Label Propagation CC can be specified in terms of generalized Sparse Matrix-Vector (SpMV) multiplication. 3) Disjoint Set CC [19], [20], [21], [22] uses the disjoint set data structure to group connected vertices in the same set.
As sets are often represented as trees, these algorithms have also been called "tree-hooking" algorithms [23].
Some studies have identified that Disjoint Set CC is more efficient than Flood Filling and Label Propagation [22], [24]. This is especially true for graph datasets with high diameters [25], such as road networks. Moreover, in the case of graphs with skewed degree distribution, such as those found in social networks and web graphs, Disjoint Set CC minimizes the number of times each edge is processed. Jayanti and Tarjan process each edge just once [21] while Afforest processes each edge on average slightly more than once [22]. However, Disjoint Set CC is not scalable and has not been effective in distributed processing [26]. In contrast, the Label Propagation CC follows a SpMV model that has been successfully scaled to distributed systems [27], [28], [29].
In this paper, we present a new perspective on the CC algorithm by investigating the implications imposed by real-world graphs on the efficiency of the label propagation process of the LP-CC algorithm. Many real-world graphs derived from social networks, the internet, and the World Wide Web show a heavy-tailed, skewed degree distribution. In other words, a very small fraction of the vertices are connected to a disproportionately large fraction of edges. This particular relationship between vertices merits special attention.
This paper is structured as follows: Section II explains key background materials. Section III explains the main inefficiencies in the Label Propagation algorithm in processing real-world graphs with power-law degree distribution, and Section IV introduces four optimization techniques to solve these problems. Section V evaluates our new Thrifty algorithm and Section VI discusses further related work. Future work is discussed in Section VII.

II. BACKGROUND
A simple graph or undirected graph G = (V, E) has a set of vertices V and a set of edges E between these vertices. Edges are unordered pairs of elements of V. N_v is the set of neighbours of vertex v. We consider algorithms for static graphs, which are immutable during the evaluation of the algorithms.
We represent undirected graphs using a compressed sparse (rows or columns) representation [31]. This is a compact representation that is generally assumed in graph processing. A drawback of this representation is that each edge is represented twice: once pointing from a vertex to its neighbour, and once pointing back from the neighbour to the vertex. This representation simplifies information flow across edges in both directions. Afforest also assumes this representation in support of sampling edges incident to specific vertices [22]. Some algorithms, like the Jayanti and Tarjan algorithm, operate correctly on a coordinate representation, where each edge appears precisely once [21].
A frontier F is a data structure that represents a set of active vertices F.V and a set of active edges F.E induced by the vertex set. Frontiers may be implemented as worklists (listing specifically the active vertices in F.V), or as a bitmap or boolean array (storing a boolean value for each vertex v ∈ V that indicates whether v ∈ F.V). Graph processing systems dynamically switch between these representations depending on the density of the frontier, i.e., the number of vertices and edges it contains compared to the size of the graph [32].
The principle of LP is that each vertex is initially assigned a unique integer label. Each vertex subsequently compares its label to the labels of its neighbours and updates its label to be the smallest among them. This process is repeated for all vertices in the graph during one iteration of the algorithm. Subsequent iterations repeat this process until no further changes are made to the labels. The initially assigned labels can be chosen freely, as long as each vertex has a distinct label.
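This basic procedure can be sketched as a serial C routine over a CSR graph. This is our own minimal illustration, not the paper's code: the two-array form below snapshots the labels of the previous iteration, so a label travels exactly one hop per iteration, and the iteration count is bounded by the graph diameter plus one.

```c
#include <stdlib.h>

/* Serial two-array Label Propagation CC sketch (illustrative only).
 * The graph is in CSR form: neighbours[offsets[v]..offsets[v+1]) lists
 * the neighbours of v. On return, labels[v] is the smallest vertex ID
 * in v's component. Returns the number of iterations performed. */
int lp_cc(size_t n, const size_t *offsets, const size_t *neighbours,
          size_t *labels)
{
    size_t *old_lbls = malloc(n * sizeof *old_lbls);
    for (size_t v = 0; v < n; v++)
        labels[v] = v;                       /* unique initial labels */
    int iters = 0, changed = 1;
    while (changed) {
        changed = 0;
        iters++;
        for (size_t v = 0; v < n; v++)
            old_lbls[v] = labels[v];         /* snapshot previous iteration */
        for (size_t v = 0; v < n; v++) {
            size_t m = old_lbls[v];
            for (size_t e = offsets[v]; e < offsets[v + 1]; e++)
                if (old_lbls[neighbours[e]] < m)
                    m = old_lbls[neighbours[e]];   /* minimum over neighbours */
            if (m < labels[v]) {
                labels[v] = m;
                changed = 1;
            }
        }
    }
    free(old_lbls);
    return iters;
}
```

On a path graph 0-1-2-3, label 0 needs three iterations to reach vertex 3, plus one final iteration to detect convergence.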

A. Direction Optimizing Label Propagation
The direction optimizing graph traversal selects push or pull traversal based on the number of vertices and edges that should be processed [33], [34]. Direction Optimizing Label Propagation (DO-LP) has been implemented in different graph processing frameworks including [35], [25], [28], [36], [37], [38], [39]. We present here a version of the algorithm that is broadly considered the state-of-the-art Label Propagation algorithm (Algorithm 1).
DO-LP maintains two arrays to store the labels of vertices: old_lbls holds the labels derived during the previous iteration, while new_lbls holds the updated labels calculated in the current iteration. DO-LP uses two frontiers to manage active vertices in each iteration: new_fr collects the vertices activated during the current iteration (to be processed in the next iteration), while old_fr holds the vertices activated in the previous iteration; the two frontiers are swapped at the end of each iteration. The new and old labels of each vertex are initialized to the vertex ID (Lines 2-4), and then CC iterations start by identifying the traversal direction using the density of the frontier (Lines 6-7). Values like 1/15 and 1/18 [34], and 5% [35], [25] are often used as the density threshold.
In a sparse iteration, a push traversal is performed: for each vertex v in the frontier, all neighbours are checked. If the new label of a neighbour u is greater than the old label of v (Line 10), the neighbour's new label is updated and the neighbour is submitted to new_fr to be processed in the next iteration. The atomic_min() uses compare_and_swap() to perform an atomic write of old_lbls[v] to new_lbls[u] if old_lbls[v] is lower than new_lbls[u]. The atomic_min() returns the result of the comparison, which states whether new_lbls[u] has been modified by this function.
In a dense iteration, a pull traversal is executed (Lines 13-20). The new label of a vertex is identified as the minimum value among the old labels of the vertex and its neighbours (Lines 16-17). While the pull iteration calculates a new frontier new_fr, it does not consult whether a vertex's neighbours are present in old_fr. This is correct as all labels are valid values, and it improves performance by reducing memory accesses. An iteration is finished by updating the old labels of vertices to their new values (Lines 21-22). Iterations continue as long as the label of at least one vertex is modified (Line 24).
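The atomic_min() primitive used in push iterations can be sketched with C11 atomics as a compare-and-swap retry loop. This is our own sketch under the semantics described above; the paper's implementation may differ in detail.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Atomically lower *slot to val if val is smaller, retrying on CAS
 * failure. Returns true iff the stored label was modified, in which
 * case the owning vertex must be added to the new frontier. */
bool atomic_min(_Atomic size_t *slot, size_t val)
{
    size_t cur = atomic_load(slot);
    while (val < cur) {
        if (atomic_compare_exchange_weak(slot, &cur, val))
            return true;    /* we installed the smaller label */
        /* CAS failure reloads cur; the loop re-checks val < cur,
         * so a concurrently-installed smaller label ends the retry. */
    }
    return false;           /* slot already holds a label <= val */
}
```

The retry loop guarantees that concurrent writers can only make the stored label smaller, which is exactly the monotonicity LP relies on.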

III. DRAWBACKS OF LABEL PROPAGATION IN PROCESSING POWER-LAW GRAPHS
While DO-LP incorporates several important innovations in high-performance graph processing, a number of inefficiencies remain. These inefficiencies are particularly important for power-law graphs.

A. Repeated Wavefronts
Labels are propagated from one vertex to its neighbours. As a consequence, DO-LP propagates a label over one hop distance during one iteration, and two hops distance during two iterations. This causes a wavefront of updates that ripples through the graph and causes changes to the vertex labels. In this way, after each iteration of the DO-LP, a new wavefront is initiated and this new wavefront follows the previous wavefront at a distance.
The DO-LP initiates a new wavefront only at the end of an iteration, when updated labels are committed (Lines 21 and 22 in Algorithm 1). Moreover, wavefronts can propagate over at most one hop during any iteration. Figure 2 shows an example graph and its label propagation steps. Initially, all vertices receive their ID as their label. In the following iterations, vertices update their labels by comparing the labels of their neighbours. Figure 2 shows that DO-LP propagates a label only one hop per iteration. First, label 1 is propagated from vertex B to vertex C and then on to the main part of the graph. This is subsequently repeated by overwriting label 1 with 0. As such, DO-LP requires as many iterations as the diameter of the graph (4 iterations) to propagate the lowest label of the component to all vertices of the component. Thus, label propagation in DO-LP is a slow and repetitive process.

B. Preaching to the Converged
The DO-LP algorithm tries to minimize the amount of computation by tracking which vertices had their label changed. In this way, only changed labels are propagated. However, this still causes redundant work. Figure 3 shows the percentage of vertices that are active at the start of pull iterations, as well as the percentage of vertices that have converged to their final value. Converged vertices have reached their final values and do not need to be processed further. Figure 3 demonstrates that convergence is very slow in the first iterations as well as in the final iterations. In between, convergence occurs very quickly with 30-60% of the vertices converging during one iteration. These are the most effective iterations.
However, during these and the following iterations, there is redundant activity: the number of active vertices is high as well as the number of converged vertices. Hence, most of the active vertices will try to propagate their label to a vertex that has already converged. In other words, DO-LP performs excess work by processing all edges of the graph in pull iterations as it is not able to identify if vertices have converged.

C. Inefficient Initial Label Assignment
The propagation of labels is in part driven by the initial label assignment. For instance, in Figure 2, a small label is assigned to vertex A, which is on the fringe of the graph. The label is propagated to the core of the graph over several iterations. During the first iterations, however, other labels are also propagated between vertices like E, D, and C. When A's label reaches C, the traversal inside the core has to be repeated all over again.
However, when labels are assigned differently, LP is more efficient. Note the degree of freedom in choosing the initial label assignment: the only constraint is that all vertices initially have distinct labels. If vertex E is initially assigned the smallest label (Figure 4), then the label is first propagated in the core of the network, which then stabilises. The label is subsequently propagated out to vertex E, causing fewer label updates in total. This shows that the initial label assignment affects performance and that structure-oblivious initial label assignment prevents efficient propagation of the labels.

D. Eager Bootstrapping Label Propagation
DO-LP starts by propagating the label of each vertex to its neighbours, as indicated by initializing new_fr to contain all vertices. This is necessary initially, as we need to compare the label of a vertex to the labels of all its neighbours at least once.
However, doing this right at the start is inefficient as very few vertices converge to their final label in the very first iterations ( Figure 3).
The cause of this inefficiency is that most vertices have a large label. However, in skewed-degree graphs, most vertices also have neighbours with large labels. As such, in the first iterations of the DO-LP, there is little opportunity to reduce the magnitude of a label significantly. Even worse, most updates will be overridden by future wavefronts carrying smaller labels. As such, the initial pull iterations of the DO-LP are work-inefficient.

IV. THRIFTY LABEL PROPAGATION
We introduce 4 optimization techniques to address the inefficiencies of DO-LP described above. These are implemented in Algorithm 2.

A. Unified Labels Array
DO-LP suffers from slow label propagation, with a wavefront progressing at most one hop per iteration (Section III-A). We address this by employing a Unified Labels Array: a single array for labels, as opposed to separate arrays for the old and new labels. In this way, updated labels can be propagated within the same iteration as they are calculated, simply by reading the values from the same array (and memory locations) to which they were written.
Section V-C1 shows that the number of iterations is reduced by up to 89% and on average by 39%, as a result of accelerating label propagation by using one labels array.
By using one labels array, the synchronization of the label arrays in Lines 21-22 of Algorithm 1 is removed in Algorithm 2. This significantly reduces the execution time of sparse push iterations.
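A minimal serial sketch of the effect (our own code, not the paper's Algorithm 2): with a single labels[] array, a minimum written earlier in a sweep is visible later in the same sweep, so a label can travel several hops per iteration when the traversal order cooperates.

```c
#include <stddef.h>

/* In-place (unified array) variant of serial LP-CC over a CSR graph.
 * Because labels[] is both read and written in the same sweep, a small
 * label planted at a low vertex ID can flow down an entire path in one
 * iteration, instead of one hop per iteration. */
int lp_cc_unified(size_t n, const size_t *offsets, const size_t *neighbours,
                  size_t *labels)
{
    for (size_t v = 0; v < n; v++)
        labels[v] = v;
    int iters = 0, changed = 1;
    while (changed) {
        changed = 0;
        iters++;
        for (size_t v = 0; v < n; v++) {
            for (size_t e = offsets[v]; e < offsets[v + 1]; e++) {
                size_t u = neighbours[e];
                if (labels[u] < labels[v]) {  /* may read this sweep's writes */
                    labels[v] = labels[u];
                    changed = 1;
                }
            }
        }
    }
    return iters;
}
```

On the path graph 0-1-2-3, the two-array version needs 4 iterations, while this in-place version converges in 2 (one propagating sweep plus one confirming sweep), illustrating the reduction in iteration counts reported in Section V-C1.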

B. Zero Convergence
DO-LP is prone to processing many vertices that have already converged on their final label (Section III-B). As such, we desire to recognise when vertices have converged and, once they have, to skip processing them. But how can we know if a vertex has converged? To this end, we make two observations. Firstly, the LP algorithm performs the "minimum" arithmetic operation on the labels of each pair of neighbouring vertices. The arguments to the minimum operation are the labels, which are integers. It is important to note that the LP algorithm does not create new labels: it only copies labels from one vertex to another, such that larger labels are overwritten by smaller labels. As such, we know that the smallest label that any vertex can ever obtain is the smallest label in the initial assignment of labels. In our case, that is zero. Hence, we can safely conclude that any vertex holding a zero label has converged: it cannot be updated further.
Our second observation answers the question: Are there many vertices that will converge to the zero label? The answer is based on the high connectivity of vertices in real-world skewed-degree graphs. Table I shows the percentage of vertices of each dataset that are in the largest component. It shows that more than 94% of the vertices of power-law graphs are connected to each other. This corresponds to the notion of a giant component, which forms naturally in skewed-degree graphs [40]. Thus, more than 94% of the vertices can converge to the zero value, provided that the zero label is assigned to a vertex in the giant component. Moreover, assuming that initial labels are assigned uniformly at random, the zero label will be assigned to the giant component with a probability of more than 94%.
The Zero Convergence optimization is implemented by adding two branches in Lines 24 and 31 of Algorithm 2, which check whether the vertex has converged to zero. If so, the vertex needs no further processing. Moreover, the branch at Line 31 implies that processing of a vertex terminates immediately, as soon as its label becomes zero.
Section V-C2 shows that Zero Convergence tremendously reduces the total processed edges: on average, DO-LP processes each edge 7.7 times, while Thrifty processes only 1.4% of the edges.
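A serial sketch of a pull step with the Zero Convergence checks (our own code, operating on a single labels array as Thrifty does; cf. the two branches in Lines 24 and 31 of Algorithm 2):

```c
#include <stddef.h>

/* One pull sweep with Zero Convergence: a vertex holding label 0 is
 * skipped entirely, and the scan over a vertex's neighbours stops as
 * soon as a zero label is seen, since no smaller label exists.
 * Returns the number of vertices whose label changed. */
size_t pull_step_zero(size_t n, const size_t *offsets,
                      const size_t *neighbours, size_t *labels)
{
    size_t active = 0;
    for (size_t v = 0; v < n; v++) {
        if (labels[v] == 0)
            continue;                  /* already converged: skip vertex */
        size_t m = labels[v];
        for (size_t e = offsets[v]; e < offsets[v + 1]; e++) {
            size_t l = labels[neighbours[e]];
            if (l < m)
                m = l;
            if (m == 0)
                break;                 /* cannot improve further: stop edges */
        }
        if (m < labels[v]) {
            labels[v] = m;
            active++;
        }
    }
    return active;
}
```

Once the zero label dominates, almost every edge scan terminates at its first zero-labelled endpoint, which is the source of the edge-count reduction quoted above.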

C. Zero Planting
In Section III-C we explained that DO-LP assigns initial labels inefficiently which results in long propagation paths and repeated wavefronts. To solve this, we need to ensure that the smallest label is initially placed in the core of the graph, and not on the fringes. Considering also the Zero Convergence optimization, we should maximize the chance that the zero label is assigned to the giant component. This is captured in the Zero Planting technique.
We employ a simple heuristic to plant the zero label at the start of the algorithm, namely to plant it in the vertex with the highest degree. The rationale is two-fold: firstly, in a skewed-degree graph with a giant component, the highest-degree vertex is almost certainly a member of the giant component (if not, the component containing the highest-degree vertex cannot be giant). Secondly, the highest-degree vertex is likely a hub vertex, i.e., it has a high centrality within the graph. As such, it is only a few hops away from the other vertices in the same component.
The Zero Planting technique is implemented in Lines 3-9 of Algorithm 2. The label of vertex v is initialized to v + 1 (instead of v), and the zero label is reserved for the vertex with the maximum degree. In Lines 5-7, each parallel thread (with ID thread_id) finds its local maximum degree, and the vertex with the overall maximum degree (among the maximum degrees reported by the threads) receives the zero label in Line 9.
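A serial sketch of Zero Planting follows (the paper performs the max-degree scan in parallel with per-thread local maxima; the names here are ours). Degrees are read directly from the CSR offsets array.

```c
#include <stddef.h>

/* Zero Planting sketch: initialize labels to v + 1 so that label 0 is
 * free, then plant 0 at the highest-degree vertex (the presumed hub of
 * the giant component). Returns the chosen hub vertex. */
size_t plant_zero(size_t n, const size_t *offsets, size_t *labels)
{
    size_t hub = 0, hub_deg = 0;
    for (size_t v = 0; v < n; v++) {
        labels[v] = v + 1;                       /* shift: reserve label 0 */
        size_t deg = offsets[v + 1] - offsets[v];
        if (deg > hub_deg) {
            hub_deg = deg;
            hub = v;
        }
    }
    labels[hub] = 0;                             /* plant zero at the hub */
    return hub;
}
```

Because labels remain distinct after the shift, the correctness argument of Section IV-F applies unchanged.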
Section V-C3 shows that the Zero Planting technique provides a very fast convergence rate of 88% of the vertices after the first pull iteration as a result of removing or cutting short those iterations that are required to propagate the label zero to the hub vertices.

D. Initial Push
We observed that DO-LP starts off poorly in the first iterations as the vast majority of the labels that are propagated in these iterations will later be overwritten (see Section III-D). As such, it starts off too aggressively with pull iterations that propagate labels along all edges. However, identifying which labels are not worth propagating is non-trivial in DO-LP.
The Zero Planting optimization enables Thrifty to selectively propagate labels in the initial iterations. In Thrifty, the goal is to make the giant component converge to label zero. As such, we are initially only interested in propagating the zero label. Once the zero label has propagated to a sufficient number of highly connected vertices, it can propagate much more quickly through the giant component. At this stage a full-blown pull iteration (with zero convergence) becomes effective, and will also effect label propagation through the other components. Note that the other components are tiny and hardly contribute to execution time.
The Initial Push technique states that the best traversal in the first iteration is a push traversal of the zero label from the vertex with the maximum degree to its neighbours. This push traversal propagates the zero label as much as possible without imposing the cost of processing all edges. The initial push traversal is shown in Lines 11-12 of Algorithm 2.
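A serial sketch of this step (our own code; the real implementation pushes concurrently with atomic_min, but since zero is the global minimum label, a plain store suffices in a single-threaded illustration):

```c
#include <stddef.h>

/* Initial Push sketch: one push traversal from the hub vertex assigns
 * label 0 to all of the hub's neighbours. Zero is the smallest possible
 * label, so an unconditional store is always a valid "min" update.
 * Returns the number of labels pushed. */
size_t initial_push(size_t hub, const size_t *offsets,
                    const size_t *neighbours, size_t *labels)
{
    size_t pushed = 0;
    for (size_t e = offsets[hub]; e < offsets[hub + 1]; e++) {
        labels[neighbours[e]] = 0;
        pushed++;
    }
    return pushed;
}
```

After this single push, a high-degree hub has seeded the zero label across many highly connected vertices, so the first pull iteration (with the Zero Convergence check) immediately finds zeros on most scanned edges.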
Thrifty performs only one initial push iteration. This is optimal due to the typical structure of graphs with skewed degree distribution, where many high-degree vertices are connected to other high-degree vertices and have many common neighbours. The first push iteration thus propagates the zero label to a good number of high-degree vertices; a second push iteration would traverse many high-degree vertices with many common neighbours, replicating much of the work of propagating the zero label. At that point a pull traversal is more efficient [33], especially with the Zero Convergence check.
Section V-C4 shows that the Initial Push technique accelerates the execution time of the first iteration by 5.3×.

E. Thrifty Implementation and Data Structures
In this section we present further details of the efficient implementation of Thrifty.
In Line 16 of Algorithm 2 we use a new threshold to select push or pull traversal. Because the convergence optimizations significantly reduce the execution time of pull traversals, we identified that a threshold of 1% works best for switching between push and pull traversals. We evaluate the effect of the threshold in Section V-E.
To accelerate pull iterations, we do not collect a detailed frontier listing all active vertices. At the end of most pull iterations, it suffices to know whether the frontier is dense or sparse. As such, we count active vertices but do not record which vertices are active. In the final pull iteration, prior to switching to sparse iterations, a detailed frontier is necessary. When Thrifty decides to switch to push traversal, it performs a Pull-Frontier iteration, which is a pull iteration that also identifies which vertices are active.
Push iterations are sparse and, as such, execute quickly. Some web graphs like UK-Union and WebBase-2001 require more than 70 push iterations, so it is necessary to optimize the push iterations. To this end, data structures must be selected carefully [41]. We assign a local worklist to each thread to collect its active vertices. We also use a byte array shared between threads that indicates whether a vertex has previously been added to the local worklist of any thread. The byte array is written and read by all threads; the local worklists are written only by their owning threads but are read by all threads. We do not use atomic instructions to access the shared byte array. If one vertex is added to two local worklists due to a race condition, that vertex may be processed twice in the next iteration; this does not affect the correctness of the algorithm. Each thread starts by processing its local worklist and afterwards steals vertices from the worklists of other threads.
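The test-then-set pattern on the shared byte array can be sketched as follows (names are ours). In the multi-threaded setting both the read and the write are plain, non-atomic accesses; the possible duplicate enqueue is a benign race, as argued above.

```c
#include <stddef.h>

/* Per-thread worklist of active vertex IDs. Only the owning thread
 * appends to it, so count needs no synchronization. */
typedef struct {
    size_t *items;
    size_t count;
} worklist_t;

/* Enqueue v unless the shared queued[] byte says some thread already
 * did. Without atomics, two threads may both observe queued[v] == 0
 * and both enqueue v; the duplicate only causes one redundant
 * re-processing in the next iteration and never breaks correctness. */
void maybe_enqueue(worklist_t *wl, unsigned char *queued, size_t v)
{
    if (!queued[v]) {                 /* plain read: may be stale */
        queued[v] = 1;                /* plain write: benign race */
        wl->items[wl->count++] = v;   /* single-writer local worklist */
    }
}
```

Trading a rare duplicate for atomic-free accesses keeps the hot path of sparse push iterations cheap, which matters when a graph needs tens of such iterations.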

F. Correctness of The Thrifty Algorithm
The Thrifty algorithm uses four optimizations; in this section we show that these optimizations do not compromise the correctness of the algorithm.
The Unified Labels Array technique uses a single array for storing labels. It affects Lines 11 and 16 of Algorithm 1, where new_lbls is read instead of old_lbls. Assume vertex v reads old_lbls for all of its neighbours except neighbour n. Reading new_lbls[n] could affect the result only if v cannot find any label smaller than new_lbls[n] in the current iteration; in that case, v would read new_lbls[n] in the next iteration of DO-LP anyway. Hence, reading new_lbls[n] instead of old_lbls[n] in the current iteration does not affect the correctness of the algorithm.
The Zero Convergence technique inserts comparisons to zero into the DO-LP to stop processing edges when the zero label is reached. As zero is the minimum value among all labels, no changes can be applied to a label that has reached zero. This shows that the Zero Convergence technique only stops work that cannot change the label of a vertex; in other words, it does not affect the correctness of the algorithm.
The correctness of DO-LP is independent of the initial label assignment as long as vertices receive unique initial labels. Therefore the Zero Planting technique that plants the zero label in the vertex with maximum degree does not change the correctness of the algorithm.
The Initial Push technique can be considered as the application of a different schedule, i.e., the zero label is propagated over one hop before considering other updates. The correctness follows from the same argument as Unified Label Arrays.

V. EVALUATION
A. Machines and Datasets
We use the two machines listed in Table III for evaluation. The machines run CentOS 7. We use an optimized implementation of CC in the C language that builds on the pthread, libnuma, and papi [42] libraries. We use the interleaved NUMA memory policy and apply work-stealing [43] for parallel processing of graph partitions created by vertex and edge partitioning [44], [25].
We create 32 * #threads edge-balanced partitions, and partitions [32t, 32(t + 1)) are initially assigned to thread t. A thread processes its own partitions and then steals partitions from threads on the same NUMA node, and finally from threads on other NUMA nodes. In order to preserve locality in processing consecutive partitions and increase reuse of cache contents, a thread processes its own partitions in ascending order and steals partitions from other threads in descending order.
For comparison to other CC algorithms, we use BFS-CC implemented in GraphGrind [30], [25] (commit 5099761), and the Shiloach-Vishkin (SV) and Afforest implementations in GAP [56] (commit 6ac1afd). Table IV compares Thrifty to the prior state-of-the-art CC algorithms. For road networks (GBRd and USRd), which do not follow a power-law degree distribution, SV, JT, and Afforest are faster than Thrifty. For graphs larger than LiveJournal, Thrifty has the best results on both architectures: on the SkyLake machine, Thrifty provides up to 3.9× and on average 1.6× speedup over Afforest, and 8.4× to 54.6× speedup compared to the SV, JT, and BFS-CC algorithms. On the EPYC machine, Thrifty provides 1.5× speedup over Afforest and 7.3× to 65.3× over the SV, JT, and BFS-CC algorithms.

B. Comparison to Prior State of the Art
The importance of Thrifty is not limited to having faster execution time than the Disjoint Set algorithms. Disjoint Set algorithms like SV, Afforest, and JT are concurrent algorithms that do not scale to distributed memory systems. One attempt at distributed disjoint sets notes a lack of scalability and a net performance loss compared to sequential algorithms [26]. In contrast, the SpMV model of the Label Propagation algorithm allows successful scaling in distributed systems [28], [29]. A second limitation of the Disjoint Set algorithms originates from the fact that concurrent algorithms are very specific solutions to a problem and require great precision in their design and implementation [57]. Concurrent algorithms have limited potential to generalize to other problems. In contrast, the Thrifty algorithm, which follows an SpMV model, is more generic and conceptually simple. Numerous frameworks have been defined that present a reusable interface and hide numerous performance optimizations behind that interface, out of sight of the user [27], [28], [29], [37], [58], [39].

C. Has Thrifty Reached Its Goals?
In this section we consider more details of the execution of Thrifty to identify if it has reached the goals we explained in Section IV. In the experiments of this section when we refer to iterations of Thrifty, we count the Initial Push as an iteration.
1) Faster Label Propagation and Reducing Number of Iterations: To facilitate faster propagation of the labels, we suggested using the Unified Labels Array technique and Table V shows that Thrifty reduces the total iterations by 39%, on average. For WebBase-2001, Thrifty reduces total iterations by 89%.
2) Work Reduction: Figure 5 compares the speedup provided by Thrifty in comparison to DO-LP. It also shows the percentage of the edges of each graph that are processed by the Thrifty and DO-LP algorithms. It shows that Thrifty reduces the total traversed edges by at least 97%. In fact, Thrifty processes at most 4.4% of the edges of the graph, which shows that Zero Convergence can significantly reduce the total work. Figure 6 also compares Thrifty to DO-LP with respect to the reduction in (1) last-level cache misses, (2) memory accesses (load and store memory instructions), (3) branch mispredictions, and (4) hardware instructions. It shows that Thrifty cuts at least 80% of the redundant work done by DO-LP.

E. Effect of The Threshold
To explain the effect of the threshold for selecting push and pull traversals (Section IV-E), we consider the execution time of the first iterations of the Friendster dataset on the EPYC machine. Using 1% as the threshold, iteration 4 is performed in the pull direction and requires 5 ms, whereas it would take 20 ms in the push direction. This shows that the Zero Convergence optimization is able to accelerate the pull iterations, which makes it harder for the push traversal to compete. Therefore, it is necessary to have a smaller threshold.

VI. RELATED WORK
The effectiveness of push and pull traversals for different graph analytics is discussed in [59], [33], [34], and the differences between locality of push and pull traversals have been studied in [60]. iHTL [61] optimizes temporal locality by applying push and pull directions in one graph traversal but for different types of vertices. In order to improve the performance of the CC algorithm, we have used the direction optimizing CC as the baseline. DO-LP selects push traversal for sparse iterations (where a small number of vertices are in the frontier) and applies pull traversal for dense iterations.
The Shiloach-Vishkin (SV) CC algorithm [19] is the first Disjoint Set CC. It performs a number of iterations, each of which makes a pass over the graph. Each iteration starts with a hook phase that attaches the roots of subgraphs based on edges, and is followed by a shortcut phase that updates the label of each vertex of a subgraph to the label of its root. BFS is used in [62] to accelerate SV.
FastSV [63] improves the performance of LACC [64]; however, both algorithms use a MIN operator over labels to decide whether the label of a vertex should be changed. This makes these algorithms variants of Label Propagation CC rather than SV. The main difference between LP and SV algorithms lies in the observations that result in changing the label of a vertex: SV-based algorithms consider the topology of the graph (i.e., edges between vertices) to change labels, while LP-based algorithms consider the label of a vertex in relation to the labels of its neighbours to identify its new label. A shortcutting technique is used in [65] to accelerate Label Propagation CC.
The Jayanti-Tarjan algorithm [21] optimizes the SV algorithm using a linearizable randomized linking strategy. It requires only one traversal of the graph. Afforest [22] uses sampling to reduce the total number of processed edges. ConnectIt [24] extends Afforest by combining various sampling methods with various CC algorithms. We attempted to evaluate ConnectIt, but its code repository was under modification and could not be compiled (we are in contact with the authors about this).

VII. CONCLUSION AND FUTURE WORK
The conceptual simplicity of Label Propagation, in conjunction with its uncontested dominance in distributed memory systems, prompts us to revisit Label Propagation in shared memory systems. We developed performance optimization techniques to improve the DO-LP algorithm based on the features implied by the structure of real-world graph datasets that follow a scale-free degree distribution: 1) The Unified Labels Array technique reduces the number of iterations by 39%. 2) The Zero Convergence technique reduces the number of processed edges to 1.4% of the edges of the graph, on average. 3) The Zero Planting technique provides fast convergence of 88% of the vertices after the first pull iteration. 4) The Initial Push technique accelerates the initial iteration of label propagation by 5.3×, on average. We introduced Thrifty Label Propagation, which deploys these techniques. Our evaluation of Thrifty against state-of-the-art CC algorithms on two different processor architectures shows 1.4× speedup over Afforest, 7.3× over Jayanti-Tarjan, 14.7× over BFS-CC, and 51.2× over SV. The Thrifty algorithm is also 25.2× faster than Direction Optimizing Label Propagation.
An important question for future work is how Thrifty applies in a distributed processing setting, where label propagation algorithms are the norm. We plan to apply Thrifty to a distributed processing model like KLA [38]. Moreover, the unordered scheduling of the vertices based on the KLA model can be used in a shared memory system to provide better CPU utilization.
The optimization techniques we developed for the Label Propagation algorithm are not strictly limited to connected components. In future work we will investigate how they can be generalized to other algorithms expressed in the SpMV model of graph processing. In particular, we wish to explore the connection between the unified arrays optimization and asynchronous execution.

CODE AVAILABILITY

The source code repository and further discussions relating to this paper are available online at https://blogs.qub.ac.uk/graphprocessing/Thrifty-Label-Propagation-Fast-Connected-Components-for-Skewed-Degree-Graphs/.