MS-BioGraphs Validation

Repository

https://github.com/DIPSA-QUB/MS-BioGraphs-Validation

Explanation

We provide a Shell script, validation.sh, and a Java program, EdgeBlockSHA.java, to verify the the correctness of the graphs. Each graph has a .ojson file whose shasum is verified by the value retreived from our server. Files such as offsets.bin, wcc.bin, n2o.bin, trans_offsets.bin, and edges_shas.txt have shasum records in the ojson file which is used for validation of these files.

The graph in WebGraph format has been compressed in MS??-underlying.* and MS??-weights.* files. In order to validate the compressed graph, the EdgeBlockSHA.java is used. It is a parallel Java code that uses the WebGraph library to traverse the graph and calculate the shasum of blocks of edges (endpoints and weights). Then, the calculated results are matched with the edges_shas.txt file of the graph.

It is also possible to validate some particular blocks by matching the calculated shasum with the relevant row in the edges_shas.txt file. This file has a format such as the following. Each block contains 64 Million consecutive edges. The start of each block is identified by a vertex ID and its edge index. The Column endpoint_sha is the shasum of the 64 Million endpoints when stored as an array of 4-Bytes elements in the binary format and in the little endian order. Similarly, Column weights_sha shows the shasum of weights (labels). We have separated weights from endpoints as in some applications weights are not needed and therefore it is not necessary to read and validate them.

64MB blk#;     vertex; edge index;                             endpoint_sha;                              weights_sha;
         0;          0;          0; 509784b158cb9404241afb21d0ceaf590b88d2f2; 57da4ad7bb89c5922e436b0535d791fa8f40dffd;
         1;    2315113;        705; fafc118563c1d7b5fbff64af56edd6a56524f479; 13b7a9ca60bfb0715d563218d0a1cd787b00a07c;
         2;    4521625;        597; 4ed65aa07c8062a151166ef2e9bdb93e41d19357; 8158276bec426ee46eca9912759eb9bd57fcc957;
         3;    6347361;        112; d02e8913c807c3f4ecde9c638e0ded5ab80ba819; 26bc3296de65cba6ac539cd96b79ae6f7a4d37be;
         4;    8447869;         15; 61513c84db40124496cdf769516118b63598914f; 781b9f4372ac614e94d097017c756d015234deb6; 
 

Requirements

  • JDK with version > 15
  • jq
  • wget

WebGraph Framework

Please visit https://webgraph.di.unimi.it .

ParaGrapher Graph Loading API and Library

The WebGraph formats can also be read using the ParaGrapher library: https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.

License

Licensed under the GNU v3 General Public License, as published by the Free Software Foundation. You must not use this Software except in compliance with the terms of the License. Unless required by applicable law or agreed upon in writing, this Software is distributed on an “as is” basis, without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose, neither express nor implied.

Copyright 2022-2023 The Queen’s University of Belfast, Northern Ireland, UK

MS-BioGraphs

Related Posts