{"id":2367,"date":"2023-08-10T09:40:00","date_gmt":"2023-08-10T08:40:00","guid":{"rendered":"https:\/\/blogs.qub.ac.uk\/dipsa\/?p=2367"},"modified":"2024-06-20T16:29:18","modified_gmt":"2024-06-20T15:29:18","slug":"ms-biographs-validation","status":"publish","type":"post","link":"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-validation\/","title":{"rendered":"MS-BioGraphs Validation"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Repository<\/h2>\n\n\n\n<p><a href=\"https:\/\/github.com\/DIPSA-QUB\/MS-BioGraphs-Validation\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>https:\/\/github.com\/DIPSA-QUB\/MS-BioGraphs-Validation<\/strong><\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Explanation<\/h2>\n\n\n\n<p style=\"text-align:justify\">We provide a Shell script, <code>validation.sh<\/code>, and a Java program, <code>EdgeBlockSHA.java<\/code>, to verify the correctness of the graphs. Each graph has a <code>.ojson<\/code> file whose <code>shasum<\/code> is verified against the value retrieved from our server. The <code>.ojson<\/code> file, in turn, records the shasums of files such as <code>offsets.bin<\/code>, <code>wcc.bin<\/code>, <code>n2o.bin<\/code>, <code>trans_offsets.bin<\/code>, and <code>edges_shas.txt<\/code>, and these records are used to validate those files.<\/p>\n\n\n\n<p style=\"text-align:justify\">The graph is stored in compressed WebGraph format in the <code>MS??-underlying.*<\/code> and <code>MS??-weights.*<\/code> files. To validate the compressed graph, we use <code>EdgeBlockSHA.java<\/code>, a parallel Java program that uses the WebGraph library to traverse the graph and compute the shasum of each block of edges (endpoints and weights). 
Then, the computed values are compared with the corresponding entries in the <code>edges_shas.txt<\/code> file of the graph.<\/p>\n\n\n\n<p style=\"text-align:justify\">Individual blocks can also be validated by comparing their computed shasum with the corresponding row of the <code>edges_shas.txt<\/code> file, whose format is shown below. Each block contains 64 million consecutive edges, and the start of each block is identified by the vertex ID and edge index of its first edge. The <code>endpoint_sha<\/code> column is the <code>shasum<\/code> of the 64 million endpoints of the block, stored in binary as an array of 4-byte elements in little-endian order. Similarly, the <code>weights_sha<\/code> column is the <code>shasum<\/code> of the weights (labels) of the block. Weights are hashed separately from endpoints because some applications do not need weights and therefore need not read or validate them.<\/p>\n\n\n\n<pre class=\"wp-block-code\">64MB blk#;     vertex; edge index;                             endpoint_sha;                              weights_sha;\n         0;          0;          0; 509784b158cb9404241afb21d0ceaf590b88d2f2; 57da4ad7bb89c5922e436b0535d791fa8f40dffd;\n         1;    2315113;        705; fafc118563c1d7b5fbff64af56edd6a56524f479; 13b7a9ca60bfb0715d563218d0a1cd787b00a07c;\n         2;    4521625;        597; 4ed65aa07c8062a151166ef2e9bdb93e41d19357; 8158276bec426ee46eca9912759eb9bd57fcc957;\n         3;    6347361;        112; d02e8913c807c3f4ecde9c638e0ded5ab80ba819; 26bc3296de65cba6ac539cd96b79ae6f7a4d37be;\n         4;    8447869;         15; 61513c84db40124496cdf769516118b63598914f; 781b9f4372ac614e94d097017c756d015234deb6;\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Requirements<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>JDK<\/code> version 16 or later<\/li>\n\n\n\n<li><code>jq<\/code><\/li>\n\n\n\n<li><code>wget<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">
WebGraph Framework<\/h2>\n\n\n\n<p>For details of the WebGraph framework, please visit <a href=\"https:\/\/webgraph.di.unimi.it\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/webgraph.di.unimi.it<\/a>.<\/p>\n\n\n\n<p><strong>ParaGrapher Graph Loading API and Library<\/strong><\/p>\n\n\n\n<p>Graphs in WebGraph format can also be loaded using the ParaGrapher library: <a href=\"https:\/\/blogs.qub.ac.uk\/DIPSA\/ParaGrapher\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/blogs.qub.ac.uk\/DIPSA\/ParaGrapher\/<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">License<\/h2>\n\n\n\n<p style=\"text-align:justify\">Licensed under the GNU General Public License v3, as published by the Free Software Foundation. You must not use this Software except in compliance with the terms of the License. Unless required by applicable law or agreed upon in writing, this Software is distributed on an &#8220;as is&#8221; basis, without any warranty, either express or implied; without even the implied warranty of merchantability or fitness for a particular purpose.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Copyright 2022-2023 The Queen&#8217;s University of Belfast, Northern Ireland, UK<\/strong><\/h4>\n\n\n\n<p class=\"has-medium-font-size\"><strong><a rel=\"noreferrer noopener\" href=\"https:\/\/blogs.qub.ac.uk\/DIPSA\/MS-BioGraphs\/\" target=\"_blank\">MS-BioGraphs<\/a><\/strong><br><br><strong>Related Posts<\/strong><\/p>\n\n\n<ul class=\"wp-block-latest-posts__list has-dates wp-block-latest-posts\"><li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2024\/08\/trees-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a 
class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/minimum-spanning-forest-of-ms-biographs\/\">Minimum Spanning Forest of MS-BioGraphs<\/a><time datetime=\"2024-08-09T14:11:36+01:00\" class=\"wp-block-latest-posts__post-date\">9 August 2024<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2024\/04\/ivy-2-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-on-ieee-dataport\/\">MS-BioGraphs on IEEE DataPort<\/a><time datetime=\"2024-04-17T07:26:23+01:00\" class=\"wp-block-latest-posts__post-date\">17 April 2024<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2024\/02\/poplar2-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/paragrapher-source-code-for-webgraph-types\/\">ParaGrapher Source Code For WebGraph Types<\/a><time datetime=\"2024-02-16T08:13:13+00:00\" class=\"wp-block-latest-posts__post-date\">16 February 2024<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/11\/goldcrest-1-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" 
href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/on-overcoming-hpc-challenges-of-trillion-scale-real-world-graph-datasets\/\">On Overcoming HPC Challenges of  Trillion-Scale Real-World Graph Datasets \u2013 BigData&#8217;23 (Short Paper)<\/a><time datetime=\"2023-12-15T02:47:00+00:00\" class=\"wp-block-latest-posts__post-date\">15 December 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/10-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/dataset-announcement-ms-biographs-trillion-scale-public-real-world-sequence-similarity-graphs\/\">Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs &#8211; IISWC&#8217;23 (Poster)<\/a><time datetime=\"2023-10-02T00:26:00+01:00\" class=\"wp-block-latest-posts__post-date\">2 October 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/2-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-sequence-similarity-graph-datasets\/\">MS-BioGraphs: Sequence Similarity Graph Datasets<\/a><time datetime=\"2023-08-30T06:52:00+01:00\" class=\"wp-block-latest-posts__post-date\">30 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" 
src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/1-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-ms\/\">MS-BioGraphs MS<\/a><time datetime=\"2023-08-10T09:53:42+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/6-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-msa500\/\">MS-BioGraphs MSA500<\/a><time datetime=\"2023-08-10T09:52:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/3-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-ms200\/\">MS-BioGraphs MS200<\/a><time datetime=\"2023-08-10T09:51:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/7-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" 
style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-msa200\/\">MS-BioGraphs MSA200<\/a><time datetime=\"2023-08-10T09:50:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/4-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-ms50\/\">MS-BioGraphs MS50<\/a><time datetime=\"2023-08-10T09:49:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/8-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-msa50\/\">MS-BioGraphs MSA50<\/a><time datetime=\"2023-08-10T09:48:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/9-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-msa10\/\">MS-BioGraphs 
MSA10<\/a><time datetime=\"2023-08-10T09:44:41+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/5-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-ms1\/\">MS-BioGraphs MS1<\/a><time datetime=\"2023-08-10T09:41:14+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/11-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-validation\/\">MS-BioGraphs Validation<\/a><time datetime=\"2023-08-10T09:40:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Repository https:\/\/github.com\/DIPSA-QUB\/MS-BioGraphs-Validation Explanation We provide a Shell script, validation.sh, and a Java program, EdgeBlockSHA.java, to verify the correctness of the graphs. Each graph has a .ojson file whose shasum is verified against the value retrieved from our server. 
Files such as offsets.bin, wcc.bin, n2o.bin, trans_offsets.bin, and edges_shas.txt have shasum records in the ojson file [&hellip;]<\/p>\n","protected":false},"author":1315,"featured_media":2371,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[63],"tags":[116,67,35,38,64,66,65,19],"class_list":{"0":"post-2367","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-ms-biographs","8":"tag-biological-networks","9":"tag-graph-datasets","10":"tag-graph-processing","11":"tag-high-performance-computing","12":"tag-high-performance-graph-processing","13":"tag-real-world-graphs","14":"tag-sequence-similarity-graphs","15":"tag-source-code","16":"czr-hentry"},"jetpack_featured_media_url":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/11.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/2367","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/users\/1315"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/comments?post=2367"}],"version-history":[{"count":21,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/2367\/revisions"}],"predecessor-version":[{"id":3045,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/2367\/revisions\/3045"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/media\/2371"}],"wp:attachment":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/media?parent=2367"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href
":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/categories?post=2367"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/tags?post=2367"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}