{"id":2606,"date":"2023-10-02T00:26:00","date_gmt":"2023-10-01T23:26:00","guid":{"rendered":"https:\/\/blogs.qub.ac.uk\/dipsa\/?p=2606"},"modified":"2024-06-20T16:29:17","modified_gmt":"2024-06-20T15:29:17","slug":"dataset-announcement-ms-biographs-trillion-scale-public-real-world-sequence-similarity-graphs","status":"publish","type":"post","link":"https:\/\/blogs.qub.ac.uk\/dipsa\/dataset-announcement-ms-biographs-trillion-scale-public-real-world-sequence-similarity-graphs\/","title":{"rendered":"Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs &#8211; IISWC&#8217;23 (Poster)"},"content":{"rendered":"\n<p><a rel=\"noreferrer noopener\" href=\"https:\/\/doi.org\/10.1109\/IISWC59245.2023.00029\" target=\"_blank\"><\/a><strong><a rel=\"noreferrer noopener\" href=\"https:\/\/iiswc.org\/iiswc2023\/#\" target=\"_blank\">2023 IEEE International Symposium on Workload Characterization (IISWC\u201923)<\/a><\/strong><br>October 1-3, 2023,  Ghent, Belgium<\/p>\n\n\n\n<p class=\"has-text-align-justify\"><a rel=\"noreferrer noopener\" href=\"https:\/\/doi.org\/10.1109\/IISWC59245.2023.00029\" target=\"_blank\"><strong>DOI: 10.1109\/IISWC59245.2023.00029<\/strong><\/a><br><a href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/10\/iiswc23-poster-Authors-Copy.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">PDF Version<\/a><br><br>Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly-accessible, relevant, and realistic data sets.<\/p>\n\n\n\n<p class=\"has-text-align-justify\">In this paper, we announce publication of <strong>MS-BioGraphs<\/strong>, a new family of <strong>publicly-available real-world edge-weighted graph datasets with up to 2.5 trillion edges<\/strong>, that is, 6.6 times greater than the largest graph published recently.<\/p>\n\n\n\n<p class=\"has-text-align-justify\">We briefly review 
the two main challenges we faced in generating large graph datasets and our solutions, namely (i) optimizing data structures and algorithms for this multi-step process and (ii) the WebGraph parallel compression technique. We also study some characteristics of MS-BioGraphs.<\/p>\n\n\n\n<p>The datasets are available at <a rel=\"noreferrer noopener\" href=\"https:\/\/blogs.qub.ac.uk\/DIPSA\/MS-BioGraphs\" target=\"_blank\">https:\/\/blogs.qub.ac.uk\/DIPSA\/MS-BioGraphs<\/a>.<\/p>\n\n\n\n<p>Please visit <a rel=\"noreferrer noopener\" href=\"https:\/\/blogs.qub.ac.uk\/DIPSA\/MS-BioGraphs-Sequence-Similarity-Graph-Datasets\/\" target=\"_blank\">https:\/\/blogs.qub.ac.uk\/DIPSA\/MS-BioGraphs-Sequence-Similarity-Graph-Datasets\/<\/a> for a complete version of this paper.<\/p>\n\n\n\n<p><strong>BibTeX<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>@INPROCEEDINGS{10.1109\/IISWC59245.2023.00029,\n   author = {Koohi Esfahani, Mohsen and Boldi, Paolo and Vandierendonck, Hans and Kilpatrick,  Peter and  Vigna, Sebastiano},  \n  booktitle={2023 IEEE International Symposium on Workload Characterization (IISWC'23)},  \n  title={Dataset Announcement: {MS-BioGraphs}, Trillion-Scale Public Real-World Sequence Similarity Graphs}, \n  year={2023},\n  volume={},\n  number={},\n  pages={},\n  location={Belgium, Ghent},\n  publisher={IEEE Computer Society},\n  doi={10.1109\/IISWC59245.2023.00029}\n}<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong><a rel=\"noreferrer noopener\" href=\"https:\/\/blogs.qub.ac.uk\/DIPSA\/MS-BioGraphs\/\" target=\"_blank\">MS-BioGraphs<\/a><\/strong><br><br><strong>Related Posts<\/strong><\/p>\n\n\n<ul class=\"wp-block-latest-posts__list has-dates wp-block-latest-posts\"><li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2024\/08\/trees-150x150.jpg\" 
class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/minimum-spanning-forest-of-ms-biographs\/\">Minimum Spanning Forest of MS-BioGraphs<\/a><time datetime=\"2024-08-09T14:11:36+01:00\" class=\"wp-block-latest-posts__post-date\">9 August 2024<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2024\/04\/ivy-2-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-on-ieee-dataport\/\">MS-BioGraphs on IEEE DataPort<\/a><time datetime=\"2024-04-17T07:26:23+01:00\" class=\"wp-block-latest-posts__post-date\">17 April 2024<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2024\/02\/poplar2-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/paragrapher-source-code-for-webgraph-types\/\">ParaGrapher Source Code For WebGraph Types<\/a><time datetime=\"2024-02-16T08:13:13+00:00\" class=\"wp-block-latest-posts__post-date\">16 February 2024<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/11\/goldcrest-1-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" 
style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/on-overcoming-hpc-challenges-of-trillion-scale-real-world-graph-datasets\/\">On Overcoming HPC Challenges of  Trillion-Scale Real-World Graph Datasets \u2013 BigData&#8217;23 (Short Paper)<\/a><time datetime=\"2023-12-15T02:47:00+00:00\" class=\"wp-block-latest-posts__post-date\">15 December 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/10-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/dataset-announcement-ms-biographs-trillion-scale-public-real-world-sequence-similarity-graphs\/\">Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs &#8211; IISWC&#8217;23 (Poster)<\/a><time datetime=\"2023-10-02T00:26:00+01:00\" class=\"wp-block-latest-posts__post-date\">2 October 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/2-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-sequence-similarity-graph-datasets\/\">MS-BioGraphs: Sequence Similarity Graph Datasets<\/a><time datetime=\"2023-08-30T06:52:00+01:00\" class=\"wp-block-latest-posts__post-date\">30 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img 
loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/1-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-ms\/\">MS-BioGraphs MS<\/a><time datetime=\"2023-08-10T09:53:42+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/6-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-msa500\/\">MS-BioGraphs MSA500<\/a><time datetime=\"2023-08-10T09:52:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/3-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-ms200\/\">MS-BioGraphs MS200<\/a><time datetime=\"2023-08-10T09:51:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/7-150x150.jpg\" 
class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-msa200\/\">MS-BioGraphs MSA200<\/a><time datetime=\"2023-08-10T09:50:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/4-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-ms50\/\">MS-BioGraphs MS50<\/a><time datetime=\"2023-08-10T09:49:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/8-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-msa50\/\">MS-BioGraphs MSA50<\/a><time datetime=\"2023-08-10T09:48:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/9-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" 
href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-msa10\/\">MS-BioGraphs MSA10<\/a><time datetime=\"2023-08-10T09:44:41+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/5-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-ms1\/\">MS-BioGraphs MS1<\/a><time datetime=\"2023-08-10T09:41:14+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<li><div class=\"wp-block-latest-posts__featured-image alignleft\"><img loading=\"lazy\" decoding=\"async\" width=\"150\" height=\"150\" src=\"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/11-150x150.jpg\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" style=\"max-width:60px;max-height:60px;\" \/><\/div><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/ms-biographs-validation\/\">MS-BioGraphs Validation<\/a><time datetime=\"2023-08-10T09:40:00+01:00\" class=\"wp-block-latest-posts__post-date\">10 August 2023<\/time><\/li>\n<\/ul>\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>2023 IEEE International Symposium on Workload Characterization (IISWC\u201923)October 1-3, 2023, Ghent, Belgium DOI: 10.1109\/IISWC59245.2023.00029PDF Version Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly-accessible, relevant, and realistic data sets. 
&hellip; <a href=\"https:\/\/blogs.qub.ac.uk\/dipsa\/dataset-announcement-ms-biographs-trillion-scale-public-real-world-sequence-similarity-graphs\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1315,"featured_media":2325,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[63],"tags":[116,67,35,38,64,66,65],"class_list":["post-2606","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ms-biographs","tag-biological-networks","tag-graph-datasets","tag-graph-processing","tag-high-performance-computing","tag-high-performance-graph-processing","tag-real-world-graphs","tag-sequence-similarity-graphs"],"jetpack_featured_media_url":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-content\/uploads\/sites\/14\/2023\/08\/10.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/2606","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/users\/1315"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/comments?post=2606"}],"version-history":[{"count":3,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/2606\/revisions"}],"predecessor-version":[{"id":2667,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/posts\/2606\/revisions\/2667"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/media\/2325"}],"wp:attachment":[{"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/media?parent=2606"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blo
gs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/categories?post=2606"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.qub.ac.uk\/dipsa\/wp-json\/wp\/v2\/tags?post=2606"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}