MS-BioGraphs: Sequence Similarity Graph Datasets

DOI: 10.48550/arXiv.2308.16744


Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, is highly dependent on the availability of publicly accessible, relevant, and realistic data sets.

To ensure this progress continues, we (i) investigate and optimize the process of generating large sequence similarity graphs as an HPC challenge and (ii) demonstrate this process in creating MS-BioGraphs, a new family of publicly available real-world edge-weighted graph datasets with up to 2.5 trillion edges, that is, 6.6 times larger than the largest graph published recently. The largest graph is created by matching (i.e., all-to-all similarity aligning) 1.7 billion protein sequences. The MS-BioGraphs family also includes seven subgraphs with different sizes and direction types.

We describe two main challenges we faced in generating large graph datasets and our solutions to them, namely (i) optimizing data structures and algorithms for this multi-step process and (ii) a parallel WebGraph compression technique. We present a comparative study of the structural characteristics of MS-BioGraphs.

The datasets are available online at https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs.

BibTeX

@article{MS-BioGraphs-arxiv,
    title = {{MS-BioGraphs}: Sequence Similarity Graph Datasets},
    author = {Koohi Esfahani, Mohsen and Boldi, Paolo and Vandierendonck, Hans and Kilpatrick, Peter and Vigna, Sebastiano},
    year = 2023,
    journal = {CoRR},
    volume = {abs/2308.16744},
    doi = {10.48550/arXiv.2308.16744},
    url = {https://doi.org/10.48550/arXiv.2308.16744},
    archiveprefix = {arXiv},
    eprint = {2308.16744}
}


MS-BioGraphs MS

Name: MS-BioGraphs – MS
URL: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MS
Download Link: https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validation and Sample Code: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation: Vertices represent proteins, and each edge represents the sequence similarity between its two endpoints.
Edge Weighted: Yes
Directed: No
Number of Vertices: 1,757,323,526
Number of Edges: 2,488,069,027,875
Maximum Degree: 814,957
Minimum Weight: 98
Maximum Weight: 634,925
Number of Zero-Degree Vertices: 6,437,984
Average Degree: 1,415.8
Size of the Largest WCC (edges): 2,486,890,448,664
Number of WCCs: 148,861,367
Creation Details: MS-BioGraphs: Sequence Similarity Graph Datasets
Format: WebGraph
License: CC BY-NC-SA
QUB ID: F2223-052
DOI: 10.5281/zenodo.7820808
Citation
Mohsen Koohi Esfahani, Sebastiano Vigna, 
Paolo Boldi, Hans Vandierendonck, Peter Kilpatrick, March 13, 2024, 
"MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets", 
IEEE Dataport, doi: https://doi.org/10.21227/gmd9-1534.
BibTeX
@data{gmd9-1534-24,
doi = {10.21227/gmd9-1534},
url = {https://doi.org/10.21227/gmd9-1534},
author = {Koohi Esfahani, Mohsen and Vigna, Sebastiano and Boldi, 
Paolo and Vandierendonck, Hans and Kilpatrick, Peter},
publisher = {IEEE Dataport},
title = {MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets},
year = {2024} }


Files

Underlying Graph: The graph structure (without weights) in WebGraph format:
  • File: MS-underlying.graph, Size: 7,342,853,446,646 Bytes
  • File: MS-underlying.offsets, Size: 5,341,385,503 Bytes
  • File: MS-underlying.properties, Size: 1,560 Bytes
Total Size: 7,348,194,833,709 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Weights (Labels): The edge weights of the graph in WebGraph format:
  • File: MS-weights.labels, Size: 5,037,171,681,279 Bytes
  • File: MS-weights.labeloffsets, Size: 5,070,752,590 Bytes
  • File: MS-weights.properties, Size: 183 Bytes
Total Size: 5,042,242,434,052 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Edge Blocks SHAs File (Text): This file contains the SHA checksums of the edge blocks. Each block contains 64M consecutive edges and has one checksum for its endpoints and one for its edge weights.
The file is used to validate the underlying graph and the weights. For further explanation of the validation process, please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation.
  • Name: MS_edges_shas.txt
  • Size: 4,449,360 Bytes
  • SHASUM: 85d5b0896f8fa8a2b490ec6560937c45ced8b0d9
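Each listed SHASUM is 40 hexadecimal digits long, which is the length of a SHA-1 digest (the default algorithm of the `shasum` tool). A minimal Python sketch for checking a downloaded file against its listed value, assuming SHA-1; `sha1_of_file` and `verify` are hypothetical helper names:

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, read in fixed-size chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Stream the file so very large inputs never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file's SHA-1 digest with a published SHASUM (case-insensitive)."""
    return sha1_of_file(path) == expected.strip().lower()
```

For example, `verify("MS_edges_shas.txt", "85d5b0896f8fa8a2b490ec6560937c45ced8b0d9")` should return True for an intact download if the SHA-1 assumption holds. The WebGraph files themselves are checked block by block through the edge blocks SHAs file, following the procedure on the validation page.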
Offsets (Binary): The offsets array of the CSX (Compressed Sparse Rows/Columns) graph in binary format and little-endian byte order. It consists of |V|+1 8-byte elements.
The first and last values are 0 and |E|, respectively.
This array makes it possible to convert the graph (or parts of it) from WebGraph format to binary format with a single pass over the relevant edges.
  • Name: MS_offsets.bin
  • Size: 14,058,588,216 Bytes
  • SHASUM: 15c3defdbb92f7b1fe48a3fb20530d99fa30c616
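A minimal Python sketch, assuming only the layout stated above, that reads individual offsets with random access instead of loading the whole file (about 14 GB for the full graph). `read_offset`, `graph_summary`, and `degree` are hypothetical helper names:

```python
import os
import struct

# Each offsets entry is an unsigned 64-bit integer in little-endian order.
ENTRY = struct.Struct("<Q")


def read_offset(f, i: int) -> int:
    """Return element i of an open offsets file without loading the whole array."""
    f.seek(ENTRY.size * i)
    raw = f.read(ENTRY.size)
    if len(raw) != ENTRY.size:
        raise IndexError(f"offset index {i} is out of range")
    return ENTRY.unpack(raw)[0]


def graph_summary(path: str) -> tuple[int, int]:
    """Return (|V|, |E|) from the offsets file alone: |V|+1 entries, the last is |E|."""
    size = os.path.getsize(path)
    if size < ENTRY.size or size % ENTRY.size != 0:
        raise ValueError("not a valid offsets file")
    n_entries = size // ENTRY.size
    with open(path, "rb") as f:
        if read_offset(f, 0) != 0:
            raise ValueError("the first offset must be 0")
        return n_entries - 1, read_offset(f, n_entries - 1)


def degree(path: str, v: int) -> int:
    """Degree of vertex v (out-degree for the directed graphs)."""
    with open(path, "rb") as f:
        return read_offset(f, v + 1) - read_offset(f, v)
```

For a vertex range a..b-1, the edges occupy indices offsets[a] up to offsets[b], which is what allows a part of the graph to be converted independently of the rest.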
WCC (Binary): The Weakly Connected Component (WCC) array in binary format and little-endian byte order.
This array consists of |V| 4-byte elements. Vertices in the same component have the same value in the WCC array.
  • Name: MS-wcc.bin
  • Size: 7,029,294,104 Bytes
  • SHASUM: 30f12b738dde8f62aecb94239796b169512e6710
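A minimal sketch for reading the WCC array with Python's `struct` module, assuming unsigned 32-bit labels (the page specifies only 4-byte little-endian elements). `component_sizes` counts vertices per component label in chunks; on the full graphs its Counter can hold hundreds of millions of entries, so it is better suited to the smaller subgraphs or to sampled vertex ranges. The helper names are hypothetical:

```python
import struct
from collections import Counter

# Assumed element type: unsigned 32-bit little-endian component label.
ELEM = struct.Struct("<I")


def component_sizes(path: str, chunk_elems: int = 1 << 20) -> Counter:
    """Map each component label to the number of vertices carrying it."""
    sizes: Counter = Counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(ELEM.size * chunk_elems)
            if not chunk:
                break
            if len(chunk) % ELEM.size:
                raise ValueError("truncated WCC file")
            sizes.update(label for (label,) in ELEM.iter_unpack(chunk))
    return sizes


def same_component(path: str, u: int, v: int) -> bool:
    """True if vertices u and v carry the same WCC label."""
    with open(path, "rb") as f:
        f.seek(ELEM.size * u)
        (label_u,) = ELEM.unpack(f.read(ELEM.size))
        f.seek(ELEM.size * v)
        (label_v,) = ELEM.unpack(f.read(ELEM.size))
    return label_u == label_v
```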
Names (tar.gz): This compressed file contains 120 files in CSV format using ‘;’ as the separator. Each row has two columns: the vertex ID and the name of the sequence.
Note: If the graph has an ‘N2O Reordering’ file, use the n2o array to convert a vertex ID to the old vertex ID, which identifies the protein name in the `names.tar.gz` file.
  • Name: names.tar.gz
  • Size: 27,130,045,933 Bytes
  • SHASUM: ba00b58bbb2795445554058a681b573c751ef315
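A minimal sketch of a name lookup with Python's standard `tarfile` module. It streams the archive once and stops as soon as all requested IDs are found. The member file names and whether the CSV files carry a header row are not documented here, so the sketch simply skips any line that is not of the form `ID;name`. For graphs with an N2O file, the IDs passed in should already be old IDs. `find_names` is a hypothetical helper:

```python
import io
import tarfile


def find_names(tar_path: str, wanted: set[int]) -> dict[int, str]:
    """Scan names.tar.gz once and return {old vertex ID: sequence name}."""
    found: dict[int, str] = {}
    if not wanted:
        return found
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            fobj = tar.extractfile(member)
            if fobj is None:
                continue
            for line in io.TextIOWrapper(fobj, encoding="utf-8"):
                id_field, sep, name = line.rstrip("\r\n").partition(";")
                # Skip anything that is not an "ID;name" row (e.g. a possible header).
                if not sep or not id_field.strip().isdigit():
                    continue
                vid = int(id_field)
                if vid in wanted:
                    found[vid] = name
                    if len(found) == len(wanted):
                        return found
    return found
```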
OJSON: The characteristics of the graph and the SHA checksums of its files.
The file is in open JSON format: a closing brace (}) must be appended before it is passed to a JSON parser.
  • Name: MS.ojson
  • Size: 700 Bytes
  • SHASUM: e2eb3fcdd0c22838971ed2edea8e1ed081a77282
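Following the rule above, a minimal Python sketch for loading an .ojson file with the standard `json` module. The key names inside the real files are not listed on this page, so none are assumed; `parse_ojson` and `load_ojson` are hypothetical helpers:

```python
import json


def parse_ojson(text: str) -> dict:
    """Close an 'open JSON' document with the missing brace and parse it."""
    return json.loads(text.rstrip() + "}")


def load_ojson(path: str) -> dict:
    """Read an .ojson file from disk and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_ojson(f.read())
```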


Plots

For an explanation of the plots, please refer to the MS-BioGraphs paper.

Degree Distribution
Weight Distribution
Vertex-Relative Weight Distribution
Degree Decomposition
Cell-Binned Average Weight Degree Distribution
Weakly-Connected Components Size Distribution



MS-BioGraphs MSA500

Name: MS-BioGraphs – MSA500
URL: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MSA500
Download Link: https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validation and Sample Code: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation: Vertices represent proteins, and each edge represents the sequence similarity between its two endpoints.
Edge Weighted: Yes
Directed: Yes
Number of Vertices: 1,757,323,526
Number of Edges: 1,244,904,754,157
Maximum In-Degree: 229,442
Maximum Out-Degree: 814,461
Minimum Weight: 98
Maximum Weight: 634,925
Number of Zero In-Degree Vertices: 6,437,984
Number of Zero Out-Degree Vertices: 16,843,087
Average In-Degree: 711.0
Average Out-Degree: 715.3
Size of the Largest Weakly Connected Component (edges): 1,244,203,865,823
Number of Weakly Connected Components: 148,861,367
Creation Details: MS-BioGraphs: Sequence Similarity Graph Datasets
Format: WebGraph
License: CC BY-NC-SA
QUB ID: F2223-052
DOI: 10.5281/zenodo.7820810
Citation
Mohsen Koohi Esfahani, Sebastiano Vigna, 
Paolo Boldi, Hans Vandierendonck, Peter Kilpatrick, March 13, 2024, 
"MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets", 
IEEE Dataport, doi: https://doi.org/10.21227/gmd9-1534.
BibTeX
@data{gmd9-1534-24,
doi = {10.21227/gmd9-1534},
url = {https://doi.org/10.21227/gmd9-1534},
author = {Koohi Esfahani, Mohsen and Vigna, Sebastiano and Boldi, 
Paolo and Vandierendonck, Hans and Kilpatrick, Peter},
publisher = {IEEE Dataport},
title = {MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets},
year = {2024} }


Files

Underlying Graph: The graph structure (without weights) in WebGraph format:
  • File: MSA500-underlying.graph, Size: 3,755,604,574,487 Bytes
  • File: MSA500-underlying.offsets, Size: 4,811,273,232 Bytes
  • File: MSA500-underlying.properties, Size: 1,537 Bytes
Total Size: 3,760,415,849,256 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Weights (Labels): The edge weights of the graph in WebGraph format:
  • File: MSA500-weights.labels, Size: 2,520,671,185,509 Bytes
  • File: MSA500-weights.labeloffsets, Size: 4,554,987,345 Bytes
  • File: MSA500-weights.properties, Size: 187 Bytes
Total Size: 2,525,226,173,041 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Edge Blocks SHAs File (Text): This file contains the SHA checksums of the edge blocks. Each block contains 64M consecutive edges and has one checksum for its endpoints and one for its edge weights.
The file is used to validate the underlying graph and the weights. For further explanation of the validation process, please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation.
  • Name: MSA500_edges_shas.txt
  • Size: 2,226,360 Bytes
  • SHASUM: d9f692b6f4770f282ea62936293baf6a649c2b91
Offsets (Binary): The offsets array of the CSX (Compressed Sparse Rows/Columns) graph in binary format and little-endian byte order. It consists of |V|+1 8-byte elements.
The first and last values are 0 and |E|, respectively.
This array makes it possible to convert the graph (or parts of it) from WebGraph format to binary format with a single pass over the relevant edges.
  • Name: MSA500_offsets.bin
  • Size: 14,058,588,216 Bytes
  • SHASUM: 3eab31d99426ed9f96af6b258fd1253544ba5461
WCC (Binary): The Weakly Connected Component (WCC) array in binary format and little-endian byte order.
This array consists of |V| 4-byte elements. Vertices in the same component have the same value in the WCC array.
  • Name: MSA500-wcc.bin
  • Size: 7,029,294,104 Bytes
  • SHASUM: 30f12b738dde8f62aecb94239796b169512e6710
Transposed Graph Offsets (Binary): The offsets array of the transposed graph in binary format and little-endian byte order. It consists of |V|+1 8-byte elements. The first and last values are 0 and |E|, respectively.
This array makes it possible to transpose the graph with a single pass over the edges.
  • Name: MSA500_trans_offsets.bin
  • Size: 14,058,588,216 Bytes
  • SHASUM: 220a2a5c60baaedc8913720862b535ba6cabb5bd
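The one-pass transposition that the transposed offsets array enables is essentially the scatter step of a counting sort. A minimal Python sketch, assuming the forward graph has already been decoded into plain in-memory lists (for example with the WebGraph library; decoding is not shown). `transpose`, `offsets`, `neighbours`, `trans_offsets`, and `weights` are hypothetical names:

```python
def transpose(offsets, neighbours, trans_offsets, weights=None):
    """Return (in_neighbours, in_weights) of the transposed CSR graph.

    offsets / neighbours: forward CSR arrays (|V|+1 and |E| elements).
    trans_offsets: offsets of the transposed graph (|V|+1 elements).
    weights: optional per-edge weights aligned with neighbours.
    """
    num_vertices = len(offsets) - 1
    num_edges = offsets[-1]
    if trans_offsets[-1] != num_edges:
        raise ValueError("forward and transposed offsets disagree on |E|")
    # cursor[v] is the next free slot in v's in-edge list.
    cursor = list(trans_offsets[:-1])
    in_neighbours = [0] * num_edges
    in_weights = [0] * num_edges if weights is not None else None
    for u in range(num_vertices):
        for e in range(offsets[u], offsets[u + 1]):
            v = neighbours[e]
            slot = cursor[v]
            in_neighbours[slot] = u
            if in_weights is not None:
                in_weights[slot] = weights[e]
            cursor[v] = slot + 1
    return in_neighbours, in_weights
```

Because sources are visited in increasing order, each in-edge list comes out sorted by source ID. Without the precomputed transposed offsets, an extra pass over all edges would be needed first to count in-degrees.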
Names (tar.gz): This compressed file contains 120 files in CSV format using ‘;’ as the separator. Each row has two columns: the vertex ID and the name of the sequence.
Note: If the graph has an ‘N2O Reordering’ file, use the n2o array to convert a vertex ID to the old vertex ID, which identifies the protein name in the `names.tar.gz` file.
  • Name: names.tar.gz
  • Size: 27,130,045,933 Bytes
  • SHASUM: ba00b58bbb2795445554058a681b573c751ef315
OJSON: The characteristics of the graph and the SHA checksums of its files.
The file is in open JSON format: a closing brace (}) must be appended before it is passed to a JSON parser.
  • Name: MSA500.ojson
  • Size: 902 Bytes
  • SHASUM: 5eaebdff2dc56925a0b4751f579ebeabb6e3bee5


Plots

For an explanation of the plots, please refer to the MS-BioGraphs paper.

In-Degree Distribution
Out-Degree Distribution
Weight Distribution
Vertex-Relative Weight Distribution
Degree Decomposition
Push and Pull Locality
Cell-Binned Average Weight Degree Distribution
Weakly-Connected Components Size Distribution



MS-BioGraphs MS200

Name: MS-BioGraphs – MS200
URL: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MS200
Download Link: https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validation and Sample Code: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation: Vertices represent proteins, and each edge represents the sequence similarity between its two endpoints.
Edge Weighted: Yes
Directed: No
Number of Vertices: 1,414,493,449
Number of Edges: 502,930,788,612
Maximum Degree: 745,735
Minimum Weight: 460
Maximum Weight: 634,925
Number of Zero-Degree Vertices: 0
Average Degree: 355.6
Size of the Largest WCC (edges): 485,867,547,569
Number of WCCs: 338,348,495
Creation Details: MS-BioGraphs: Sequence Similarity Graph Datasets
Format: WebGraph
License: CC BY-NC-SA
QUB ID: F2223-052
DOI: 10.5281/zenodo.7820812
Citation
Mohsen Koohi Esfahani, Sebastiano Vigna, 
Paolo Boldi, Hans Vandierendonck, Peter Kilpatrick, March 13, 2024, 
"MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets", 
IEEE Dataport, doi: https://doi.org/10.21227/gmd9-1534.
BibTeX
@data{gmd9-1534-24,
doi = {10.21227/gmd9-1534},
url = {https://doi.org/10.21227/gmd9-1534},
author = {Koohi Esfahani, Mohsen and Vigna, Sebastiano and Boldi, 
Paolo and Vandierendonck, Hans and Kilpatrick, Peter},
publisher = {IEEE Dataport},
title = {MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets},
year = {2024} }


Files

Underlying Graph: The graph structure (without weights) in WebGraph format:
  • File: MS200-underlying.graph, Size: 1,459,981,767,426 Bytes
  • File: MS200-underlying.offsets, Size: 3,174,012,489 Bytes
  • File: MS200-underlying.properties, Size: 1,515 Bytes
Total Size: 1,463,155,781,430 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Weights (Labels): The edge weights of the graph in WebGraph format:
  • File: MS200-weights.labels, Size: 1,199,053,831,206 Bytes
  • File: MS200-weights.labeloffsets, Size: 3,090,041,102 Bytes
  • File: MS200-weights.properties, Size: 186 Bytes
Total Size: 1,202,143,872,494 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Edge Blocks SHAs File (Text): This file contains the SHA checksums of the edge blocks. Each block contains 64M consecutive edges and has one checksum for its endpoints and one for its edge weights.
The file is used to validate the underlying graph and the weights. For further explanation of the validation process, please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation.
  • Name: MS200_edges_shas.txt
  • Size: 899,640 Bytes
  • SHASUM: 5bb635fc94aea3ee7b2b6a4aecbbb1fc6f77e1b5
Offsets (Binary): The offsets array of the CSX (Compressed Sparse Rows/Columns) graph in binary format and little-endian byte order. It consists of |V|+1 8-byte elements.
The first and last values are 0 and |E|, respectively.
This array makes it possible to convert the graph (or parts of it) from WebGraph format to binary format with a single pass over the relevant edges.
  • Name: MS200_offsets.bin
  • Size: 11,315,947,600 Bytes
  • SHASUM: 9192158aab65e1ca536a46183411d87452cd9ee3
WCC (Binary): The Weakly Connected Component (WCC) array in binary format and little-endian byte order.
This array consists of |V| 4-byte elements. Vertices in the same component have the same value in the WCC array.
  • Name: MS200-wcc.bin
  • Size: 5,657,973,796 Bytes
  • SHASUM: 027e1b826659b5ec0f62921a4eb3ecd6c83fa76a
Names (tar.gz): This compressed file contains 120 files in CSV format using ‘;’ as the separator. Each row has two columns: the vertex ID and the name of the sequence.
Note: If the graph has an ‘N2O Reordering’ file, use the n2o array to convert a vertex ID to the old vertex ID, which identifies the protein name in the `names.tar.gz` file.
  • Name: names.tar.gz
  • Size: 27,130,045,933 Bytes
  • SHASUM: ba00b58bbb2795445554058a681b573c751ef315
N2O Reordering (Binary): The New-to-Old (N2O) reordering array of the graph in binary format and little-endian byte order.
It consists of |V| 4-byte elements and gives the old ID of each vertex, which is used to look up the name of the vertex (protein) in the names.tar.gz file.
  • Name: MS200-n2o.bin
  • Size: 5,657,973,796 Bytes
  • SHASUM: de833f1c36011af07c165f53760b82a49715537d
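A minimal sketch of the N2O lookup, assuming unsigned 32-bit elements (the page says only "4-byte elements"); `old_id` and `old_to_new` are hypothetical helpers. The inverse is built as a dictionary rather than a list because this subgraph has fewer vertices than the full graph while sharing the same `names.tar.gz`, so the old IDs presumably range over the larger ID space of the full graph:

```python
import struct

# Assumed element type: unsigned 32-bit integer, little-endian.
U32 = struct.Struct("<I")


def old_id(n2o_path: str, new_id: int) -> int:
    """Return the old vertex ID stored at position new_id of an n2o file."""
    with open(n2o_path, "rb") as f:
        f.seek(U32.size * new_id)
        raw = f.read(U32.size)
    if len(raw) != U32.size:
        raise IndexError(f"vertex {new_id} is out of range")
    return U32.unpack(raw)[0]


def old_to_new(n2o: list[int]) -> dict[int, int]:
    """Invert an in-memory n2o array into an old-ID -> new-ID mapping."""
    return {old: new for new, old in enumerate(n2o)}
```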
OJSON: The characteristics of the graph and the SHA checksums of its files.
The file is in open JSON format: a closing brace (}) must be appended before it is passed to a JSON parser.
  • Name: MS200.ojson
  • Size: 757 Bytes
  • SHASUM: 540c0bded9ab8d334574ed7dd7909435b617ecf3


Plots

For an explanation of the plots, please refer to the MS-BioGraphs paper.

Degree Distribution
Weight Distribution
Vertex-Relative Weight Distribution
Degree Decomposition
Cell-Binned Average Weight Degree Distribution
Weakly-Connected Components Size Distribution



MS-BioGraphs MSA200

Name: MS-BioGraphs – MSA200
URL: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MSA200
Download Link: https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validation and Sample Code: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation: Vertices represent proteins, and each edge represents the sequence similarity between its two endpoints.
Edge Weighted: Yes
Directed: Yes
Number of Vertices: 1,757,323,526
Number of Edges: 500,444,322,597
Maximum In-Degree: 658,879
Maximum Out-Degree: 709,176
Minimum Weight: 98
Maximum Weight: 634,925
Number of Zero In-Degree Vertices: 6,437,984
Number of Zero Out-Degree Vertices: 7,471,315
Average In-Degree: 285.8
Average Out-Degree: 286.0
Size of the Largest Weakly Connected Component (edges): 496,880,685,957
Number of Weakly Connected Components: 221,467,156
Creation Details: MS-BioGraphs: Sequence Similarity Graph Datasets
Format: WebGraph
License: CC BY-NC-SA
QUB ID: F2223-052
DOI: 10.5281/zenodo.7820815
Citation
Mohsen Koohi Esfahani, Sebastiano Vigna, 
Paolo Boldi, Hans Vandierendonck, Peter Kilpatrick, March 13, 2024, 
"MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets", 
IEEE Dataport, doi: https://doi.org/10.21227/gmd9-1534.
BibTeX
@data{gmd9-1534-24,
doi = {10.21227/gmd9-1534},
url = {https://doi.org/10.21227/gmd9-1534},
author = {Koohi Esfahani, Mohsen and Vigna, Sebastiano and Boldi, 
Paolo and Vandierendonck, Hans and Kilpatrick, Peter},
publisher = {IEEE Dataport},
title = {MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets},
year = {2024} }


Files

Underlying Graph: The graph structure (without weights) in WebGraph format:
  • File: MSA200-underlying.graph, Size: 1,558,147,532,780 Bytes
  • File: MSA200-underlying.offsets, Size: 4,319,801,854 Bytes
  • File: MSA200-underlying.properties, Size: 1,517 Bytes
Total Size: 1,562,467,336,151 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Weights (Labels): The edge weights of the graph in WebGraph format:
  • File: MSA200-weights.labels, Size: 1,105,784,580,128 Bytes
  • File: MSA200-weights.labeloffsets, Size: 4,123,546,304 Bytes
  • File: MSA200-weights.properties, Size: 187 Bytes
Total Size: 1,109,908,126,619 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Edge Blocks SHAs File (Text): This file contains the SHA checksums of the edge blocks. Each block contains 64M consecutive edges and has one checksum for its endpoints and one for its edge weights.
The file is used to validate the underlying graph and the weights. For further explanation of the validation process, please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation.
  • Name: MSA200_edges_shas.txt
  • Size: 895,200 Bytes
  • SHASUM: de1ac0ddce536168881ca2e49e6d5f0cf5b82bb5
Offsets (Binary): The offsets array of the CSX (Compressed Sparse Rows/Columns) graph in binary format and little-endian byte order. It consists of |V|+1 8-byte elements.
The first and last values are 0 and |E|, respectively.
This array makes it possible to convert the graph (or parts of it) from WebGraph format to binary format with a single pass over the relevant edges.
  • Name: MSA200_offsets.bin
  • Size: 14,058,588,216 Bytes
  • SHASUM: c241d2dc4bdf46f60c1cd889ac367504d3f58805
WCC (Binary): The Weakly Connected Component (WCC) array in binary format and little-endian byte order.
This array consists of |V| 4-byte elements. Vertices in the same component have the same value in the WCC array.
  • Name: MSA200-wcc.bin
  • Size: 7,029,294,104 Bytes
  • SHASUM: 2cb256d5e49e5dd0989715cb909fd8f27bfbd04c
Transposed Graph Offsets (Binary): The offsets array of the transposed graph in binary format and little-endian byte order. It consists of |V|+1 8-byte elements. The first and last values are 0 and |E|, respectively.
This array makes it possible to transpose the graph with a single pass over the edges.
  • Name: MSA200_trans_offsets.bin
  • Size: 14,058,588,216 Bytes
  • SHASUM: 47787ac64fb4485da02e3bcdc1696a814adfdb86
Names (tar.gz): This compressed file contains 120 files in CSV format using ‘;’ as the separator. Each row has two columns: the vertex ID and the name of the sequence.
Note: If the graph has an ‘N2O Reordering’ file, use the n2o array to convert a vertex ID to the old vertex ID, which identifies the protein name in the `names.tar.gz` file.
  • Name: names.tar.gz
  • Size: 27,130,045,933 Bytes
  • SHASUM: ba00b58bbb2795445554058a681b573c751ef315
OJSON: The characteristics of the graph and the SHA checksums of its files.
The file is in open JSON format: a closing brace (}) must be appended before it is passed to a JSON parser.
  • Name: MSA200.ojson
  • Size: 897 Bytes
  • SHASUM: 18e371cbb4bd9dbe6515e4528956ff32fb2e30c4


Plots

For an explanation of the plots, please refer to the MS-BioGraphs paper.

In-Degree Distribution
Out-Degree Distribution
Weight Distribution
Vertex-Relative Weight Distribution
Degree Decomposition
Push and Pull Locality
Cell-Binned Average Weight Degree Distribution
Weakly-Connected Components Size Distribution



MS-BioGraphs MS50

Name: MS-BioGraphs – MS50
URL: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MS50
Download Link: https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validation and Sample Code: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation: Vertices represent proteins, and each edge represents the sequence similarity between its two endpoints.
Edge Weighted: Yes
Directed: No
Number of Vertices: 585,603,088
Number of Edges: 124,783,559,600
Maximum Degree: 507,826
Minimum Weight: 900
Maximum Weight: 634,925
Number of Zero-Degree Vertices: 0
Average Degree: 213.1
Size of the Largest WCC (edges): 102,256,631,195
Weight of Minimum Spanning Forest (ignoring self-edges): 416,318,200,808
Number of WCCs: 155,295,301
Creation Details: MS-BioGraphs: Sequence Similarity Graph Datasets
Format: WebGraph
License: CC BY-NC-SA
QUB ID: F2223-052
DOI: 10.5281/zenodo.7820819
Citation
Mohsen Koohi Esfahani, Sebastiano Vigna, 
Paolo Boldi, Hans Vandierendonck, Peter Kilpatrick, March 13, 2024, 
"MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets", 
IEEE Dataport, doi: https://doi.org/10.21227/gmd9-1534.
BibTeX
@data{gmd9-1534-24,
doi = {10.21227/gmd9-1534},
url = {https://doi.org/10.21227/gmd9-1534},
author = {Koohi Esfahani, Mohsen and Vigna, Sebastiano and Boldi, 
Paolo and Vandierendonck, Hans and Kilpatrick, Peter},
publisher = {IEEE Dataport},
title = {MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets},
year = {2024} }
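This page does not state how the minimum spanning forest weight of MS50 was computed. As an illustration only, Kruskal's algorithm with a union-find structure is one standard way to obtain such a value; the sketch below works on a small in-memory list of (u, v, weight) tuples and skips self-edges, as the MS50 table does. At the scale of MS50 a real computation would need a parallel or external-memory approach. `msf_weight` is a hypothetical helper:

```python
def msf_weight(num_vertices, edges):
    """Total weight of a minimum spanning forest, skipping self-edges.

    edges: iterable of (u, v, w) tuples. An undirected edge may appear in
    both directions; the duplicate is rejected because its endpoints are
    already connected when it is reached.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if u == v:
            continue  # self-edges never belong to a spanning forest
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            total += w
    return total
```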


Files

Underlying Graph: The graph structure (without weights) in WebGraph format:
  • File: MS50-underlying.graph, Size: 347,621,279,586 Bytes
  • File: MS50-underlying.offsets, Size: 1,235,232,971 Bytes
  • File: MS50-underlying.properties, Size: 1,459 Bytes
Total Size: 348,856,514,016 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Weights (Labels): The edge weights of the graph in WebGraph format:
  • File: MS50-weights.labels, Size: 324,269,690,037 Bytes
  • File: MS50-weights.labeloffsets, Size: 1,221,399,047 Bytes
  • File: MS50-weights.properties, Size: 185 Bytes
Total Size: 325,491,089,269 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Edge Blocks SHAs File (Text): This file contains the SHA checksums of the edge blocks. Each block contains 64M consecutive edges and has one checksum for its endpoints and one for its edge weights.
The file is used to validate the underlying graph and the weights. For further explanation of the validation process, please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation.
  • Name: MS50_edges_shas.txt
  • Size: 223,440 Bytes
  • SHASUM: 5d1bc449124448e9a6ed3bd439942e31f55d9f97
Offsets (Binary): The offsets array of the CSX (Compressed Sparse Rows/Columns) graph in binary format and little-endian byte order. It consists of |V|+1 8-byte elements.
The first and last values are 0 and |E|, respectively.
This array makes it possible to convert the graph (or parts of it) from WebGraph format to binary format with a single pass over the relevant edges.
  • Name: MS50_offsets.bin
  • Size: 4,684,824,712 Bytes
  • SHASUM: b298f974167a1c64a8ba8e211a970c5b5d427137
WCC (Binary): The Weakly Connected Component (WCC) array in binary format and little-endian byte order.
This array consists of |V| 4-byte elements. Vertices in the same component have the same value in the WCC array.
  • Name: MS50-wcc.bin
  • Size: 2,342,412,352 Bytes
  • SHASUM: 4d640ce445477191a3bc3dd00f09f712b9429af2
Names (tar.gz): This compressed file contains 120 files in CSV format using ‘;’ as the separator. Each row has two columns: the vertex ID and the name of the sequence.
Note: If the graph has an ‘N2O Reordering’ file, use the n2o array to convert a vertex ID to the old vertex ID, which identifies the protein name in the `names.tar.gz` file.
  • Name: names.tar.gz
  • Size: 27,130,045,933 Bytes
  • SHASUM: ba00b58bbb2795445554058a681b573c751ef315
N2O Reordering (Binary): The New-to-Old (N2O) reordering array of the graph in binary format and little-endian byte order.
It consists of |V| 4-byte elements and gives the old ID of each vertex, which is used to look up the name of the vertex (protein) in the names.tar.gz file.
  • Name: MS50-n2o.bin
  • Size: 2,342,412,352 Bytes
  • SHASUM: 91939605bdde3eb67a013f80d4c2a84d1684ca8f
OJSON: The characteristics of the graph and the SHA checksums of its files.
The file is in open JSON format: a closing brace (}) must be appended before it is passed to a JSON parser.
  • Name: MS50.ojson
  • Size: 751 Bytes
  • SHASUM: eb94812bea81cd40a3f33d6aaa5fdd63946ffc92


Plots

For an explanation of the plots, please refer to the MS-BioGraphs paper.

Degree Distribution
Weight Distribution
Vertex-Relative Weight Distribution
Degree Decomposition
Cell-Binned Average Weight Degree Distribution
Weakly-Connected Components Size Distribution



MS-BioGraphs MSA50

Name: MS-BioGraphs – MSA50
URL: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MSA50
Download Link: https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validation and Sample Code: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation: Vertices represent proteins, and each edge represents the sequence similarity between its two endpoints.
Edge Weighted: Yes
Directed: Yes
Number of Vertices: 1,757,323,526
Number of Edges: 125,312,536,732
Maximum In-Degree: 543,117
Maximum Out-Degree: 297,981
Minimum Weight: 98
Maximum Weight: 634,925
Number of Zero In-Degree Vertices: 6,437,984
Number of Zero Out-Degree Vertices: 8,542,018
Average In-Degree: 71.6
Average Out-Degree: 71.7
Size of the Largest Weakly Connected Component (edges): 117,980,151,055
Number of Weakly Connected Components: 363,090,851
Creation Details: MS-BioGraphs: Sequence Similarity Graph Datasets
Format: WebGraph
License: CC BY-NC-SA
QUB ID: F2223-052
DOI: 10.5281/zenodo.7820821
Citation
Mohsen Koohi Esfahani, Sebastiano Vigna, 
Paolo Boldi, Hans Vandierendonck, Peter Kilpatrick, March 13, 2024, 
"MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets", 
IEEE Dataport, doi: https://doi.org/10.21227/gmd9-1534.
BibTeX
@data{gmd9-1534-24,
doi = {10.21227/gmd9-1534},
url = {https://doi.org/10.21227/gmd9-1534},
author = {Koohi Esfahani, Mohsen and Vigna, Sebastiano and Boldi, 
Paolo and Vandierendonck, Hans and Kilpatrick, Peter},
publisher = {IEEE Dataport},
title = {MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets},
year = {2024} }


Files

Underlying Graph: The graph structure (without weights) in WebGraph format:
  • File: MSA50-underlying.graph, Size: 410,094,612,576 Bytes
  • File: MSA50-underlying.offsets, Size: 3,504,554,221 Bytes
  • File: MSA50-underlying.properties, Size: 1,493 Bytes
Total Size: 413,599,168,290 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Weights (Labels): The edge weights of the graph in WebGraph format:
  • File: MSA50-weights.labels, Size: 284,756,409,010 Bytes
  • File: MSA50-weights.labeloffsets, Size: 3,374,946,996 Bytes
  • File: MSA50-weights.properties, Size: 186 Bytes
Total Size: 288,131,356,192 Bytes
These files can be validated using the ‘Edge Blocks SHAs File’ described below.
Edge Blocks SHAs File (Text): This file contains the SHA checksums of the edge blocks. Each block contains 64M consecutive edges and has one checksum for its endpoints and one for its edge weights.
The file is used to validate the underlying graph and the weights. For further explanation of the validation process, please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation.
  • Name: MSA50_edges_shas.txt
  • Size: 224,400 Bytes
  • SHASUM: 6f56a6710ef6b6e7c01e90907f19c7a0099a272c
Offsets (Binary): The offsets array of the CSX (Compressed Sparse Rows/Columns) graph in binary format and little-endian byte order. It consists of |V|+1 8-byte elements.
The first and last values are 0 and |E|, respectively.
This array makes it possible to convert the graph (or parts of it) from WebGraph format to binary format with a single pass over the relevant edges.
  • Name: MSA50_offsets.bin
  • Size: 14,058,588,216 Bytes
  • SHASUM: 3272fb9c681648598f18ab5a10bbafb5bf48dca5
WCC (Binary): The Weakly Connected Component (WCC) array in binary format and little-endian byte order.
This array consists of |V| 4-byte elements. Vertices in the same component have the same value in the WCC array.
  • Name: MSA50-wcc.bin
  • Size: 7,029,294,104 Bytes
  • SHASUM: 82e3ba326bb56c69edbe7fbb90ce70b731e3a7f2
Transposed Graph Offsets (Binary): The offsets array of the transposed graph in binary format and little-endian byte order. It consists of |V|+1 8-byte elements. The first and last values are 0 and |E|, respectively.
This array makes it possible to transpose the graph with a single pass over the edges.
  • Name: MSA50_trans_offsets.bin
  • Size: 14,058,588,216 Bytes
  • SHASUM: 812d75359683dd235a1bd948566b306f43e7088d
Names (tar.gz): This compressed file contains 120 files in CSV format using ‘;’ as the separator. Each row has two columns: the vertex ID and the name of the sequence.
Note: If the graph has an ‘N2O Reordering’ file, use the n2o array to convert a vertex ID to the old vertex ID, which identifies the protein name in the `names.tar.gz` file.
  • Name: names.tar.gz
  • Size: 27,130,045,933 Bytes
  • SHASUM: ba00b58bbb2795445554058a681b573c751ef315
OJSON: The characteristics of the graph and the SHA checksums of its files.
The file is in open JSON format: a closing brace (}) must be appended before it is passed to a JSON parser.
  • Name: MSA50.ojson
  • Size: 892 Bytes
  • SHASUM: 5767cdd2e0cddba1ba255afe9accfdbe5d5aabd2


Plots

For an explanation of the plots, please refer to the MS-BioGraphs paper.

In-Degree Distribution
Out-Degree Distribution
Weight Distribution
Vertex-Relative Weight Distribution
Degree Decomposition
Push and Pull Locality
Cell-Binned Average Weight Degree Distribution
Weakly-Connected Components Size Distribution



MS-BioGraphs MSA10

Name: MS-BioGraphs – MSA10
URL: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MSA10
Download Link: https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validation and Sample Code: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation: Vertices represent proteins, and each edge represents the sequence similarity between its two endpoints.
Edge Weighted: Yes
Directed: Yes
Number of Vertices: 1,757,323,526
Number of Edges: 25,236,632,682
Maximum In-Degree: 207,279
Maximum Out-Degree: 62,060
Minimum Weight: 98
Maximum Weight: 634,925
Number of Zero In-Degree Vertices: 6,437,984
Number of Zero Out-Degree Vertices: 9,926,249
Average In-Degree: 14.4
Average Out-Degree: 14.4
Size of the Largest Weakly Connected Component (edges): 15,576,385,764
Number of Weakly Connected Components: 628,505,933
Creation Details: MS-BioGraphs: Sequence Similarity Graph Datasets
Format: WebGraph
License: CC BY-NC-SA
QUB ID: F2223-052
DOI: 10.5281/zenodo.7820823
Citation
Mohsen Koohi Esfahani, Sebastiano Vigna, 
Paolo Boldi, Hans Vandierendonck, Peter Kilpatrick, March 13, 2024, 
"MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets", 
IEEE Dataport, doi: https://doi.org/10.21227/gmd9-1534.
Bibtex
@data{gmd9-1534-24,
doi = {10.21227/gmd9-1534},
url = {https://doi.org/10.21227/gmd9-1534},
author = {Koohi Esfahani, Mohsen and Vigna, Sebastiano and Boldi, 
Paolo and Vandierendonck, Hans and Kilpatrick, Peter},
publisher = {IEEE Dataport},
title = {MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets},
year = {2024} }


Files

Underlying Graph The underlying graph in WebGraph format:
  • File: MSA10-underlying.graph, Size: 87,421,101,649 Bytes
  • File: MSA10-underlying.offsets, Size: 2,743,422,804 Bytes
  • File: MSA10-underlying.properties, Size: 1,439 Bytes
Total Size: 90,164,525,892 Bytes
These files are validated using the ‘Edge Blocks SHAs File’ described below.
Weights (Labels) The weights of the graph in WebGraph format:
  • File: MSA10-weights.labels, Size: 58,798,062,287 Bytes
  • File: MSA10-weights.labeloffsets, Size: 2,731,563,328 Bytes
  • File: MSA10-weights.properties, Size: 186 Bytes
Total Size: 61,529,625,801 Bytes
These files are validated using the ‘Edge Blocks SHAs File’ described below.
Edge Blocks SHAs File (Text) This file contains the shasums of edge blocks, where each block contains 64 Million consecutive edges and has two shasums: one for its 64M endpoints and one for its 64M edge weights.
The file is used to validate the underlying graph and the weights. For further explanation of the validation process, please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation.
  • Name: MSA10_edges_shas.txt
  • Size: 45,480 Bytes
  • SHASUM: 9c42e8ba057c519ae318071e63ab3ffdf992cd50
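The block layout can be sketched in Python as follows. The sketch assumes the endpoints are available as one flat CSR neighbour array, reads “64 Million” as 64 × 2^20 edges, and assumes the shasum is SHA-1 (consistent with the 40-hex-digit values on this page and with the default algorithm of the `shasum` tool). The helper names are hypothetical, and weights are left out because their element width is not stated here.

```python
import hashlib
import struct
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Assumption: "64 Million" edges is read as 64 * 2**20; adjust if needed.
BLOCK_SIZE = 64 * 2**20


def block_start(offsets: Sequence[int], edge: int) -> Tuple[int, int]:
    """Map a global edge position to (vertex, index within that vertex's edges)."""
    # Last vertex v with offsets[v] <= edge; this skips zero-degree vertices.
    v = bisect_right(offsets, edge) - 1
    return v, edge - offsets[v]


def endpoint_block_shas(offsets: Sequence[int], endpoints: Sequence[int],
                        block_size: int = BLOCK_SIZE
                        ) -> List[Tuple[int, int, int, str]]:
    """Return (block#, vertex, edge index, SHA-1 of endpoints) per block.

    Endpoints are packed as 4-byte little-endian unsigned integers, as the
    validation notes describe; SHA-1 is an assumption.
    """
    rows = []
    for blk, start in enumerate(range(0, len(endpoints), block_size)):
        chunk = endpoints[start:start + block_size]
        packed = struct.pack("<%dI" % len(chunk), *chunk)
        vertex, edge_index = block_start(offsets, start)
        rows.append((blk, vertex, edge_index, hashlib.sha1(packed).hexdigest()))
    return rows
```

Block b starts at global edge position b × block_size, and the vertex owning that edge is found by a binary search over the offsets array; `bisect_right` skips zero-degree vertices, whose offset equals that of the next vertex.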
Offsets (Binary) The offsets array of the CSX (Compressed Sparse Rows/Columns) graph, in binary format and little-endian byte order. It consists of |V|+1 8-byte elements.
The first and last values are 0 and |E|, respectively.
This array helps convert the graph (or parts of it) from WebGraph format to binary format in one pass over the (relevant) edges.
  • Name: MSA10_offsets.bin
  • Size: 14,058,588,216 Bytes
  • SHASUM: b42a8f6aee7c0abdd715f523238ea59acb09c24b
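Because every entry has a fixed width of 8 bytes, the edge range of any vertex can be read with a single seek, without loading the whole file. A minimal Python sketch with hypothetical helper names, assuming unsigned 64-bit little-endian entries:

```python
import struct
from typing import BinaryIO, Tuple

# Assumption: each offset is an unsigned 64-bit little-endian integer.
ENTRY = struct.Struct("<Q")


def num_vertices(file_size: int) -> int:
    """|V| implied by the file size, since the array holds |V| + 1 entries."""
    if file_size % ENTRY.size != 0:
        raise ValueError("offsets file size is not a multiple of 8 bytes")
    return file_size // ENTRY.size - 1


def edge_range(f: BinaryIO, v: int) -> Tuple[int, int]:
    """Return (offsets[v], offsets[v+1]): the [start, end) edge positions of v."""
    f.seek(v * ENTRY.size)
    start, end = struct.unpack("<2Q", f.read(2 * ENTRY.size))
    return start, end
```

For example, 14,058,588,216 / 8 - 1 = 1,757,323,526, which matches the number of vertices listed for MSA10; with the file open in `"rb"` mode, `end - start` from `edge_range(f, v)` is the out-degree of v.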
WCC (Binary) The Weakly-Connected Component (WCC) array, in binary format and little-endian byte order.
This array consists of |V| 4-byte elements. Vertices in the same component have the same value in the WCC array.
  • Name: MSA10-wcc.bin
  • Size: 7,029,294,104 Bytes
  • SHASUM: 37f30d638341fa50ae9c73893e7cab689ef14be8
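Since two vertices share a label exactly when they are in the same component, the number of components is the number of distinct labels. A minimal Python sketch, assuming unsigned 32-bit labels; the function names are hypothetical. It measures component size in vertices, and holding 1.7 billion labels in a Python list is impractical, so treat it as an illustration for small slices.

```python
import struct
from collections import Counter
from typing import List, Sequence, Tuple


def load_wcc(raw: bytes) -> List[int]:
    """Decode |V| little-endian 4-byte component labels (one per vertex)."""
    return [label for (label,) in struct.iter_unpack("<I", raw)]


def component_stats(labels: Sequence[int]) -> Tuple[int, int]:
    """Return (number of components, number of vertices in the largest one)."""
    sizes = Counter(labels)
    return len(sizes), max(sizes.values())
```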
Transposed’s Offsets (Binary) The offsets array of the transposed graph, in binary format and little-endian byte order. It consists of |V|+1 8-byte elements; the first and last values are 0 and |E|, respectively.
It allows the graph to be transposed in a single pass over the edges.
  • Name: MSA10_trans_offsets.bin
  • Size: 14,058,588,216 Bytes
  • SHASUM: 2ae765f6f79b8f41221ba0d869648d01d19bcadd
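Since the in-degree of vertex v is trans_offsets[v+1] - trans_offsets[v], some in-degree statistics can be recomputed directly from this file. A minimal sketch with a hypothetical helper name, assuming the array has already been decoded into a sequence of integers:

```python
from typing import Sequence, Tuple


def degree_summary(offsets: Sequence[int]) -> Tuple[int, int, float]:
    """Return (maximum degree, number of zero-degree vertices, average degree).

    Applied to a transposed offsets array, the degrees are in-degrees; applied
    to the ordinary offsets array, they are out-degrees.
    """
    num_vertices = len(offsets) - 1
    degrees = [offsets[v + 1] - offsets[v] for v in range(num_vertices)]
    return max(degrees), degrees.count(0), offsets[num_vertices] / num_vertices
```

Because the last entry is |E|, the average degree is |E|/|V|; for MSA10 this is 25,236,632,682 / 1,757,323,526 ≈ 14.36, consistent with the 14.4 listed above.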
Names (tar.gz) This compressed file contains 120 files in CSV format, using ‘;’ as the separator. Each row has two columns: the vertex ID and the name of the sequence.
Note: If the graph has an ‘N2O Reordering’ file, use the n2o array to convert a vertex ID to its old vertex ID, which is the ID used to look up the protein's name in the `names.tar.gz` file.
  • Name: names.tar.gz
  • Size: 27,130,045,933 Bytes
  • SHASUM: ba00b58bbb2795445554058a681b573c751ef315
OJSON The characteristics of the graph and the shasums of its files.
It is in the open JSON format: a closing brace (}) must be appended before it is passed to a JSON parser.
  • Name: MSA10.ojson
  • Size: 885 Bytes
  • SHASUM: 0d8c48f9297d36a628aabcd8576cb0c083607534


Plots

For an explanation of the plots, please refer to the MS-BioGraphs paper.

In-Degree Distribution
Out-Degree Distribution
Weight Distribution
Vertex-Relative Weight Distribution
Degree Decomposition
Push and Pull Locality
Cell-Binned Average Weight Degree Distribution
Weakly-Connected Components Size Distribution


MS-BioGraphs MS1

Name: MS-BioGraphs – MS1
URL: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-MS1
Download Link: https://doi.org/10.21227/gmd9-1534
Script for Downloading All Files: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-on-IEEE-DataPort/
Validating and Sample Code: https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation/
Graph Explanation: Vertices represent proteins, and each edge represents the sequence similarity between its two endpoints.
Edge Weighted: Yes
Directed: No
Number of Vertices: 43,144,218
Number of Edges: 2,660,495,200
Maximum Degree: 14,212
Minimum Weight: 3,680
Maximum Weight: 634,925
Number of Zero-Degree Vertices: 0
Average Degree: 61.7
Size of the Largest WCC: 124,003,393
Number of WCCs: 15,746,208
Weight of Minimum Spanning Forest (ignoring self-edges): 109,915,787,546
Creation Details: MS-BioGraphs: Sequence Similarity Graph Datasets
Format: WebGraph
License: CC BY-NC-SA
QUB ID: F2223-052
DOI: 10.5281/zenodo.7820827
Citation
Mohsen Koohi Esfahani, Sebastiano Vigna, 
Paolo Boldi, Hans Vandierendonck, Peter Kilpatrick, March 13, 2024, 
"MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets", 
IEEE Dataport, doi: https://doi.org/10.21227/gmd9-1534.
Bibtex
@data{gmd9-1534-24,
doi = {10.21227/gmd9-1534},
url = {https://doi.org/10.21227/gmd9-1534},
author = {Koohi Esfahani, Mohsen and Vigna, Sebastiano and Boldi, 
Paolo and Vandierendonck, Hans and Kilpatrick, Peter},
publisher = {IEEE Dataport},
title = {MS-BioGraphs: Trillion-Scale Sequence Similarity Graph Datasets},
year = {2024} }


Files

Underlying Graph The underlying graph in WebGraph format:
  • File: MS1-underlying.graph, Size: 6,300,911,484 Bytes
  • File: MS1-underlying.offsets, Size: 77,574,569 Bytes
  • File: MS1-underlying.properties, Size: 1,288 Bytes
Total Size: 6,378,487,341 Bytes
These files are validated using the ‘Edge Blocks SHAs File’ described below.
Weights (Labels) The weights of the graph in WebGraph format:
  • File: MS1-weights.labels, Size: 8,201,441,365 Bytes
  • File: MS1-weights.labeloffsets, Size: 80,797,007 Bytes
  • File: MS1-weights.properties, Size: 184 Bytes
Total Size: 8,282,238,556 Bytes
These files are validated using the ‘Edge Blocks SHAs File’ described below.
Edge Blocks SHAs File (Text) This file contains the shasums of edge blocks, where each block contains 64 Million consecutive edges and has two shasums: one for its 64M endpoints and one for its 64M edge weights.
The file is used to validate the underlying graph and the weights. For further explanation of the validation process, please visit https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs-Validation.
  • Name: MS1_edges_shas.txt
  • Size: 5,040 Bytes
  • SHASUM: 27974edb4bf8f3b17b00ff3a72a703da18f3807a
Offsets (Binary) The offsets array of the CSX (Compressed Sparse Rows/Columns) graph, in binary format and little-endian byte order. It consists of |V|+1 8-byte elements.
The first and last values are 0 and |E|, respectively.
This array helps convert the graph (or parts of it) from WebGraph format to binary format in one pass over the (relevant) edges.
  • Name: MS1_offsets.bin
  • Size: 345,153,752 Bytes
  • SHASUM: 0abedde32e1ac7181897f82d10d40acfe14f2022
WCC (Binary) The Weakly-Connected Component (WCC) array, in binary format and little-endian byte order.
This array consists of |V| 4-byte elements. Vertices in the same component have the same value in the WCC array.
  • Name: MS1-wcc.bin
  • Size: 172,576,872 Bytes
  • SHASUM: 4c491dd96e3582b70a203ae4a910001381278d75
Names (tar.gz) This compressed file contains 120 files in CSV format, using ‘;’ as the separator. Each row has two columns: the vertex ID and the name of the sequence.
Note: If the graph has an ‘N2O Reordering’ file, use the n2o array to convert a vertex ID to its old vertex ID, which is the ID used to look up the protein's name in the `names.tar.gz` file.
  • Name: names.tar.gz
  • Size: 27,130,045,933 Bytes
  • SHASUM: ba00b58bbb2795445554058a681b573c751ef315
N2O Reordering (Binary) The New-to-Old (N2O) reordering array of the graph, in binary format and little-endian byte order.
It consists of |V| 4-byte elements and gives the old ID of each vertex, which is used to look up the name of the vertex (protein) in the names.tar.gz file.
  • Name: MS1-n2o.bin
  • Size: 172,576,872 Bytes
  • SHASUM: b163320b6349fed7a00fb17c4a4a22e7d124b716
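The two-step name lookup can be sketched in Python as below, assuming the N2O entries are unsigned 32-bit integers and that `names_by_old_id` is a dictionary built from the ID and name columns of names.tar.gz. Both helper names are hypothetical.

```python
import struct
from typing import Dict, Iterable, List


def load_n2o(raw: bytes) -> List[int]:
    """Decode the N2O array: entry v is the old ID of (current) vertex v."""
    return [old for (old,) in struct.iter_unpack("<I", raw)]


def names_for_vertices(vertex_ids: Iterable[int], n2o: List[int],
                       names_by_old_id: Dict[int, str]) -> Dict[int, str]:
    """Map current vertex IDs to old IDs, then to protein names."""
    return {v: names_by_old_id[n2o[v]] for v in vertex_ids}
```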
OJSON The characteristics of the graph and the shasums of its files.
It is in the open JSON format: a closing brace (}) must be appended before it is passed to a JSON parser.
  • Name: MS1.ojson
  • Size: 736 Bytes
  • SHASUM: c60afa0652955fd46f1bb8056380523504d69fa6


Plots

For an explanation of the plots, please refer to the MS-BioGraphs paper.

Degree Distribution
Weight Distribution
Vertex-Relative Weight Distribution
Degree Decomposition
Cell-Binned Average Weight Degree Distribution
Weakly-Connected Components Size Distribution


MS-BioGraphs Validation

Repository

https://github.com/DIPSA-QUB/MS-BioGraphs-Validation

Explanation

We provide a shell script, validation.sh, and a Java program, EdgeBlockSHA.java, to verify the correctness of the graphs. Each graph has a .ojson file whose shasum is checked against the value retrieved from our server. Files such as offsets.bin, wcc.bin, n2o.bin, trans_offsets.bin, and edges_shas.txt have shasum records in the ojson file, which are used to validate these files.
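Checking a downloaded file against its recorded shasum can be sketched in Python as below. It assumes the records are SHA-1 digests (consistent with the 40-hex-digit values listed on the dataset pages). The helper names are hypothetical, and reading the expected value out of the .ojson file is left to the caller because its key names are not given here. This is an illustration, not the validation.sh script.

```python
import hashlib
from typing import BinaryIO


def sha1_of_stream(stream: BinaryIO, chunk_size: int = 1 << 24) -> str:
    """SHA-1 hex digest of a binary stream, read in fixed-size chunks."""
    digest = hashlib.sha1()
    # iter(callable, sentinel) stops when read() returns b"" at end of file.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected_sha: str) -> bool:
    """True if the file at `path` has the recorded (hex) shasum."""
    with open(path, "rb") as f:
        return sha1_of_stream(f) == expected_sha.strip().lower()
```

For example, `matches_record("MS1-wcc.bin", "4c491dd96e3582b70a203ae4a910001381278d75")` checks the MS1 WCC file against the shasum listed for it. Reading in chunks keeps memory use constant even for files of tens of gigabytes.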

The graph in WebGraph format is compressed into the MS??-underlying.* and MS??-weights.* files. To validate the compressed graph, EdgeBlockSHA.java is used. It is a parallel Java program that uses the WebGraph library to traverse the graph and calculate the shasums of blocks of edges (endpoints and weights). The calculated results are then matched against the edges_shas.txt file of the graph.

It is also possible to validate particular blocks by matching the calculated shasum with the relevant row in the edges_shas.txt file, which has the format shown in the following example. Each block contains 64 Million consecutive edges. The start of each block is identified by a vertex ID and its edge index. Column endpoint_sha is the shasum of the 64 Million endpoints when stored as an array of 4-byte elements in binary format and little-endian byte order. Similarly, column weights_sha is the shasum of the weights (labels). Weights are kept separate from endpoints because some applications do not need the weights and therefore do not need to read and validate them.

64MB blk#;     vertex; edge index;                             endpoint_sha;                              weights_sha;
         0;          0;          0; 509784b158cb9404241afb21d0ceaf590b88d2f2; 57da4ad7bb89c5922e436b0535d791fa8f40dffd;
         1;    2315113;        705; fafc118563c1d7b5fbff64af56edd6a56524f479; 13b7a9ca60bfb0715d563218d0a1cd787b00a07c;
         2;    4521625;        597; 4ed65aa07c8062a151166ef2e9bdb93e41d19357; 8158276bec426ee46eca9912759eb9bd57fcc957;
         3;    6347361;        112; d02e8913c807c3f4ecde9c638e0ded5ab80ba819; 26bc3296de65cba6ac539cd96b79ae6f7a4d37be;
         4;    8447869;         15; 61513c84db40124496cdf769516118b63598914f; 781b9f4372ac614e94d097017c756d015234deb6; 
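A minimal sketch of reading edges_shas.txt and checking individual blocks, assuming the `;`-separated layout of the sample above. The helper names are hypothetical, and the computed digests would have to come from a separate traversal of the decoded graph.

```python
from typing import Dict, Iterable, List, Tuple

# (block#, vertex, edge index, endpoint_sha, weights_sha)
Row = Tuple[int, int, int, str, str]


def parse_edges_shas(lines: Iterable[str]) -> List[Row]:
    """Parse rows of 'blk#; vertex; edge index; endpoint_sha; weights_sha;'."""
    rows = []
    for line in lines:
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header line or blank line
        blk, vertex, edge_index = (int(x) for x in fields[:3])
        rows.append((blk, vertex, edge_index, fields[3], fields[4]))
    return rows


def mismatched_endpoint_blocks(expected: List[Row],
                               computed: Dict[int, str]) -> List[int]:
    """Block numbers whose computed endpoint SHA differs from the recorded one."""
    return [blk for blk, _, _, endpoint_sha, _ in expected
            if blk in computed and computed[blk] != endpoint_sha]
```

Only blocks present in `computed` are compared, so a subset of blocks can be checked without decoding the whole graph.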
 

Requirements

  • JDK (version > 15)
  • jq
  • wget

WebGraph Framework

Please visit https://webgraph.di.unimi.it .

ParaGrapher Graph Loading API and Library

The WebGraph formats can also be read using the ParaGrapher library: https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.

License

Licensed under the GNU v3 General Public License, as published by the Free Software Foundation. You must not use this Software except in compliance with the terms of the License. Unless required by applicable law or agreed upon in writing, this Software is distributed on an “as is” basis, without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose, neither express nor implied.

Copyright 2022-2023 The Queen’s University of Belfast, Northern Ireland, UK
