MS-BioGraphs on IEEE DataPort

MS-BioGraph sequence similarity graph datasets are now publicly available on IEEE DataPort: https://doi.org/10.21227/gmd9-1534.

To access the files, you need to register/login to IEEE DataPort and then visit the MS-BioGraphs page. By saving the page as an HTML file such as dp.html, you may download the datasets (as an example MS1) using the following script:

dsname="MS1"
html_file="dp.html"

urls=`cat $html_file  | sed  -e 's/\&/\&/g'  | grep -Eo "(http|https)://[a-zA-Z0-9./?&=_%:-]*" | grep amazonaws  | sort | uniq | grep -E "$dsname[-_\.]"`

for u in $urls; do
    wget $u
    if [ $? != 0 ]; then break; fi
done

# removing query strings
for f in $(find $1 -type f); do
    if [ $f = ${f%%\?*} ]; then continue; fi
    mv "${f}" "${f%%\?*}"
done

# liking offsets.bin to be found by ParaGrapher
ln -s ${dsname}_offsets.bin ${dsname}-underlying_offsets.bin

Instead of wget you may use axel -n 10 to use multiple connections (here, 10) for downloading each file (https://manpages.ubuntu.com/manpages/noble/en/man1/axel.1.html).

MS-BioGraphs

Related Posts