The 'zsync' files of the database files might be incorrect. #49
Ranking database downloaded:
$ cat hg38_screen_v10_clust.regions_vs_motifs.rankings.feather.sha1sum.txt
1688a925f22d312769798258d990f13866bb4924 hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
$ head hg38_screen_v10_clust.regions_vs_motifs.rankings.feather.zsync
Blocksize: 2048
Filename: hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
Hash-Lengths: 2,3,6
Length: 35192956928
MTime: Thu, 07 Jul 2022 14:35:59 +0000
SHA-1: 95c823ee1e19f68ce0c82f79042cdc1007018ddb
URL: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
zsync: 2.0.0-alpha-1
[binary block checksum data omitted]
$ sha1sum hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
95c823ee1e19f68ce0c82f79042cdc1007018ddb hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
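The same comparison can be scripted. The following is a minimal sketch, not part of ctxcore: the helper names (`sha1_of_file`, `expected_sha1`, `matches_published_sha1`) are made up for illustration, and it assumes the `sha1sum.txt` file uses the usual `<digest> <filename>` layout shown above. It reads the file in chunks because the databases are tens of gigabytes.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha1(sha1sum_file):
    """Parse the digest from the first '<digest> <filename>' line (assumed layout)."""
    first_line = Path(sha1sum_file).read_text().splitlines()[0]
    return first_line.split()[0].lower()


def matches_published_sha1(data_file, sha1sum_file):
    """True if the local file's SHA-1 equals the digest in the sha1sum file."""
    return sha1_of_file(data_file) == expected_sha1(sha1sum_file)
```

For the rankings file in this report, such a check would return False, since the local digest (95c823ee...) differs from the published one (1688a925...).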
An error occurred:
ctxcore/ctdb.py: ......
def is_feather_v1_or_v2(feather_filename: Union[Path, str]) -> Optional[int]:
    """
    Check if the passed filename is a Feather v1 or v2 file.
    :param feather_filename: Feather v1 or v2 filename.
    :return: 1 (for Feather version 1), 2 (for Feather version 2) or None.
    """
    with open(feather_filename, "rb") as fh_feather:
        # Read first 6 and last 6 bytes to see if we have a Feather v2 file.
        fh_feather.seek(0, 0)
        feather_v2_magic_bytes_header = fh_feather.read(6)
        fh_feather.seek(-6, 2)
        feather_v2_magic_bytes_footer = fh_feather.read(6)
        if feather_v2_magic_bytes_header == feather_v2_magic_bytes_footer == b"ARROW1":
            # Feather v2 file.
            return 2
        # Read first 4 and last 4 bytes to see if we have a Feather v1 file.
        feather_v1_magic_bytes_header = feather_v2_magic_bytes_header[0:4]
        feather_v1_magic_bytes_footer = feather_v2_magic_bytes_footer[2:]
        if feather_v1_magic_bytes_header == feather_v1_magic_bytes_footer == b"FEA1":
            # Feather v1 file.
            return 1
    # Some other file format.
    return None
......
$ head -c 6 hg38_screen_v10_clust.regions_vs_motifs.*
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
ARROW1
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
ARROW1
$ tail -c 6 hg38_screen_v10_clust.regions_vs_motifs.*
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
[non-printable bytes, not ARROW1]
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
00176-
The file size is incorrect.
$ stat hg38_screen_v10_clust.regions_vs_motifs.*.feather
File: hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
Size: 35192956928 Blocks: 68736272 IO Block: 4096 regular file
Device: 807h/2055d Inode: 18643438 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 1001/ charles) Gid: ( 1001/ charles)
Access: 2024-05-09 10:10:29.183467890 +0800
Modify: 2022-07-07 14:35:59.000000000 +0800
Change: 2024-05-09 10:10:03.311709805 +0800
Birth: 2024-05-08 21:57:40.146629410 +0800
File: hg38_screen_v10_clust.regions_vs_motifs.scores.feather
Size: 13882267648 Blocks: 27113824 IO Block: 4096 regular file
Device: 807h/2055d Inode: 18643440 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 1001/ charles) Gid: ( 1001/ charles)
Access: 2024-05-09 10:48:38.146833263 +0800
Modify: 2024-05-08 23:28:39.283831255 +0800
Change: 2024-05-09 10:10:03.311709805 +0800
Birth: 2024-05-08 21:57:43.862621727 +0800
$ curl -I https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
HTTP/1.1 200 OK
Date: Thu, 09 May 2024 03:52:22 GMT
Server: Apache/2.4.29 (Ubuntu)
Strict-Transport-Security: max-age=15768000
Last-Modified: Thu, 07 Jul 2022 14:35:59 GMT
ETag: "831a9eca2-5e338010f31c0"
Accept-Ranges: bytes
Content-Length: 35192958114
X-Frame-Options: sameorigin
$ curl -I https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
HTTP/1.1 200 OK
Date: Thu, 09 May 2024 03:56:51 GMT
Server: Apache/2.4.29 (Ubuntu)
Strict-Transport-Security: max-age=15768000
Last-Modified: Thu, 07 Jul 2022 14:31:02 GMT
ETag: "33b729822-5e337ef5b5580"
Accept-Ranges: bytes
Content-Length: 13882267682
X-Frame-Options: sameorigin
So the 'zsync' files are incorrect.
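The size mismatch can be worked out from the numbers in the stat and curl -I output above. The sketch below is only arithmetic on those reported values; the function name `truncation_report` is made up, and it assumes the scores .zsync file uses the same Blocksize of 2048 as the rankings one, which the thread does not show.

```python
BLOCKSIZE = 2048  # "Blocksize" from the rankings .zsync header; assumed for scores too

# (local size from stat, remote size from Content-Length), as reported above
SIZES = {
    "rankings": (35192956928, 35192958114),
    "scores": (13882267648, 13882267682),
}


def truncation_report(local_size, remote_size, blocksize=BLOCKSIZE):
    """Summarize how a local file's size differs from the server's copy."""
    return {
        # Bytes present on the server but absent locally.
        "missing_bytes": remote_size - local_size,
        # Whether the local file ends exactly on a block boundary.
        "local_is_block_multiple": local_size % blocksize == 0,
        # Size of the final, partial block of the remote file.
        "remote_tail_bytes": remote_size % blocksize,
    }


for name, (local, remote) in SIZES.items():
    print(name, truncation_report(local, remote))
```

Under these assumptions, the rankings file is short by 1186 bytes and the scores file by 34 bytes, and both local sizes are exact multiples of 2048. That is consistent with the final partial block being left out, though it does not prove where the truncation was introduced. The "Length" field in the rankings .zsync header (35192956928) also equals the truncated local size rather than the server's Content-Length.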
I fixed it using 'curl -C -':
$ curl -C - -O https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
** Resuming transfer from byte position 35192956928
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1186 100 1186 0 0 989 0 0:00:01 0:00:01 --:--:-- 989
$ curl -C - -O https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
** Resuming transfer from byte position 13882267648
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 34 100 34 0 0 29 0 0:00:01 0:00:01 --:--:-- 29
$ tail -c 6 hg38_screen_v10_clust.regions_vs_motifs.*feather
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
ARROW1
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
ARROW1
It looks like it's working now. To summarize, the 'zsync' files are incorrect.
Best wishes
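For anyone without curl, the resume step can be approximated in Python. `curl -C -` continues from the current local file size by sending an HTTP Range request; the sketch below does the same with the standard library. The function names are made up for illustration, and it assumes the server honours Range requests (the responses above advertise "Accept-Ranges: bytes").

```python
import os
import shutil
import urllib.request


def build_resume_request(url, local_path):
    """Build a request for the bytes that are not yet in local_path."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    # "bytes=N-" asks for everything from byte offset N to the end of the file.
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})


def resume_download(url, local_path):
    """Append the missing tail of a remote file to local_path."""
    request = build_resume_request(url, local_path)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content confirms the server honoured the Range header;
        # a plain 200 would mean the whole file is being sent again.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        with open(local_path, "ab") as fh:
            shutil.copyfileobj(response, fh)
```

If the local file is already complete, the server will typically answer with 416 (Range Not Satisfiable), which `urlopen` raises as an `HTTPError`. After resuming, it is still worth re-checking the SHA-1 against the published sha1sum.txt.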
The zsync files have been removed for now, as zsync has had issues with big files (larger than 2G) for a long time. Looks like the zsync2 bug:
I'm sorry for submitting an issue here.
I tried to download these databases using zsync.
https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
Pay attention to the SHA-1 checksum.
As you can see, its SHA-1 value matches the one recorded in the 'zsync' file's header, but differs from the one recorded in 'sha1sum.txt'.
I hope it's not my fault, as redownloading is a bit of a hassle.