The 'zsync' files of databases file might be incorrect. #49

Open
NirvanaCh opened this issue May 9, 2024 · 5 comments

@NirvanaCh

NirvanaCh commented May 9, 2024

I'm sorry for submitting an issue here.
I tried to download these databases using zsync.

https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather

Pay attention to the SHA-1 checksum.

$ sha1sum hg38_screen_v10_clust.regions_vs_motifs.scores.feather
57b58cbc57002e2b96f4b51d6a9fec0e831abd29  hg38_screen_v10_clust.regions_vs_motifs.scores.feather

$ wget https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync
--2024-05-09 09:16:55--  https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync
Resolving resources.aertslab.org (resources.aertslab.org)... 198.18.0.18
Connecting to resources.aertslab.org (resources.aertslab.org)|198.18.0.18|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61006451 (58M)
Saving to: ‘hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync’

hg38_screen_v10_clust.reg 100%[===================================>]  58.18M  11.1MB/s    in 7.2s

2024-05-09 09:17:03 (8.11 MB/s) - ‘hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync’ saved [61006451/61006451]

$ head hg38_screen_v10_clust.regions_vs_motifs.scores.feather.zsync
Blocksize: 2048
Filename: hg38_screen_v10_clust.regions_vs_motifs.scores.feather
Hash-Lengths: 2,3,6
Length: 13882267648
MTime: Thu, 07 Jul 2022 14:31:02 +0000
SHA-1: 57b58cbc57002e2b96f4b51d6a9fec0e831abd29
URL: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
zsync: 2.0.0-alpha-1

[binary data omitted]

$ wget https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt
--2024-05-09 09:25:49--  https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt
Resolving resources.aertslab.org (resources.aertslab.org)... 198.18.0.18
Connecting to resources.aertslab.org (resources.aertslab.org)|198.18.0.18|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 97 [text/plain]
Saving to: ‘hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt’

hg38_screen_v10_clust.reg 100%[===================================>]      97  --.-KB/s    in 0s

2024-05-09 09:25:50 (76.9 MB/s) - ‘hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt’ saved [97/97]

$ cat hg38_screen_v10_clust.regions_vs_motifs.scores.feather.sha1sum.txt
07b5e527d2ed082e081e439e68dffa77b5f6129c  hg38_screen_v10_clust.regions_vs_motifs.scores.feather

As you can see, the SHA-1 of the downloaded file matches the one recorded in the 'zsync' file's header, but differs from the one recorded in 'sha1sum.txt'.

I hope it's not my fault, as redownloading is a bit of a hassle.
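
The comparison above can also be scripted. The following is a minimal sketch, not part of the original report: the helper names (`sha1_of_file`, `read_zsync_header`, `read_sha1sum_txt`) are hypothetical, and it assumes the '.zsync' file starts with plain `Key: value` lines terminated by a blank line, as the `head` output suggests.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_zsync_header(path):
    """Parse the 'Key: value' text lines that precede the binary part of a .zsync file."""
    header = {}
    with open(path, "rb") as fh:
        for raw in fh:
            line = raw.rstrip(b"\r\n")
            if not line:  # assumption: a blank line ends the text header
                break
            key, _, value = line.decode("ascii", "replace").partition(": ")
            header[key] = value
    return header


def read_sha1sum_txt(path):
    """Return the checksum from a one-line 'sha1sum'-style file."""
    return Path(path).read_text().split()[0]
```

With these helpers, `sha1_of_file(f)` can be compared against both `read_zsync_header(f + ".zsync")["SHA-1"]` and `read_sha1sum_txt(f + ".sha1sum.txt")` to see which reference value the local file agrees with.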

@NirvanaCh
Author

The rankings database, once downloaded, shows the same mismatch:

$ cat hg38_screen_v10_clust.regions_vs_motifs.rankings.feather.sha1sum.txt
1688a925f22d312769798258d990f13866bb4924  hg38_screen_v10_clust.regions_vs_motifs.rankings.feather

$ head hg38_screen_v10_clust.regions_vs_motifs.rankings.feather.zsync
Blocksize: 2048
Filename: hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
Hash-Lengths: 2,3,6
Length: 35192956928
MTime: Thu, 07 Jul 2022 14:35:59 +0000
SHA-1: 95c823ee1e19f68ce0c82f79042cdc1007018ddb
URL: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
zsync: 2.0.0-alpha-1

[binary data omitted]

$ sha1sum hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
95c823ee1e19f68ce0c82f79042cdc1007018ddb  hg38_screen_v10_clust.regions_vs_motifs.rankings.feather

NirvanaCh changed the title from "The SHA-1 checksum of a databases file might be incorrect." to "The SHA-1 checksum files of databases file might be incorrect." on May 9, 2024
@NirvanaCh
Author

Loading the rankings database raised an error:

ValueError: "/m/tutor/database/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather" is not a cisTarget Feather database in Feather v1 or v2 format.

The check that raises it is in ctxcore/ctdb.py:

......
def is_feather_v1_or_v2(feather_filename: Union[Path, str]) -> Optional[int]:
    """
    Check if the passed filename is a Feather v1 or v2 file.

    :param feather_filename: Feather v1 or v2 filename.
    :return: 1 (for Feather version 1), 2 (for Feather version 2) or None.
    """

    with open(feather_filename, "rb") as fh_feather:
        # Read first 6 and last 6 bytes to see if we have a Feather v2 file.
        fh_feather.seek(0, 0)
        feather_v2_magic_bytes_header = fh_feather.read(6)
        fh_feather.seek(-6, 2)
        feather_v2_magic_bytes_footer = fh_feather.read(6)

        if feather_v2_magic_bytes_header == feather_v2_magic_bytes_footer == b"ARROW1":
            # Feather v2 file.
            return 2

        # Read first 4 and last 4 bytes to see if we have a Feather v1 file.
        feather_v1_magic_bytes_header = feather_v2_magic_bytes_header[0:4]
        feather_v1_magic_bytes_footer = feather_v2_magic_bytes_footer[2:]

        if feather_v1_magic_bytes_header == feather_v1_magic_bytes_footer == b"FEA1":
            # Feather v1 file.
            return 1

    # Some other file format.
    return None
......
$ head -c 6 hg38_screen_v10_clust.regions_vs_motifs.*
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
ARROW1
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
ARROW1

$ tail -c 6 hg38_screen_v10_clust.regions_vs_motifs.*
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
[non-printable binary bytes]
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
00176-
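
The `head`/`tail` output above shows why the check fails: both files start with `ARROW1`, but neither ends with it, so `is_feather_v1_or_v2` returns None. A small diagnostic along the same lines is sketched below. It is not part of ctxcore; the function names are hypothetical, and treating "good header, bad footer" as a sign of truncation is an assumption that happens to fit this case.

```python
import os

ARROW_MAGIC = b"ARROW1"  # Feather v2 magic bytes, as used in the ctxcore check


def feather_v2_magic(path):
    """Return (header_ok, footer_ok) for the Feather v2 magic bytes of a file."""
    if os.path.getsize(path) < len(ARROW_MAGIC):
        return False, False
    with open(path, "rb") as fh:
        head = fh.read(len(ARROW_MAGIC))
        fh.seek(-len(ARROW_MAGIC), os.SEEK_END)
        tail = fh.read(len(ARROW_MAGIC))
    return head == ARROW_MAGIC, tail == ARROW_MAGIC


def looks_truncated(path):
    """True when the file starts like Feather v2 but does not end like it."""
    header_ok, footer_ok = feather_v2_magic(path)
    return header_ok and not footer_ok
```

For the two files above, `looks_truncated` would be expected to return True, since the footers read as binary bytes and `00176-` rather than `ARROW1`.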

@NirvanaCh
Author

NirvanaCh commented May 9, 2024

The local file sizes are incorrect: each is smaller than the Content-Length the server reports.

$ stat hg38_screen_v10_clust.regions_vs_motifs.*.feather
  File: hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
  Size: 35192956928     Blocks: 68736272   IO Block: 4096   regular file
Device: 807h/2055d      Inode: 18643438    Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1001/ charles)   Gid: ( 1001/ charles)
Access: 2024-05-09 10:10:29.183467890 +0800
Modify: 2022-07-07 14:35:59.000000000 +0800
Change: 2024-05-09 10:10:03.311709805 +0800
 Birth: 2024-05-08 21:57:40.146629410 +0800
  File: hg38_screen_v10_clust.regions_vs_motifs.scores.feather
  Size: 13882267648     Blocks: 27113824   IO Block: 4096   regular file
Device: 807h/2055d      Inode: 18643440    Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1001/ charles)   Gid: ( 1001/ charles)
Access: 2024-05-09 10:48:38.146833263 +0800
Modify: 2024-05-08 23:28:39.283831255 +0800
Change: 2024-05-09 10:10:03.311709805 +0800
 Birth: 2024-05-08 21:57:43.862621727 +0800

$ curl -I https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
HTTP/1.1 200 OK
Date: Thu, 09 May 2024 03:52:22 GMT
Server: Apache/2.4.29 (Ubuntu)
Strict-Transport-Security: max-age=15768000
Last-Modified: Thu, 07 Jul 2022 14:35:59 GMT
ETag: "831a9eca2-5e338010f31c0"
Accept-Ranges: bytes
Content-Length: 35192958114
X-Frame-Options: sameorigin

$ curl -I https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
HTTP/1.1 200 OK
Date: Thu, 09 May 2024 03:56:51 GMT
Server: Apache/2.4.29 (Ubuntu)
Strict-Transport-Security: max-age=15768000
Last-Modified: Thu, 07 Jul 2022 14:31:02 GMT
ETag: "33b729822-5e337ef5b5580"
Accept-Ranges: bytes
Content-Length: 13882267682
X-Frame-Options: sameorigin

So the 'zsync' files are incorrect: the Length they record is shorter than the actual file on the server.
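
The size arithmetic can be spelled out with the numbers quoted in this thread. The sketch below only restates those figures; the function name `size_report` is hypothetical. The observation that both zsync lengths are exact multiples of the 2048-byte block size is consistent with the final partial block having been dropped, but the thread does not confirm that as the cause.

```python
BLOCKSIZE = 2048  # "Blocksize" from the .zsync headers

# name -> (Length in the .zsync header, which equals the local size; Content-Length from curl -I)
sizes = {
    "rankings": (35192956928, 35192958114),
    "scores": (13882267648, 13882267682),
}


def size_report(zsync_length, content_length, blocksize=BLOCKSIZE):
    """Summarise how a zsync 'Length' relates to the server's Content-Length."""
    return {
        "missing_bytes": content_length - zsync_length,
        "zsync_length_is_block_multiple": zsync_length % blocksize == 0,
        "final_partial_block": content_length % blocksize,
    }


for name, (zsync_length, content_length) in sizes.items():
    print(name, size_report(zsync_length, content_length))
```

The rankings file is 1186 bytes short and the scores file 34 bytes short, and in both cases the shortfall equals the size of the last partial block.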

NirvanaCh changed the title from "The SHA-1 checksum files of databases file might be incorrect." to "The 'zsync' files of databases file might be incorrect." on May 9, 2024
@NirvanaCh
Author

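I fixed it using 'curl -C -', which resumes each download from the current local size and fetches only the missing trailing bytes: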

$ curl -C - -O https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather
** Resuming transfer from byte position 35192956928
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1186  100  1186    0     0    989      0  0:00:01  0:00:01 --:--:--   989

$ curl -C - -O https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/hg38_screen_v10_clust.regions_vs_motifs.scores.feather
** Resuming transfer from byte position 13882267648
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    34  100    34    0     0     29      0  0:00:01  0:00:01 --:--:--    29

$ tail -c 6 hg38_screen_v10_clust.regions_vs_motifs.*feather
==> hg38_screen_v10_clust.regions_vs_motifs.rankings.feather <==
ARROW1
==> hg38_screen_v10_clust.regions_vs_motifs.scores.feather <==
ARROW1

It looks like it's working now.
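
For anyone who prefers to script the same repair, the sketch below does roughly what 'curl -C -' did here: it asks the server for everything from the current local size onwards via an HTTP Range request and appends the result. It is an illustration under assumptions, not a tested replacement for curl; the function names are hypothetical, and it relies on the server supporting range requests (the `Accept-Ranges: bytes` header above suggests it does).

```python
import os
import urllib.request


def range_header(local_size):
    """Build the HTTP Range header value for 'everything from local_size onwards'."""
    return f"bytes={local_size}-"


def resume_download(url, path, chunk_size=1 << 20):
    """Append the bytes missing from `path` by requesting them from `url`."""
    local_size = os.path.getsize(path) if os.path.exists(path) else 0
    request = urllib.request.Request(url, headers={"Range": range_header(local_size)})
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the range;
        # any other status could append a full copy, so stop instead.
        if response.status != 206:
            raise RuntimeError(f"server did not honour Range (HTTP {response.status})")
        with open(path, "ab") as fh:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                fh.write(chunk)
    # Number of bytes appended by this call.
    return os.path.getsize(path) - local_size
```

If the local file is already complete, the server is likely to answer with an error status (typically 416), which `urlopen` raises as an exception rather than appending anything. Re-checking the SHA-1 or the trailing `ARROW1` bytes afterwards is still advisable.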

To summarize, the 'zsync' files are incorrect.

Best wishes

@ghuls
Member

ghuls commented Aug 9, 2024

The zsync files have been removed for now, as zsync has had issues with big files (larger than 2G) for a long time.

It looks like the zsync2 bug (AppImageCommunity/zsync2#31) might finally be resolved in a fork of zsync2: NiLuJe/zsync2@a8e2d68
