Do not show zero total file size for globus objects #3230

Open
peetucket opened this issue Jul 11, 2023 · 4 comments · May be fixed by #3256

Member

peetucket commented Jul 11, 2023

For objects accessioned via Globus for which individual file sizes (some or all of the files in an object) are not available, we do not want to show 0 in the collection item list table. Instead we will show a message (TBD by Astrid, something like "not available*"), and at the bottom of the page we will have a legend indicating why it's not available (TBD, something like "* = items deposited through Globus do not have file sizes computed").

Note that it may be possible (based on #3231) for some files in an object to have a file size while others do not. In that case we still do not want to show the total object size, because it will not be accurate. So the test for whether to show the total file size should not be whether it is 0, but perhaps whether Globus was used to upload files at any point (again, based on what we learn in #3231).

Note: do the investigation in #3231 first, because the results of that will determine how we implement this
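
One way to read the rule above, as a minimal sketch rather than H2's actual code: only report a total when every file in the object has a known size. The `FileRecord` struct and the `displayable_total_size` method are hypothetical names, and treating a size of 0 or nil as "unknown" is an assumption that depends on what #3231 turns up.

```ruby
# Hypothetical stand-in for an attached file; not H2's actual model.
FileRecord = Struct.new(:filename, :size, keyword_init: true)

# Returns the summed size in bytes, or nil when any file's size is
# unknown (nil or 0), since a partial sum would be misleading.
def displayable_total_size(files)
  return nil if files.any? { |f| f.size.nil? || f.size.zero? }

  files.sum(&:size)
end

complete = [
  FileRecord.new(filename: 'a.txt', size: 100),
  FileRecord.new(filename: 'b.txt', size: 250)
]
partial = complete + [FileRecord.new(filename: 'c.zip', size: 0)]

p displayable_total_size(complete) # 350
p displayable_total_size(partial)  # nil
```

Returning nil instead of 0 lets the view tell "unknown" apart from a genuinely empty total.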

@peetucket peetucket added 2023 Summer 2023 workcycle design needed Issues requiring design input labels Jul 11, 2023

astridu commented Jul 11, 2023

Place "NA*" in the size column, and then place a footnote at the bottom of the table:
"*File sizes are not available for deposits made using Globus."
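
As a rough illustration of that layout (not the H2 view code), the size column and footnote could be produced along these lines. The `size_cell` and `render_item_list` helpers are hypothetical, and a nil size stands in for "not available"; only the "NA*" marker and the footnote text come from the comment above.

```ruby
FOOTNOTE = '*File sizes are not available for deposits made using Globus.'

# Formats one size cell: a byte count when known, otherwise "NA*".
def size_cell(total_bytes)
  total_bytes.nil? ? 'NA*' : "#{total_bytes} bytes"
end

# Renders rows of [title, total_bytes_or_nil] as tab-separated lines and
# appends the footnote only when at least one row shows "NA*".
def render_item_list(rows)
  lines = rows.map { |title, bytes| "#{title}\t#{size_cell(bytes)}" }
  lines << FOOTNOTE if rows.any? { |_title, bytes| bytes.nil? }
  lines.join("\n")
end

puts render_item_list([['Survey data', 4695], ['Globus deposit', nil]])
```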

@peetucket peetucket removed the design needed Issues requiring design input label Jul 12, 2023
Contributor

edsu commented Jul 14, 2023

While we could put NA, we actually do know the sizes, so I would be in favor of fixing the problem. I believe that globus_client needs a new or adjusted method for returning files in a way that preserves both the filename and the file size. H2 then needs to be modified to use it to persist the file size in addition to the filename.

@edsu edsu self-assigned this Jul 17, 2023
Contributor

edsu commented Jul 17, 2023

With what we learned in #3231 I think we need to:

  • Modify globus_client to return file sizes as well as file names
  • Modify h2 to persist the file size when fetching files from Globus
  • Write a program to populate the missing file sizes

Contributor

edsu commented Jul 17, 2023

It turns out globus_client already has a GlobusClient.list_files() that works nicely:

3.2.0 :011 > GlobusClient.list_files(user_id: '[email protected]', path: 'katebar/work19057')
 => [#<struct GlobusClient::Endpoint::FileInfo name="/uploads/katebar/work19057/version1/bh979th0089.zip", size=4695>]
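
Given results shaped like the output above, keeping the size next to the filename might look something like this. It is a sketch only: `FileInfo` here is a local stand-in with the same two fields as the printed struct, and `attachment_attributes` and its hash keys are hypothetical, not H2's schema.

```ruby
# Stand-in with the same fields as the struct printed above (name, size);
# the real class is GlobusClient::Endpoint::FileInfo.
FileInfo = Struct.new(:name, :size)

# Builds one attribute hash per listed file so that the size can be
# stored alongside the filename. The key names are illustrative only.
def attachment_attributes(file_infos)
  file_infos.map do |info|
    { filename: File.basename(info.name), path: info.name, byte_size: info.size }
  end
end

listing = [FileInfo.new('/uploads/katebar/work19057/version1/bh979th0089.zip', 4695)]
p attachment_attributes(listing)
```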

edsu added a commit that referenced this issue Jul 17, 2023
Modify the FetchGlobusJob to use GlobusClient.list_files instead of
GlobusClient.get_filenames so that it can get access to the file sizes
as well as the file names.

GlobusService.download_chunk needed to be defined as a no-op or else the
call to attach the blob throws a NotImplementedError when it tries to
identify the content type of a blob with a non-zero size.

Fixes #3230
@edsu edsu linked a pull request Jul 17, 2023 that will close this issue
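
The `download_chunk` point in the commit message can be illustrated with plain Ruby. This is a sketch under assumptions: `BaseService` and `identifiable_chunk` are stand-ins written for this example, and returning an empty string from the no-op is a guess at a reasonable return value, not necessarily what the commit does.

```ruby
# Stand-in for a storage service base class whose download_chunk
# raises until a subclass implements it.
class BaseService
  def download_chunk(_key, _range)
    raise NotImplementedError
  end
end

# Hypothetical Globus-backed service: file contents are not fetched
# locally, so the no-op returns an empty string instead of raising.
class GlobusService < BaseService
  def download_chunk(_key, _range)
    ''
  end
end

# Mimics a content-type probe that reads the first bytes only when the
# recorded size is positive.
def identifiable_chunk(service, key, byte_size)
  byte_size.positive? ? service.download_chunk(key, 0...4096) : ''
end

p identifiable_chunk(GlobusService.new, 'bh979th0089.zip', 4695)
```

With a size of 0 the probe never calls `download_chunk`, which would explain why the error only appeared once non-zero sizes were stored.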
edsu added a commit that referenced this issue Jul 18, 2023
Modify the FetchGlobusJob to use GlobusClient.list_files instead of
GlobusClient.get_filenames so that it can get access to the file sizes
as well as the file names.

GlobusService.download_chunk needed to be defined as a no-op or else the
call to attach the blob throws a NotImplementedError when it tries to
identify the content type of a blob with a non-zero size.

Also add a cleanup:file_sizes rake task for updating the 0 file sizes
using the size stored in SDR.

Fixes #3230
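
The backfill that the rake task performs could be sketched roughly as below. Everything here is illustrative: `StoredFile` and `backfill_file_sizes` are hypothetical, and the `sdr_sizes` hash stands in for however the sizes are actually looked up in SDR.

```ruby
# Hypothetical record for a stored file; not H2's actual model.
StoredFile = Struct.new(:filename, :byte_size)

# Replaces zero sizes with the size found in sdr_sizes (filename => bytes),
# leaving files untouched when no size is known. Returns the number updated.
def backfill_file_sizes(files, sdr_sizes)
  files.count do |file|
    next false unless file.byte_size.zero?

    size = sdr_sizes[file.filename]
    next false if size.nil?

    file.byte_size = size
    true
  end
end

files = [StoredFile.new('bh979th0089.zip', 0), StoredFile.new('notes.txt', 120)]
updated = backfill_file_sizes(files, { 'bh979th0089.zip' => 4695 })
puts "updated #{updated} file(s)"
```

Only files whose stored size is 0 are touched, so the sketch can be run repeatedly without overwriting sizes that are already populated.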
@amyehodge amyehodge added PO Issues being tracked by the product owner and removed 2023 Summer 2023 workcycle labels May 31, 2024