Quota related issues #1748
Check. Edit: No need to check this for the NaN size issue; the problem is explained in the comments below.
@ObadaS I read somewhere that MinIO file metadata can expire, and file_size is metadata that may be affected if there is an expiry rule. If this is true, then it may be the cause of the NaN sizes. Edit: This was not the problem; the problem is explained in the comments below.
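Even though this turned out not to be the cause, one way to verify whether a bucket has an expiry (lifecycle) rule is to ask the S3 API directly. This is only a sketch; the endpoint, credentials, and bucket name below are placeholders, not the production values.

import boto3
from botocore.exceptions import ClientError

# Placeholder connection details, not the real MinIO configuration.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.org",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

try:
    # Returns the lifecycle rules configured on the bucket, if any exist.
    config = s3.get_bucket_lifecycle_configuration(Bucket="example-bucket")
    print(config["Rules"])
except ClientError as e:
    if e.response["Error"]["Code"] == "NoSuchLifecycleConfiguration":
        print("No lifecycle/expiry rules on this bucket")
    else:
        raise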
Interesting, I will check that tomorrow. Codalab.lisn.upsaclay.fr also shows dataset sizes and does not seem to be impacted by the bug, even though both are connected to the same MinIO, which could point to either a bucket problem on MinIO or a problem in the code of the platform.
The only problem right now is reproducibility. If we can reproduce the bug locally, then solving it will not be a problem. I am pretty much convinced that there is no problem in the code, because we do see sizes for some files, but I will investigate the code a bit more to be sure.
We found that the file sizes are emptied by
I started the storage analytics on production. What starts the
I confirmed using this code that accessing the file size from MinIO takes longer than accessing file_size from the DB. We can run it on codabench-test to be sure.

Accessing from DB:

import time
from datasets.models import Data

start_time = time.time()
sizes = sum(item.file_size for item in Data.objects.all())
end_time = time.time()
execution_time = end_time - start_time
print(f"Execution Time: {execution_time:.4f} seconds")

Accessing from MinIO:

import time
from datasets.models import Data

start_time = time.time()
sizes = sum(item.data_file.size for item in Data.objects.all())
end_time = time.time()
execution_time = end_time - start_time
print(f"Execution Time: {execution_time:.4f} seconds")
We need to separate the computing of the file sizes from the analytics task:

codabench/src/apps/analytics/tasks.py, line 37 in 44c7d7f:

# Measure all files with unset size
for dataset in Data.objects.filter(Q(file_size__isnull=True) | Q(file_size__lt=0)):
    try:
        dataset.file_size = Decimal(
            dataset.data_file.size / 1024
        )  # file_size is in KiB
    except Exception:
        dataset.file_size = Decimal(-1)
    finally:
        dataset.save()
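One way to do that separation would be to move this loop into its own task, so the analytics task never has to call out to MinIO. The task name and decorator below are assumptions for illustration, not the actual codabench code.

from decimal import Decimal

from celery import shared_task
from django.db.models import Q

from datasets.models import Data


@shared_task
def compute_missing_file_sizes():
    # Only touch datasets whose size is unset or previously marked invalid.
    for dataset in Data.objects.filter(Q(file_size__isnull=True) | Q(file_size__lt=0)):
        try:
            # data_file.size calls out to MinIO, which is the slow part.
            dataset.file_size = Decimal(dataset.data_file.size) / 1024  # file_size is in KiB
        except Exception:
            dataset.file_size = Decimal(-1)  # mark as failed so it can be retried later
        finally:
            dataset.save()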
New version of the script to compute file sizes:

from decimal import Decimal
from datasets.models import Data

print("Total objects: ", Data.objects.all().count())
datasets = Data.objects.all().order_by("id")
print("Processing now: ", datasets.count())
for dataset in datasets:
    if dataset.data_file and hasattr(dataset.data_file, 'size'):
        try:
            file_size = dataset.data_file.size
            # Check for None before comparing, otherwise the comparison itself raises.
            if file_size is None or file_size <= 0:
                file_size = Decimal(0)
            else:
                file_size = Decimal(file_size) / 1024  # file_size is stored in KiB
            dataset.file_size = file_size
            dataset.save()
        except Exception as e:
            # Do not re-read data_file.size here: if it raised above, it would raise again.
            print(f"Skipping dataset ID {dataset.id}: could not read file size - Error: {e}")
    else:
        print(f"File size problem, Data ID {dataset.id}")
The file sizes are back on production. |
Some more files with code to fix sizes: submission_detail.txt
List of data files that failed: |
TODOs:

- Reset user quota from bytes to GB
  Solved by: User quota is updated to GB from Bytes #1749
- Find the uses of GiB/MiB/KiB in storage analytics and other places and replace them with GB/MB/KB
  Solved by: File Sizes cleanup #1752
- File sizes are formatted in different places with different functions; use a single size formatter everywhere (see the sketch after this list)
  Solved by: File Sizes cleanup #1752
- Check why the submission size in the used quota doubles when a submission finishes
  Maybe because the submission file is saved in the prediction output
- Find and fix the NaN file size issue
  For more details, check the comments section of this PR: #1738
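For the single size formatter item, a shared helper could look roughly like this; the function name is hypothetical and the actual implementation in File Sizes cleanup #1752 may differ.

def format_size(size_bytes):
    """Format a size in bytes using decimal units (KB/MB/GB), matching the TODO above."""
    size = float(size_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.2f} {unit}"
        size /= 1000.0

print(format_size(1_500_000))  # prints "1.50 MB"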