CountItems to consider MPUs in storage metrics #333
base: development/1.14
- The size of the MPU parts is part of the current size of the bucket
- Only the overview keys are included in the object count
- The metrics are detailed in a field
- The getObjectMDStats function is updated: the logic to process each cursor's entry is shared, and MPU entries are processed in the same way as regular objects, with some specifics.
Issue: S3UTILS-186
utils/S3UtilsMongoClient.js
            return callback(err);
        }
        const retResult = this._handleResults(collRes, isVer);
        retResult.stalled = stalledCount;
        return callback(null, retResult);
    },
multiple calls to callback: after processing both the cursor and the MPU cursor, and eventually once more after the inflight processing...
...which may not actually be the issue: looking at the documentation of the mongodb driver, I don't see any mention of a second callback, so I think this may be a leftover from the upgrade to promises, and this callback should just be removed (errors will raise an exception, caught eventually; and handleResults is called for the normal case at line 434)
I will remove the err callbacks: they are indeed not used with mongodb driver v5, so there is no impact
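For illustration, a minimal sketch of the pattern discussed in this thread, not the code from S3UtilsMongoClient.js. It assumes the cursors can be consumed with `for await` (MongoDB Node.js driver cursors are async iterables; a plain async generator stands in here so the snippet runs without a database). The names `fakeCursor`, `processEntry` and `countEntries` are hypothetical. The point is that the callback is invoked from a single place, once per outcome:

```javascript
// Hypothetical stand-in for a MongoDB cursor: any async iterable works here.
async function* fakeCursor(entries) {
    for (const entry of entries) {
        yield entry;
    }
}

// Hypothetical per-entry logic, shared by the object and MPU cursors.
// MPU entries add to the byte total but are not counted as objects.
function processEntry(stats, entry, isMpu) {
    stats.bytes += entry.size;
    if (!isMpu) {
        stats.objects += 1;
    }
}

// Consumes both cursors, then reports through the callback exactly once:
// either with the stats, or with the first error raised while iterating.
function countEntries(objectCursor, mpuCursor, callback) {
    const stats = { objects: 0, bytes: 0 };
    (async () => {
        for await (const entry of objectCursor) {
            processEntry(stats, entry, false);
        }
        for await (const entry of mpuCursor) {
            processEntry(stats, entry, true);
        }
        return stats;
    })().then(res => callback(null, res), err => callback(err));
}
```

Because errors surface as a rejected promise, no per-iteration error callback is needed; the two-argument `then` keeps the success and error paths mutually exclusive.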
    collRes.account[account].locations[location][targetCount]++;
    collRes.account[account].locations[location].deleteMarkerCount += res.value.isDeleteMarker ? 1 : 0;
    collRes[metricLevel][resourceName][targetData] += data[metricLevel][resourceName];
    // Do not count the MPU parts as objects
not an issue in this PR, but worth considering for the future: eventually, do we want to report MPUs by number of uploads (i.e. "potential objects") or by actual parts (which are stored internally as separate documents, and each actually reference some data)...
thinking about this,
- it may actually be better to count each part as an object: so we can also report on left-over parts even if the overview key is missing. Semantics may not be so good (part vs object), but I'd rather report something that more closely matches the storage than the user's business logic for now, and thus not mask any issue.
- as far as semantics go, mpuPartsCount should be a number of parts: if we count only the overview keys, it should be something like mpuUploadsCount instead (here and in other field names)
A compromise may be to count (and store) partsCount, uploadsCount and partsSize, but only aggregate uploadsCount in object count and partsSize in object size. But not sure it is worth the extra effort: may be best to just keep it "simple", counting and measuring individual parts and reporting them like objects...
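To make the compromise above concrete, here is a rough sketch, not the implementation in this PR. It assumes each in-progress upload is described by a hypothetical `{ uploadId, parts: [{ size }] }` shape; `summarizeMpus` and `aggregate` are hypothetical helpers. Only `uploadsCount` is folded into the object count and `partsSize` into the size, while `partsCount` is kept as a separate detail:

```javascript
// Hypothetical input shape: one entry per in-progress upload,
// each with the list of parts already stored for it.
function summarizeMpus(uploads) {
    const summary = { uploadsCount: 0, partsCount: 0, partsSize: 0 };
    for (const upload of uploads) {
        summary.uploadsCount += 1;
        for (const part of upload.parts) {
            summary.partsCount += 1;
            summary.partsSize += part.size;
        }
    }
    return summary;
}

// Folds the MPU summary into bucket-level totals: uploads count as
// objects, part bytes count as size, partsCount stays informational.
function aggregate(bucket, mpu) {
    return {
        objectCount: bucket.objectCount + mpu.uploadsCount,
        size: bucket.size + mpu.partsSize,
        mpu,
    };
}
```

Note that this shape cannot represent left-over parts whose overview key is missing; covering that case would mean iterating over part documents directly.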
> not an issue in this PR, but worth considering for the future
I formally disagree that more effort should be put into this obsolete script. But it's worth discussing how to count MPUs for Scuba.
> But it's worth discussing how to count MPUs for Scuba.
that is my point: this discussion is not about the component where it is implemented (utapi, s3utils, scuba, ...) but really about the semantics and data we want to measure.
tests/unit/CountItems/utils/utils.js
@@ -23,6 +24,7 @@ describe('CountItems::utils::consolidateDataMetrics', () => {
    _currentRestoring: 0,
    _nonCurrentRestored: 0,
    _nonCurrentRestoring: 0,
    _incompleteMPUParts: 0,
as implemented today (see my other comment), this is not the count of parts but the count of uploads, so the field should be named _incompleteMPUUploads. Or it may be better to actually count parts instead.
Counting parts does not make sense from a client point of view, as most of the time they cannot control how the MPUs are split. What is important is knowing how many MPUs are incomplete and how much data is occupied by these MPUs. It's not important here anyway, because our only use cases are the quotas (only on storage bytes) and reflecting the current usage in the UI (no count of objects there either). We can however consider it for Scuba, as separate work.
as discussed above, it seems presumptuous to say what makes or does not make sense from a client point of view: anyway the APIs are limited, it is not presented to the user...
the request from product (as a proxy of customers) today is really just to count the "size" used by incomplete MPUs (parts or uploads is the same here). As for the number of parts or uploads, it actually fits different uses, both for the "customers" (but different personas):
- number of parts may make more sense for an admin who wants to understand why mongo is overloaded;
- number of uploads may make more sense for the user who performed the upload and wants to understand how many extra uploads he did... but maybe he does not care so much about the number of uploads, and more about the amount of data he has to re-upload...
i also don't know what is expected, and it will need to be considered for Scuba: but as soon as you crystallize a behavior here and it gets shipped, customers may start to use it, and we will have to support it... So we must really refrain from adding something quickly just because we can, esp. if the semantics may be ambiguous.
What I mean is that in the S3 world we would typically use ListMultipartUploads to get the number of in-progress/incomplete MPUs: this does not return all the parts. If we need the parts we can use ListParts.
S3 doesn't expose the number of parts in its metrics either, but:
- Incomplete Multipart Upload Storage Bytes – The total bytes in scope with incomplete multipart uploads
- Incomplete Multipart Upload Object Count – The number of objects in scope that are incomplete multipart uploads
So for sure the number of uploads will be required for us to be standard. We cannot report more storage utilization if we have 0 objects reported (for example, reporting 1TB of data with 0 objects if everything is only incomplete MPUs): there needs to be a correlation between the two. And the number of uploads is the natural (and standard) information to have.
Then maybe we can consider the number of parts. Not supporting it here is actually what I suggested and what you seem to align on: if we ship it we'll need it in SUR, yet it may not be needed, or may be hard to track within Scuba...
Also remove unused callback in the mongodb foreach
Issue: S3UTILS-186
current metric for the associated bucket
The logic to process the object was kept inline in the function as the number of accessed variables is high, to avoid unnecessary complexity: the function is already unit-tested and hopefully dropped sooner or later...
Issue: S3UTILS-186