Profile RucioConMon memory #12089
base: master
Conversation
Force-pushed: fix parameters (commits 8712888 to f7ec44e)
I finally managed to run memory measurements with the code currently provided in this PR. It compares the current RucioConMon implementation in the WMCore stack in two scenarios:
I ran these tests on vocms0259, so that I could measure memory usage on the node with the Grafana host monitor; see the screenshot below. Some observations are:
As I was running format=raw (also faster!)
format=json:
I will make sure these changes are reflected in #12059 and proceed with this development over there.
Alan, it seems to me that the actual issue is the accumulation of results in this for loop:
The function
is in the correct place, as this will be where each LFN is parsed and a decision is made about what to do with it (on the client side).
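For illustration, a generator-based sketch of that idea (the names below are hypothetical, not the actual WMCore code): instead of appending every record to a list inside the for loop, yield records one at a time so peak memory stays bounded by a single record and the caller decides what to keep.

```python
def iterate_lfns(records):
    """Yield one LFN at a time from an iterable of parsed rows, instead of
    accumulating the whole result set in a list inside the for loop."""
    for rec in records:
        # per-record parsing/filtering happens here, on the client side
        yield rec["name"]

# Usage: the consumer drives the iteration; nothing is retained unless the
# caller explicitly keeps it.
rows = ({"name": "/store/data/file_%d.root" % i} for i in range(3))
for lfn in iterate_lfns(rows):
    print(lfn)
```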
Alan, the issue is in the RucioConMon and Service modules. Here is my insight into their behavior:
My proposal is to stream data from
From my observation, the current implementation of fetching data has an unavoidable memory footprint, because data is loaded into Python after the HTTP call; the larger the HTTP response, the larger the memory footprint the code has to deal with.
Valentin, you seem to have captured well the flow of an HTTP call through the
To add to what you described above, I think the actual data is loaded into memory in the most basic class (pycurl_manager), at these lines:
where it can then be automatically decompressed as well, if the content is in
To really minimize the memory footprint, we would have to stream data from server to client, fetching one row at a time. I don't know the exact details, but I guess the server would have to support newline-delimited (or similar) data streaming, and the connection between client and server would have to remain open until the client exhausts all the data in the response object. This data streaming somewhat conflicts with the custom data caching we have implemented in the
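As a rough sketch of what per-chunk streaming could look like with pycurl (the library underneath pycurl_manager): libcurl can hand each chunk to a WRITEFUNCTION callback as it arrives, so only one partial line is ever buffered instead of the whole response body. The class and function names here are hypothetical, not part of WMCore:

```python
class LineStreamer:
    """Accumulate bytes from libcurl write callbacks and emit complete
    newline-delimited records eagerly; only one partial line is buffered."""

    def __init__(self, handle_line):
        self.handle_line = handle_line
        self._partial = b""

    def __call__(self, chunk):
        data = self._partial + chunk
        lines = data.split(b"\n")
        self._partial = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            self.handle_line(line)


def stream_url(url, handle_line):
    # Requires pycurl; imported here so the helper above remains testable
    # without network access. The URL and flow are illustrative only.
    import pycurl
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.WRITEFUNCTION, LineStreamer(handle_line))
    curl.perform()
    curl.close()
```

The callback-based design keeps the client's peak memory independent of the total response size, at the cost of keeping the connection open for the duration of the transfer.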
What you are looking for is the NDJSON data format, which the server must implement; it is basically a list of JSON records separated by '\n'. This way the client can request such data (it can be in gzipped form as well) and read one record at a time, similar to how the CSV data format is processed. The total amount of memory required for the entire set of records is then reduced to the size of a single record.
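A minimal sketch of NDJSON consumption on the client side, using only the standard library (the payload below is made up for illustration): each line is decoded independently, so peak memory is bounded by the largest single record, and gzip decompression can be layered on without inflating the whole payload at once.

```python
import gzip
import io
import json


def iter_ndjson(stream):
    """Yield one decoded JSON record per line from a file-like byte stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# NDJSON compresses well; gzip.GzipFile wraps the byte stream and still
# lets us iterate line by line, decompressing incrementally.
payload = b'{"name": "/store/file1.root"}\n{"name": "/store/file2.root"}\n'
compressed = gzip.compress(payload)
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as fh:
    for record in iter_ndjson(fh):
        print(record["name"])
```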
Can one of the admins verify this patch?
Fixes #<GH_Issue_Number>
Status
<In development | not-tested | on hold | ready>
Description
Is it backward compatible (if not, which system it affects?)
<YES | NO | MAYBE>
Related PRs
<If it's a follow up work; or porting a fix from a different branch, please mention them here.>
External dependencies / deployment changes
<Does it require deployment changes? Does it rely on third-party libraries?>