-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rpmlogOnce() and rpmlogReset() #3417
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions:
- We already know what messages (code & string) have been emitted in the log context as they're stored in the
recs
member (currently a vector). Is there any benefit of adding a newseen
map, instead of turningrecs
into such a map, adding a counter torpmlogRec_s
and using that? - Is there any use case for keying on an additional string rather than on the actual log message?
- Once we support non-global logging contexts, the
domain
would no longer be needed. That is, all users/domains (e.g.ts
) would get a separate log context, instead of tracking them centrally in the global context. Is that correct?
I see that this is meant to be a PoC kind of thing, and as such it looks fine (some comments inline).
@pmatilai, any other opinion?
rpmio/rpmlog.cc
Outdated
|
||
if (ctx) { | ||
wrlock lock(ctx->mutex); | ||
newkey = !ctx->seen[domain][{code, key}]++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Showcases the power of C++, nicely done! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah there's a lot of magic in that line, perhaps a bit too much. This isn't a LoC competition, understandability is 100x more important. If @ffesti himself needed to think for a while to convince himself why this works while writing it, that's a danger sign already. Maybe this could be opened up a bit, or at least add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind you, I don't find this unreadable, just needing a bit of thought too. I guess the thing that raises my alarms on first sight is the unchecked [] access which is fine with a map but undefined behavior in some others. Considering we're all relative C++ beginners here, some such remark would not be out of line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this needs a comment. Is /* This gets initialized automatically on first access */
clear enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This" is unclear 😅 Maybe something to the tune of "[] access on map creates the element on first access", but the more you think about it the more the comment starts seeming redundant...
So dunno - up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, the most confusing part is the increment vs. negation, it's not clear which one takes precedence (without looking it up in the C++ docs). I assume it's first incremented and only then negated, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it must be the other way around, hmm. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just proves Panu's point of this being perhaps a bit confusing 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually now that I think about it, this follows from the definition of the ++ operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe what it really needs is a high level comment stating what it does, and leave the understanding as an exercise for the reader as per their C++ experience 😅
rpmlogCtx ctx = rpmlogCtxAcquire(); | ||
int newkey = 0; | ||
|
||
if (ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check necessary? We don't seem to be doing it elsewhere in this file (the global ctx
is always returned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not. It is there to limit the scope of the lock. If you want to use rpmlock below to print the numbers for the repeated messages we can't hold a lock there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't add a bogus check just to create a scope, if you need a scope then just use a plain { } block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But a cleaner version is actually to wrap the seen-check and the lock into a small seenLog() style helper function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not. It is there to limit the scope of the lock. If you want to use rpmlock below to print the numbers for the repeated messages we can't hold a lock there.
Oh right, that didn't occur to me, thanks for clarifying.
recs is not really suited for this. It is a vector so lookups are not cheap. It only stores WARNING and above. We might want to link to the recs entry but even that does not really solve anything if the lower prio entries are missing. The additional string is there for cases where the message contains differing info. E.g. the package name or location of the error. Imagine suppressing a missing macro message. This will give a file and line number. And you do want that so you can look at the place where things go wrong in case it is just a typo. You still want to suppress the follow up messages if the macro is actually missing. So having a key allows to put them all under the same category. I don't think we will have logging objects for every possible occasion but only for majors things like builds or transactions. So I expect the domain to stay relevant even if we no longer have global logging. But that might just me being unimaginative. |
Yeah, logging is kind of global in nature, I don't see that fundamentally changing. The deal with allowing non-global context is to allow something like dnf to have a context they control, not so much with having multiple different contexts active within rpm at the same time. So I don't see this logonce and domain overlapping with that stuff much, if at all really. |
The existing tests don't cover #3336 which this claims to be fixing. |
Also, this is two entirely different things in a single commit. Split them up. |
Sure, the logic for populating What I'm getting at is that we could just record all the messages and only decide how to collapse (de-duplicate) them into a single meaningful message after the fact, perhaps even using the information in those messages somehow. Yes, we need to do that on-the-fly as well, of course, but that would just be a special case of that. That said, I haven't thought this through too much, it's just my gut feeling speaking 😄 Having this specialized
Indeed. I had this in the back of my mind too, just needed to see it written down for it to click. Thanks!
Right, it's just that "context" coincides with "domain" in a way, but we already use the term "context" for a different purpose in the logging module, so yep. |
It just seems to me like this domain tracking (based on an integer key) tries to emulate proper OOP encapsulation. IOW, if an object (like a |
I don't disagree that this can probably be done better with proper OO. But I don't want to get into this right now. Redoing the whole rpmlog thing is a story for another time. As this is a internal API for now we can still change it later on when we get to that. |
Yeah the domain and its attached seen-stuff could be tracked as a proper object too, I mumbled something along those lines in #3389 actually, the id is just a kind of simplified version of that. The log context itself is still a kind of different thing because at least in rpm, if you specify a transaction log context, then what other output is there during a transaction? Ditto for a package build? And they'd all probably be outputing to the same stderr/file anyhow. Maybe there's a case, it's entirely possible, but we really cannot get into that business now. It'll need to be thought through once we decide to do a logging overhaul. |
And yup, for reasons like this it's best done as an internal-only API just now - so we don't need to commit to it for the rest of our lives 😅 |
Agreed, all are good points. Let's go ahead with this PR. |
OK; this should address all issues except for the Python based test case which I will add next. |
OK, test added. |
rpmio/rpmlog.cc
Outdated
@@ -22,6 +23,7 @@ struct rpmlogCtx_s { | |||
unsigned mask; | |||
int nrecsPri[RPMLOG_NPRIS]; | |||
std::vector<rpmlogRec_s> recs; | |||
std::map<uint64_t, std::map<std::pair<int, std::string>, int>> seen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that occurred to me last evening after work: this would be better off with unordered_map for both maps, which is STL's kind of exotic name for what the industry knows as a hash map, std::map maintains things in sort order at all times. Both work but unordered is (obviously) somewhat faster when you don't need the sorted iteration etc.
Not that it'll matter much here because there'll be only a handful of items in either map for the foreseeable future. So not a review requirement (although changing is trivial because both maps dance to the same exact API), just .. maybe education 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought about using an unordered map in the beginning and then opted to go for the normal map just in case.
Good I did because I might not have been able to make the unordered_map work right away. The missing hash function for pairs gives a pretty long compiler message. But I got it working now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting. It didn't really even occur to me that you'd need to define a hash for it when map didn't require any custom comparison stuff for this structure. That's something to keep in mind.
Water gets deep very suddenly in C++ - on one line you're operating on ridiculously high level and on the next one you run into an absurdly long compiler error because you failed to define some low-level detail you'd kinda expect the compiler to do for you because this is all standard datatypes. Python it is not 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note is that it's good to document such things in the commit message: map worked out of the box, unordered_map didn't - had it been there, I wouldn't have even brought this up.
int newkey = 0; | ||
|
||
if (ctx) { | ||
wrlock lock(ctx->mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing that occurred to me last evening is that technically this should use a mutex of its own - there's no reason to prevent other logging events from proceeding while we filter out other messages. Not that it matters here so just a random remark, not a requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically yes. For now I don't want to over complicate things for what is very little gains in practice.
The added test looks alright, there was also a forgotten I won't merge yet to give Florian a chance to act upon Panu's suggestions above (if he wishes so). Otherwise, it's good to go. |
These are internal only for now to allow us gain soem more confidence on the design. rpmlogOnce allows showing a log message only once. rpmlogReset allows purging the list of known message keys for a given domain. This allows for different live times e.g. per transaction or per package. Resolves: rpm-software-management#3395
The code uses the pointer to the rpmts object as a domain and resets it when the rpmts is freed. Various existing test cases fail when the NOKEY message is omited or shown more than once. Just adding a test for multiple ts in the same process. Resolves: rpm-software-management#3336 test
These are internal only for now to allow us gain soem more confidence on the design.
rpmlogOnce allows showing a log message only once. rpmlogReset allows purging the list of known message keys for a given domain. This allows for different live times e.g. per transaction or per package.
Use in handleHdrVS. This does not add a new test case but various existing test cases fail when the NOKEY message is omited or shown more than once. The code uses the pointer to the rpmts object as a domain and resets it when the rpmts is freed.
Resolves: #3336
Resolves: #3395