Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Journal probe rework #7

Merged
merged 11 commits into from
Feb 17, 2017
22 changes: 11 additions & 11 deletions src/probes/journal.c
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,16 @@ static int read_new_entries(sd_journal *journal)

add_to_payload(data, length);

// For now, we send one record per log message, in case the
// there is a large backlog of messages and we exceed the
// payload size limit (8KB). And ignore errors, hoping that it's
// a transient problem.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as we can try to put things back together server side. Though could we have a message-id + optional message part X/Y type thing? It would really make things easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of using the telemetry machine-id + client record timestamp fields to piece together a series of journal probe records. Would that be acceptable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yea that should work just as well. Drinking from the firehouse of the full log will always be a little tricky.


if (!send_data(error_class)) {
telem_log(LOG_ERR, "Failed to send data. Ignoring.\n");
return num_entries;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I am a little worried about this as the function doesn't seem to take a last tried value so I'm unsure how well this would handle an error in the middle of sending messages and resuming from that point later on.

}

num_entries++;
}

Expand Down Expand Up @@ -151,17 +161,11 @@ static bool process_existing_entries(sd_journal *journal)
ret = read_new_entries(journal);
if (ret < 0) {
return false;
}

if (!payload) {
} else if (ret == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah if I would have read a little further on I'd have seen it.

So I'm wondering if before returning when send_data fails, we could put the data back onto the journal so it can be attempted to be read next time.

Also ret could be 0 when send_data fails so the no existing entries log message would be wrong. I'd return -1 in the send_data case and hopefully there is a way to get data back into the journal after pulling it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent observation. I wasn't considering how the probe should behave in the event of send_data failures. I suppose I could call sd_journal_previous to move the pointer in the reverse direction in this case, but a timeout would be needed...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think that I want to continue ignoring send_data failures for now, since this may indicate a configuration problem with the telemetry client on the system. Also, I think it's acceptable for the journal probe to be "lossy" with respect to the data it collects.

But I will revisit the error handling here later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove the LOG_DEBUG you mentioned.

telem_log(LOG_DEBUG, "No existing entries found\n");
return true;
}

if (!send_data(error_class)) {
return false;
}

return true;
}

Expand Down Expand Up @@ -278,10 +282,6 @@ static bool process_journal(void)
} else if (r < 0) {
return false;
}

if (!send_data(error_class)) {
return false;
}
}
}
}
Expand Down