Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ocrvs 7687 #1190

Closed
wants to merge 4 commits into from
Closed

Ocrvs 7687 #1190

wants to merge 4 commits into from

Conversation

alsmk
Copy link
Collaborator

@alsmk alsmk commented Dec 18, 2024

  1. Created separate log file for errors
  2. Errors are to be redirected to another log files

Copy link

Oops! Looks like you forgot to update the changelog. When updating CHANGELOG.md, please consider the following:

  • Changelog is read by country implementors who might not always be familiar with all technical details of OpenCRVS. Keep language high-level, user friendly and avoid technical references to internals.
  • Answer "What's new?", "Why was the change made?" and "Why should I care?" for each change.
  • If it's a breaking change, include a migration guide answering "What do I need to do to upgrade?".

@alsmk alsmk requested a review from rikukissa December 18, 2024 12:53
@alsmk alsmk self-assigned this Dec 18, 2024
@@ -1,7 +1,7 @@
{"attributes":{"actions":[{"actionRef":"preconfigured:preconfigured-alert-history-es-index","actionTypeId":".index","group":"metrics.inventory_threshold.fired","params":{"documents":["{\"@timestamp\":\"2024-08-06T07:57:35.644Z\",\"tags\":\"{{rule.tags}}\",\"rule\":{\"id\":\"{{rule.id}}\",\"name\":\"{{rule.name}}\",\"params\":{\"{{rule__type}}\":\"{{params}}\"},\"space\":\"{{rule.spaceId}}\",\"type\":\"{{rule.type}}\"},\"kibana\":{\"alert\":{\"id\":\"{{alert.id}}\",\"context\":{\"{{rule__type}}\":\"{{context}}\"},\"actionGroup\":\"{{alert.actionGroup}}\",\"actionGroupName\":\"{{alert.actionGroupName}}\"}},\"event\":{\"kind\":\"alert\"}}"]}}],"alertTypeId":"metrics.alert.inventory.threshold","apiKey":null,"apiKeyOwner":null,"consumer":"alerts","createdAt":"2023-11-17T12:01:46.420Z","createdBy":"elastic","enabled":false,"executionStatus":{"error":null,"lastExecutionDate":"2024-08-06T08:02:44.568Z","status":"pending"},"legacyId":null,"meta":{"versionApiKeyLastmodified":"7.17.18"},"muteAll":false,"mutedInstanceIds":[],"name":"Available disk space in root file system","notifyWhen":"onActionGroupChange","params":{"alertOnNoData":true,"criteria":[{"comparator":">=","customMetric":{"aggregation":"max","field":"system.filesystem.used.pct","id":"alert-custom-metric","type":"custom"},"metric":"custom","threshold":[0.7],"timeSize":1,"timeUnit":"h","warningComparator":">=","warningThreshold":[0.5]}],"filterQuery":"{\"bool\":{\"should\":[{\"match_phrase\":{\"system.filesystem.mount_point\":\"/hostfs\"}}],\"minimum_should_match\":1}}","filterQueryText":"system.filesystem.mount_point : \"/hostfs\"","nodeType":"host","sourceId":"default"},"schedule":{"interval":"1h"},"scheduledTaskId":null,"tags":["infra","opencrvs-builtin"],"throttle":null,"updatedAt":"2024-08-06T08:01:56.542Z","updatedBy":"elastic"},"coreMigrationVersion":"7.17.18","id":"14778650-8541-11ee-9002-2f37fdc4e5d5","migrationVersion":{"alert":"7.16.0"},"references":[],"sort":[1722931316554,643],"type":"alert","updated_at":"2024-08-06T08:01:56.554Z","version":"WzM5MywxXQ=="}
{"attributes":{"actions":[{"actionRef":"preconfigured:preconfigured-alert-history-es-index","actionTypeId":".index","group":"threshold_met","params":{"documents":["{\"@timestamp\":\"2022-04-18T07:05:33.819Z\",\"tags\":\"{{rule.tags}}\",\"rule\":{\"id\":\"{{rule.id}}\",\"name\":\"{{rule.name}}\",\"params\":{\"{{rule__type}}\":\"{{params}}\"},\"space\":\"{{rule.spaceId}}\",\"type\":\"{{rule.type}}\"},\"kibana\":{\"alert\":{\"id\":\"{{alert.id}}\",\"context\":{\"{{rule__type}}\":\"{{context}}\"},\"actionGroup\":\"{{alert.actionGroup}}\",\"actionGroupName\":\"{{alert.actionGroupName}}\"}},\"event\":{\"kind\":\"alert\"}}"]}}],"alertTypeId":"apm.error_rate","apiKey":null,"apiKeyOwner":null,"consumer":"alerts","createdAt":"2022-06-01T11:30:27.033Z","createdBy":"opencrvs-admin","enabled":false,"executionStatus":{"error":null,"lastExecutionDate":"2024-02-07T08:28:08.400Z","status":"pending"},"legacyId":null,"meta":{"versionApiKeyLastmodified":"7.17.0"},"muteAll":false,"mutedInstanceIds":[],"name":"Error in service","notifyWhen":"onActionGroupChange","params":{"environment":"ENVIRONMENT_ALL","threshold":1,"windowSize":1,"windowUnit":"m"},"schedule":{"interval":"1m"},"scheduledTaskId":null,"tags":[],"throttle":null,"updatedAt":"2024-02-05T03:00:20.633Z","updatedBy":"opencrvs-admin"},"coreMigrationVersion":"7.17.0","id":"3b6722e0-e19e-11ec-ba8e-51649755648d","migrationVersion":{"alert":"7.16.0"},"references":[],"sort":[1707273006619,214975],"type":"alert","updated_at":"2024-02-07T02:30:06.619Z","version":"WzQ5MjAzNCwxOV0="}
{"attributes":{"buildNum":46534,"defaultIndex":"metricbeat-*"},"coreMigrationVersion":"7.17.0","id":"7.17.0","migrationVersion":{"config":"7.13.0"},"references":[],"sort":[1707273006619,216009],"type":"config","updated_at":"2024-02-07T02:30:06.619Z","version":"WzQ5MjQyOCwxOV0="}
{"attributes":{"actions":[{"actionRef":"preconfigured:preconfigured-alert-history-es-index","actionTypeId":".index","group":"logs.threshold.fired","params":{"documents":["{\"@timestamp\":\"2023-11-22T08:25:47.329Z\",\"tags\":\"{{rule.tags}}\",\"rule\":{\"id\":\"{{rule.id}}\",\"name\":\"{{rule.name}}\",\"params\":{\"{{rule__type}}\":\"{{params}}\"},\"space\":\"{{rule.spaceId}}\",\"type\":\"{{rule.type}}\"},\"kibana\":{\"alert\":{\"id\":\"{{alert.id}}\",\"context\":{\"{{rule__type}}\":\"{{context}}\"},\"actionGroup\":\"{{alert.actionGroup}}\",\"actionGroupName\":\"{{alert.actionGroupName}}\"}},\"event\":{\"kind\":\"alert\"}}"]}}],"alertTypeId":"logs.alert.document.count","apiKey":null,"apiKeyOwner":null,"consumer":"alerts","createdAt":"2023-11-22T08:32:38.272Z","createdBy":"elastic","enabled":false,"executionStatus":{"error":null,"lastExecutionDate":"2024-02-07T08:28:08.400Z","status":"pending"},"legacyId":null,"meta":{"versionApiKeyLastmodified":"7.17.0"},"muteAll":false,"mutedInstanceIds":[],"name":"Error in backup logs","notifyWhen":"onActionGroupChange","params":{"count":{"comparator":"more than or equals","value":1},"criteria":[{"comparator":"matches","field":"message","value":"error"},{"comparator":"equals","field":"log.file.path","value":"/var/log/opencrvs-backup.log"}],"timeSize":1,"timeUnit":"h"},"schedule":{"interval":"1h"},"scheduledTaskId":null,"tags":[],"throttle":null,"updatedAt":"2024-02-07T02:27:22.558Z","updatedBy":"elastic"},"coreMigrationVersion":"7.17.0","id":"b166fcb0-8911-11ee-8111-2f3be9e93efc","migrationVersion":{"alert":"7.16.0"},"references":[],"sort":[1707294436464,232766],"type":"alert","updated_at":"2024-02-07T08:27:16.464Z","version":"WzQ5NTQzNCwxOV0="}
{"attributes":{"actions":[{"actionRef":"preconfigured:preconfigured-alert-history-es-index","actionTypeId":".index","group":"logs.threshold.fired","params":{"documents":["{\"@timestamp\":\"2023-11-22T08:25:47.329Z\",\"tags\":\"{{rule.tags}}\",\"rule\":{\"id\":\"{{rule.id}}\",\"name\":\"{{rule.name}}\",\"params\":{\"{{rule__type}}\":\"{{params}}\"},\"space\":\"{{rule.spaceId}}\",\"type\":\"{{rule.type}}\"},\"kibana\":{\"alert\":{\"id\":\"{{alert.id}}\",\"context\":{\"{{rule__type}}\":\"{{context}}\"},\"actionGroup\":\"{{alert.actionGroup}}\",\"actionGroupName\":\"{{alert.actionGroupName}}\"}},\"event\":{\"kind\":\"alert\"}}"]}}],"alertTypeId":"logs.alert.document.count","apiKey":null,"apiKeyOwner":null,"consumer":"alerts","createdAt":"2023-11-22T08:32:38.272Z","createdBy":"elastic","enabled":false,"executionStatus":{"error":null,"lastExecutionDate":"2024-02-07T08:28:08.400Z","status":"pending"},"legacyId":null,"meta":{"versionApiKeyLastmodified":"7.17.0"},"muteAll":false,"mutedInstanceIds":[],"name":"Error in backup logs","notifyWhen":"onActionGroupChange","params":{"count":{"comparator":"more than or equals","value":1},"criteria":[{"comparator":"equals","field":"log.file.path","value":"/var/log/opencrvs-backup.error.log"}],"timeSize":1,"timeUnit":"h"},"schedule":{"interval":"1h"},"scheduledTaskId":null,"tags":[],"throttle":null,"updatedAt":"2024-02-07T02:27:22.558Z","updatedBy":"elastic"},"coreMigrationVersion":"7.17.0","id":"b166fcb0-8911-11ee-8111-2f3be9e93efc","migrationVersion":{"alert":"7.16.0"},"references":[],"sort":[1707294436464,232766],"type":"alert","updated_at":"2024-02-07T08:27:16.464Z","version":"WzQ5NTQzNCwxOV0="}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be 100% sure no errors are ending up in /var/log/opencrvs-backup.log anymore, we'd need to test the old error scenarios and verify the indeed end up in the right file. In the backup script, especially these can be a little dangerous and might not correctly output their error.

Question you can think about is:
How could we test the different error scenarios are now properly sending an alert and write errors to the opencrvs-backup.error.log?

As testing this can be quite time consuming, my suggestion is: let's also keep the old alert rule in place that tries to find the word "error" in the /var/log/opencrvs-backup.log. We can then at some point come back to this task and see what type of stuff each file contains and remove this "extra" error rule.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you end up using a different name for the "deprecated" (so the old) alert rule, please note that you need to add a new filter for Elastalert too

https://github.com/opencrvs/opencrvs-countryconfig/blob/242878c13e0ce335c096eed10c59f77be812f805/infrastructure/monitoring/elastalert/rules/log-alert.yaml#L24-L30

Elastalert is a service that

  1. Listens Elasticsearch indices for the alerts triggered
  2. Sends a HTTP request to our country config package's POST /email endpoint
  3. Country config then sends an email (which is in our environments configured to be a Slack email inbox)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I like this idea to keep the old alert rule, that will ensure us about the correctness of our work without taking risk.

Copy link
Member

@rikukissa rikukissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll approve this now so I'm not blocking you but please attend the comment I made

@rikukissa
Copy link
Member

rikukissa commented Dec 20, 2024

Oh and I just realised – these changes need to be synced in the country config repository @alsmk. Maybe better get the PR in to that repository's develop and then merge country config develop back to Farajaland's develop

@alsmk alsmk closed this Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants