Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance charm's life cycle. #119

Merged

Conversation

chanchiwai-ray
Copy link
Contributor

@chanchiwai-ray chanchiwai-ray commented Nov 29, 2023

Description of changes:

  • Improved install_or_upgrade event (addresses 4)
    • Set Maintenance status when install or upgrade
    • Set Blocked status and defer when install or upgrade event failed (both resource and exporter)
    • Skip installing exporter if it's already installed (e.g. during upload resource)
  • Improved remove event
    • Provide warning message when exporter files are not properly removed
  • Improved update_status event
    • Add check for relating more than one grafana agent (addresses 1)
    • Add check for validating exporter's health, and try to recover it if it fails. If it cannot be recovered set Error status. (addresses 2,3)
    • Add check for validating if the tools are properly installed
  • Improved config_changed event
    • Add check for validating exporter configs
  • Miscellaneous updates to hw_tool.py for adding tool validation logic

Maybe closes:

  1. hook failed: "update-status" when relating for more than one application #95
  2. Changing log level with juju config didn't restart the service #87
  3. Hardware-exporter systemd service doesn't start on charm refresh #86
  4. More useful juju message when required relations are missing #92

@chanchiwai-ray
Copy link
Contributor Author

Refactoring can be contributed by someone else

Copy link

@Pjack Pjack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error status definition: any situation entering error status, we can assume there is a bug in charm. In general, if the step fails, it should go to blocked status.

If this is still the logic we want to follow, I think we should enter blocked status in most of case. But I see most of failture cause the charm enter error status now.

src/service.py Outdated Show resolved Hide resolved
src/charm.py Outdated Show resolved Hide resolved
src/charm.py Outdated Show resolved Hide resolved
src/hw_tools.py Outdated Show resolved Hide resolved
src/hw_tools.py Outdated Show resolved Hide resolved
linting.

Co-authored-by: jneo8 <[email protected]>

Continue to remove refactoring code from the previous branch, and fix
status handling.
@chanchiwai-ray
Copy link
Contributor Author

Further removed refactoring code from this PR, and make as less changes as possible. Also cherry-pick @jneo8 strategy checks into this PR.

@chanchiwai-ray chanchiwai-ray requested a review from Pjack November 30, 2023 04:16
@chanchiwai-ray
Copy link
Contributor Author

Updated description

@Pjack
Copy link

Pjack commented Nov 30, 2023

Will this MR also cover this one?
#84

@chanchiwai-ray
Copy link
Contributor Author

chanchiwai-ray commented Nov 30, 2023

Will this MR also cover this one? #84

No, the implementation was removed from this PR

Copy link

@Pjack Pjack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick

src/charm.py Outdated Show resolved Hide resolved
Copy link

@Pjack Pjack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no further big change, please implement unit test

Copy link
Contributor

@jneo8 jneo8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command and I don't see where does #114 been fixed?

src/charm.py Show resolved Hide resolved
src/charm.py Show resolved Hide resolved
src/charm.py Outdated
)
return

hw_tool_ok, error_msg = self.hw_tool_helper.check_installed()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need fix:

Shouldn't we check the resources before checking the service?
Imagine we restart a service with wrong resource.

Copy link
Contributor

@dashmage dashmage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small comment and the tests would need to be updated to reflect these changes.
Apart from that, LGTM :)

@@ -127,6 +147,10 @@ def name(self) -> HWTool:
"""Name."""
return self._name

@abstractmethod
def check(self) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to rename this to be more descriptive? Maybe check_status? I'm open to other names as well.

Copy link
Contributor Author

@chanchiwai-ray chanchiwai-ray Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming from this commit, I didn't authored that changes, and it's only internal use, so it's not so important

Copy link
Contributor

@dashmage dashmage Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, never mind then. I still think it could use a nicer name since check sounds a bit ambiguous but it's alright even if we keep it check.

@chanchiwai-ray
Copy link
Contributor Author

Command and I don't see where does #114 been fixed?

Maybe the implementation was removed from this PR, I'll updated the description

@dashmage
Copy link
Contributor

Since we're merging to the feat/life-cycle branch, there's no automated CI happening here. So it will be hard to verify whether the unit + func tests are passing. How do we validate this since it's not a PR to main/ master?

@chanchiwai-ray
Copy link
Contributor Author

chanchiwai-ray commented Nov 30, 2023

Since we're merging to the feat/life-cycle branch, there's no automated CI happening here. So it will be hard to verify whether the unit + func tests are passing. How do we validate this since it's not a PR to main/ master?

As I understand, this PR only includes the implementation with ops-testing (with hardness, you can test events). Further improvement / manual testing / refactoring / test mindmap update / etc will be handle later. And proper functional test and manual verification are required for merging into main branch.

Copy link
Contributor

@jneo8 jneo8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM now. Please add more details in test function name & doc-string.

tests/unit/test_charm.py Outdated Show resolved Hide resolved
@chanchiwai-ray chanchiwai-ray requested a review from jneo8 December 1, 2023 03:59
- Fixed some warning message in unittest
- Patched exporter retry constants
@jneo8 jneo8 merged commit 5f734cf into canonical:feat/life-cycle Dec 4, 2023
jneo8 added a commit that referenced this pull request Dec 25, 2023
* Enhance charm's life cycle. (#119)

* Enhance charm's life cycle.

* Remove unused property.

* Cherry-pick from 7513f18, and fix
linting.

Co-authored-by: jneo8 <[email protected]>

Continue to remove refactoring code from the previous branch, and fix
status handling.

* Move "exporter health check failed and restart" logic into charm.py

* add resource check in update_status event, and change Error -> Blocked

* Fix typo

* Add unittests.

* Update unit test docstring and name.

* - Used `parameterized`
- Fixed some warning message in unittest
- Patched exporter retry constants

* Separate enabling logic of individual ipmi services monitoring (#123)

* Separate enabling logic of individual ipmi services monitoring

* Add and refactor unit tests

* Fix/resoure install failed (#135)

* fix: lifecycle bug fix

- Remove install retry
- Add cache to hw_white_list to avoid repeat calls
- Fix missing resource failed msg in update status hook

* fix: Remove resource_failed_msg

* fix: Fix type checking comments and defer relation_join hook if checking not pass

* fix: Missing return after grafana-agent relation join event defer

* fix: Force re-config exporter when upgrade (#140)

* refactor: Move restart exporter logic as single function (#139)

* Fix/skip remove freeipmi tools (#138)

* fix: Skip removing pkg on IPMI strategy

Skip removing package because we afraid this cause
dependency error for any other services on the same machine.

* fix: Skip remove ssacli pkg and repo

* test: Fix failed functional test (#141)

Fix failure functional test caused because of life-cycle changing.

* style: Linter fix (#142)

* fix: apt install freeipmi-tools failed on focal (#143)

* fix: apt install freeipmi-tools failed on focal

Fix: #128

---------

Co-authored-by: Ray Chan <[email protected]>
Co-authored-by: Sudeep Bhandari <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants