-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Snap Strategy #306
Add Snap Strategy #306
Conversation
DCGM and smartctl- will be installed via snap. This PR creates the SnapStrategy so we can install, remove and check as it happens with other stratigies in the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not sure how helpful it is to add this before adding any actual snaps, because this code doesn't have any real use yet, can't be functionally tested, and we don't have any client code that uses it (ie. we don't know yet if this api will be what we want).
It's tricky with figuring out when to write things and in what order, and I think I've gone back and forth over it. For this, my opinion is that this is a new feature where we should add support for an actual snap, not just developing an api in a vaccuum. :)
The DCGM and Smartctl snaps are not ready yet. This PR can be pending until those snaps are available. The idea here is just start having conversations about the approach and not wait until is ready for trying to figure out what to do. |
src/hw_tools.py
Outdated
self._name = tool | ||
self.snap_name = tool.value | ||
|
||
def install(self, channel: str = "latest/stable") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that channel
should be part of init function, so the install is not needed. Otherwise, I do not see any benefit of even adding this strategy instead of directly using snap
lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the channel is part of the configuration we get from the charm configuration?
Actually we need to know how to get the channel information when execute the install function. This PR need to include the changing of HWToolHelpe
?
@gabrielcocenza Can you add more information about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've included the channel at the init
and had to change the HWToolHelper
to be able to instantiate a class passing the channel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it's good for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something was lost in the splitting of PRs, but this method still takes channel
as an argument, and doesn't use self.channel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you need def install(self, channel)
but not def __init__(self, channel)
? if you are following the current design? https://github.com/canonical/hardware-observer-operator/blob/main/src/hw_tools.py#L182
src/hw_tools.py
Outdated
self._name = tool | ||
self.snap_name = tool.value | ||
|
||
def install(self, channel: str = "latest/stable") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the channel is part of the configuration we get from the charm configuration?
Actually we need to know how to get the channel information when execute the install function. This PR need to include the changing of HWToolHelpe
?
@gabrielcocenza Can you add more information about this?
src/hw_tools.py
Outdated
snap.add(self.snap_name, channel=self.channel) | ||
except snap.SnapError as err: | ||
logger.error( | ||
"Failed to install %s on channel: %s: %s", self.snap_name, self.channel, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
"Failed to install %s on channel: %s: %s", self.snap_name, self.channel, err | |
"Failed to install %s from channel: %s: %s", self.snap_name, self.channel, err |
src/hw_tools.py
Outdated
"Failed to install %s on channel: %s: %s", self.snap_name, self.channel, err | ||
) | ||
else: | ||
logger.info("Installed %s on channel: %s", self.snap_name, self.channel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.info("Installed %s on channel: %s", self.snap_name, self.channel) | |
logger.info("Installed %s from channel: %s", self.snap_name, self.channel) |
src/hw_tools.py
Outdated
|
||
def enable_services(self) -> None: | ||
"""Enable the snap services.""" | ||
# breakpoint() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine this is a leftover from manual tests
src/hw_tools.py
Outdated
@@ -695,14 +738,15 @@ def install(self, resources: Resources, hw_available: Set[HWTool]) -> Tuple[bool | |||
if path: | |||
strategy.install(path) | |||
# APTStrategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: needs updating
@gabrielcocenza Would you mind separating this PR into two parts? The first part is common to both the smartctl-exporter-snap and dcgm-exporter-snap, so it would be more convenient to work on them in parallel if we decouple the first and second parts. Thanks!
I propose to merge SnapStrategy first. |
@samuelallan72 @aieri @rgildein Eric asked to split the SnapStrategy from the implementation of the dcgm exporter. I guess now there is a good idea on how it's going to be used and I'll open a PR soon on top of this one to implement the dcgm exporter considering those comments |
src/hw_tools.py
Outdated
self._name = tool | ||
self.snap_name = tool.value | ||
|
||
def install(self, channel: str = "latest/stable") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something was lost in the splitting of PRs, but this method still takes channel
as an argument, and doesn't use self.channel
.
src/hw_tools.py
Outdated
def enable_services(self) -> None: | ||
"""Enable the snap services.""" | ||
self.snap_client.start(list(self.snap_client.services.keys()), enable=True) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reckon this could be a private method (it's not used outside the strategy), or even better inlined because it's a single line method.
src/hw_tools.py
Outdated
def install(self, channel: str = "latest/stable") -> None: | ||
"""Install the snap from a channel.""" | ||
try: | ||
snap.add(self.snap_name, channel=channel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snap.add(self.snap_name, channel=channel) | |
snap.add(self.snap_name, channel=self.channel) |
logger.error( | ||
"Failed to install %s from channel: %s: %s", self.snap_name, self.channel, err | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logs an error to the charm log, but that's silent to the user - can we bubble this error up to the charm status? How do the other exporters handle installation failures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good found Samuel!
Please follow the current error handling design here.
hardware-observer-operator/src/hw_tools.py
Lines 701 to 708 in d420c63
except ( | |
ResourceFileSizeZeroError, | |
OSError, | |
apt.PackageError, | |
ResourceChecksumError, | |
) as e: | |
logger.warning("Strategy %s install fail: %s", strategy, e) | |
fail_strategies.append(strategy.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to re-raise the error. Thanks
src/hw_tools.py
Outdated
self.channel = channel | ||
self.snap_client = snap.SnapCache()[tool.value] | ||
|
||
def install(self, channel: str = "latest/stable") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def install(self, channel: str = "latest/stable") -> None: | |
def install(self) -> None: |
logger.error( | ||
"Failed to install %s from channel: %s: %s", self.snap_name, self.channel, err | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good found Samuel!
Please follow the current error handling design here.
hardware-observer-operator/src/hw_tools.py
Lines 701 to 708 in d420c63
except ( | |
ResourceFileSizeZeroError, | |
OSError, | |
apt.PackageError, | |
ResourceChecksumError, | |
) as e: | |
logger.warning("Strategy %s install fail: %s", strategy, e) | |
fail_strategies.append(strategy.name) |
@@ -175,6 +176,44 @@ def remove(self) -> None: | |||
# the remove option. | |||
|
|||
|
|||
class SnapStrategy(StrategyABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion:
I kind of prefer code like this:
- No init function
- hard code HWTool
- Pass channel on
install
function
class SnapStrategyABC(StrategyABC, metaclass=ABCMeta):
"""Snap strategy class."""
def install(self, channel: str = "latest/stable") -> None:
"""Install the snap from a channel."""
# define install details here.
...
class SmartCtlStrategy(SnapStrategyABC):
_name = HWTool.SMARTCTL_EXPORTER
class HWToolHelper:
"""Helper to install vendor's or hardware related tools."""
def strategies(self) -> List[StrategyABC]:
"""Define strategies for every tools."""
return [
StorCLIStrategy(),
RedFishStrategy(),
...
# It's more easy to define strategy here
SmartCtlStrategy(),
]
# The channel information should only be given here. This make sure the object dependency is one-direction.
def install(self, resources: Resources, channel: (some-type), hw_available: Set[HWTool]) -> Tuple[bool, str]:
"""Install tools."""
...
for strategy in self.strategies:
if strategy.name not in hw_available:
continue
try:
if isinstance(strategy, TPRStrategyABC):
path = fetch_tools.get(strategy.name) # pylint: disable=W0212
if path:
strategy.install(path)
if isinstance(strategy, SnapStrategy):
# The channel should be an argument for install function
strategy.install(channel)
elif isinstance(strategy, APTStrategyABC):
strategy.install() # pylint: disable=E1120
logger.info("Strategy %s install success", strategy)
except (
ResourceFileSizeZeroError,
OSError,
apt.PackageError,
ResourceChecksumError,
snap.SnapError,
) as e:
logger.warning("Strategy %s install fail: %s", strategy, e)
fail_strategies.append(strategy.name)
if fail_strategies:
return False, f"Fail strategies: {fail_strategies}"
return True, ""
So the (dis)advantage are:
- The strategy can keep in hard code, it's more easy to trace and keep the flexibility. (Imagine if you have different operation for DCGM & Smart).
Chance to use snap resource in the future- The only breaking change only happen on
HWToolHelper.install
function. Other design won't be broken.- Get the channel information in Charm class object and pass it to install function. This make sure the object dependency is one direaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different snaps will have different channels to install from. I don't see how passing a single channel on HWToolHelper can solve this. I think it would be very strange to have one charm config for channel to rule on all snaps channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while I agree that we should remove the constructor for consistency with the other strategies, I don't quite get why it would be preferable to move complexity to the install function, as that requires the caller to have deeper knowledge about how each strategy works and adds extra conditionals to HWToolHelper.install
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a protocol to pass all the required material, resources, configuration, and hardware detect result, for the strategies.
So install
function is the protocol right now. If we break this rule then the ISP(interface-segregation principles) will be broken then you have to find a tricky/hacky way to import those material into the strategies.
(I am open to any way to refactor the protocol we have now, but it's a refactor so it shouldn't happen on this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different snaps will have different channels to install from. I don't see how passing a single channel on ...
It's a pseudocode code, so pass the configuration you want.
- removed the enable_services and put at the install method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, but the snap strategy should be aligned with the other strategies in terms of configuration being done by the client when calling install
and not in the constructor (for consistency)
@@ -175,6 +176,44 @@ def remove(self) -> None: | |||
# the remove option. | |||
|
|||
|
|||
class SnapStrategy(StrategyABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while I agree that we should remove the constructor for consistency with the other strategies, I don't quite get why it would be preferable to move complexity to the install function, as that requires the caller to have deeper knowledge about how each strategy works and adds extra conditionals to HWToolHelper.install
I agree.... @aieri @jneo8 @samuelallan72 After putting more effort on how to integrate the dcgm snap I ended up noticing that we might not need to have a SnapStrategy for exporters that are snaped. When we have debian packages or other binaries the Strategy is installing and removing the packages while the exporter on the When we have a snap I'm not seeing the need to have this kind of separation and it's seems to me that it's simpler just handle everything as a service and we'll be able to do pretty much everything in just one module. Please take a look at the draft PR that I've created at #319 Unfortunately this PR is showing to be hard to have the Generic snap class without a real implementation. In any case, I think this other PR has a cleaner approach |
Closing this PR. The approach changed to not have a SnapStrategy and work directly with the exporter class itself. The new PR is at #319 |
DCGM and smartctl- will be installed via snap. This PR creates the SnapStrategy so we can install, remove and check as it happens with other strategies in the code base.