Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hide unreachable parents in root problems #1073

Open
wants to merge 4 commits into
base: dependencies
Choose a base branch
from

Conversation

nilmerg
Copy link
Member

@nilmerg nilmerg commented Oct 8, 2024

No description provided.

@nilmerg nilmerg requested a review from raviks789 October 8, 2024 14:21
@nilmerg nilmerg self-assigned this Oct 8, 2024
@cla-bot cla-bot bot added the cla/signed CLA is signed by all contributors of a PR label Oct 8, 2024
@nilmerg nilmerg changed the base branch from main to dependencies October 8, 2024 14:22
@nilmerg nilmerg force-pushed the hide-unreachable-parents-in-root-problems branch from 61dfa18 to 5626597 Compare October 8, 2024 14:31
@nilmerg nilmerg changed the base branch from dependencies to main October 9, 2024 10:13
@nilmerg nilmerg changed the base branch from main to fix/unstable-service-order October 9, 2024 10:13
@nilmerg nilmerg changed the base branch from fix/unstable-service-order to dependencies October 9, 2024 10:14
Copy link
Contributor

@raviks789 raviks789 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change causes false negatives.

For example, if there is an explicit dependency between two services with same host, and the host is down, then both the services become unreachable but also the dependency fails. And this change would filter out the unreachable parent service, which should not happen.

See the screenshot below for one such case

Screenshot 2024-10-14 at 15 55 39

@raviks789
Copy link
Contributor

raviks789 commented Oct 15, 2024

This change causes false negatives.

For example, if there is an explicit dependency between two services with same host, and the host is down, then both the services become unreachable but also the dependency fails. And this change would filter out the unreachable parent service, which should not happen.

This seems to apply to any service-to-service dependency. See the screenshot below.

Screenshot 2024-10-15 at 10 39 08

So put it simply, if the host of the parent service is down, the parent service becomes unreachable. This in turn results in the dependency being failed. But according to the filter the parent service should be reachable.

Hence, it might also be safe to conclude there will be false negatives for service-to-host dependencies as well.

@nilmerg nilmerg force-pushed the hide-unreachable-parents-in-root-problems branch from 5626597 to 1532fe5 Compare November 28, 2024 09:41
@nilmerg
Copy link
Member Author

nilmerg commented Nov 28, 2024

I fully reworked this. Please have a look at it again.

I used the following configuration for tests: (Each check rotates through every valid state, just use "Check Now" to change it. Though, don't forget to run icingacli icingadb dependency syncState ;) )

graph TD;
    A-->B;
    B-->D;
    C-->D;
    D-->H;
    F-->E;
    E-->G;
    G-->H;
    A-.->F;
    B-.->E;
    B-.->C;
    G-.->H;
    subgraph GD [ redundancy group ]
    G~~~D
   end
Loading
template Host "dbweb-1073-host" {
    import "generic-host"

    check_command = "dummy"
    check_interval = 365d
    retry_interval = 365d
    max_check_attempts = 1

    vars.issue_no = 1073

    vars.state_id = 0
    vars.states = [0, 2]

    var that = this
    vars.dummy_state = function () use (that) {
        var ret = that.vars.states[that.vars.state_id]

        that.vars.state_id += 1
        if (that.vars.state_id == len(that.vars.states)) {
            that.vars.state_id = 0
        }

        return ret
    }
}

template Service "dbweb-1073-service" {
    import "generic-service"

    check_command = "dummy"
    check_interval = 365d
    retry_interval = 365d
    max_check_attempts = 1

    vars.issue_no = 1073

    vars.state_id = 0
    vars.states = [0, 1, 2, 3]

    var that = this
    vars.dummy_state = function () use (that) {
        var ret = that.vars.states[that.vars.state_id]

        that.vars.state_id += 1
        if (that.vars.state_id == len(that.vars.states)) {
            that.vars.state_id = 0
        }

        return ret
    }
}

object Host "A" {
    import "dbweb-1073-host"
}

object Host "B" {
    import "dbweb-1073-host"
}

object Service "C" {
    import "dbweb-1073-service"

    host_name = "B"
}

object Host "D" {
    import "dbweb-1073-host"
}

object Service "E" {
    import "dbweb-1073-service"

    host_name = "B"
}

object Service "F" {
    import "dbweb-1073-service"

    host_name = "A"
}

object Host "G" {
    import "dbweb-1073-host"
}

object Service "H" {
    import "dbweb-1073-service"

    host_name = "G"
}

object Dependency "A-B" {
    parent_host_name = "A"
    child_host_name = "B"
}

object Dependency "B-D" {
    parent_host_name = "B"
    child_host_name = "D"
}

object Dependency "C-D" {
    parent_host_name = "B"
    parent_service_name = "C"
    child_host_name = "D"
}

object Dependency "F-E" {
    parent_host_name = "A"
    parent_service_name = "F"
    child_host_name = "B"
    child_service_name = "E"
}

object Dependency "E-G" {
    parent_host_name = "B"
    parent_service_name = "E"
    child_host_name = "G"
}

object Dependency "G-H" {
    parent_host_name = "G"
    child_host_name = "G"
    child_service_name = "H"

    redundancy_group = "GD"
}

object Dependency "D-H" {
    parent_host_name = "D"
    child_host_name = "G"
    child_service_name = "H"

    redundancy_group = "GD"
}

@nilmerg nilmerg requested a review from raviks789 November 28, 2024 09:47
@nilmerg nilmerg force-pushed the hide-unreachable-parents-in-root-problems branch 3 times, most recently from e378b03 to 4a1348f Compare December 11, 2024 10:31
A node is responsible if:

* it's a host which is reachable but has a problem
* it's a service which is reachable but has a problem
* it's a redundancy group which is reachable but has failed
They can now be unreachable, thus get the same icon as others.
The state then isn't about reachability anymore, so it's just
critical or ok.
@nilmerg nilmerg force-pushed the hide-unreachable-parents-in-root-problems branch from 4a1348f to 30269ef Compare December 17, 2024 16:53
@nilmerg
Copy link
Member Author

nilmerg commented Dec 17, 2024

Though, don't forget to run icingacli icingadb dependency syncState ;)

This is now based on the latest schema updates, hence my dependency writer does not work anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla/signed CLA is signed by all contributors of a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants