IcingaDB won't start when max_check_attempts is out of range #655

Closed
A41susan opened this issue Oct 4, 2023 · 3 comments · Fixed by #656
A41susan commented Oct 4, 2023

Describe the bug

Defining this parameter in any service causes Icinga DB to crash. Removing the parameter afterwards does not help.

max_check_attempts = 1000

To Reproduce

Define in any service:
max_check_attempts = 1000

Reload

Expected behavior

Config validation fails.

Your Environment


  • Version used (icinga2 --version): r2.14.0-1
  • Operating System and version: SUSE Linux Enterprise Server 12 SP5
  • Enabled features (icinga2 feature list): api checker graphite icingadb ido-mysql influxdb2 mainlog notification
  • icingadb --version
    Icinga DB version: v1.1.0

Build information:
Go version: go1.18.1 (linux, amd64)
Git commit: a0093d1

System information:
Platform: SLES
Platform version: 12-SP5

Additional context

Setting a parameter like max_check_attempts to the value 1000 may be questionable and unnecessary in the first place, but I didn't expect it to crash Icinga DB in such a way that it won't start anymore even after the parameter is removed. I have the feeling that we are only discovering little by little what makes Icinga DB crash; this is not the first time it has happened to me.

Output of systemctl status icingadb -l

[root@mgtmon202:~] systemctl status icingadb -l
● icingadb.service - Icinga DB
   Loaded: loaded (/usr/lib/systemd/system/icingadb.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2023-10-02 13:06:16 CEST; 1 day 18h ago
 Main PID: 68477 (code=exited, status=1/FAILURE)

Oct 02 13:06:16 mgtmon202 systemd[1]: Starting Icinga DB...
Oct 02 13:06:16 mgtmon202 icingadb[68477]: Starting Icinga DB
Oct 02 13:06:16 mgtmon202 systemd[1]: Started Icinga DB.
Oct 02 13:06:16 mgtmon202 icingadb[68477]: Connecting to database at 'VDBTICG003.a41mgt.local:6446'
Oct 02 13:06:16 mgtmon202 icingadb[68477]: Connecting to Redis at 'localhost:6380'
Oct 02 13:06:16 mgtmon202 icingadb[68477]: Starting history sync
Oct 02 13:06:16 mgtmon202 icingadb[68477]: strconv.ParseUint: parsing "256": value out of range
                                           can't parse check_attempt into the uint8 StateHistory#CheckAttempt: 256
                                           github.com/icinga/icingadb/pkg/structify.structifyMapByTree
                                                   github.com/icinga/icingadb/pkg/structify/structify.go:97
                                           github.com/icinga/icingadb/pkg/structify.MakeMapStructifier.func1
                                                   github.com/icinga/icingadb/pkg/structify/structify.go:42
                                           github.com/icinga/icingadb/pkg/icingadb/history.writeOneEntityStage.func1
                                                   github.com/icinga/icingadb/pkg/icingadb/history/sync.go:182
                                           github.com/icinga/icingadb/pkg/icingadb/history.writeMultiEntityStage.func1.1
                                                   github.com/icinga/icingadb/pkg/icingadb/history/sync.go:219
                                           golang.org/x/sync/errgroup.(*Group).Go.func1
                                                   golang.org/x/[email protected]/errgroup/errgroup.go:57
                                           runtime.goexit
                                                   runtime/asm_amd64.s:1571
                                           can't structify map map[string]interface {}{"check_attempt":"256", "check_source":"mgtmon204.t1r.afomcs.com", "endpoint_id":"6a097eb4476ba547053aff5ddab0f13b01c14359", "environment_id":"a833360d2117b56208017d792da840cab1261c25", "event_id":"937c4ea4c7b4d0be5f104b6ab1858470a6776f16", "event_time":"1696206177615", "event_type":"state_change", "hard_state":"0", "host_id":"2da752d7bb8530fda689f2d0b40f96556755986e", "id":"8e04d1c6c4b9d852e7a7849dbd8861f6fdff1bbb", "max_check_attempts":"1000", "object_type":"service", "output":"PROCS CRITICAL: 0 processes with args '/usr/sap/FPP/J19/exe/sapjvm_8/jre/bin/java -jar mlrclient.jar' ", "previous_hard_state":"0", "previous_soft_state":"0", "scheduling_source":"mgtmon204.t1r.afomcs.com", "service_id":"a7f19300ec4a236e78cdc58131c0e163e26910ed", "soft_state":"2", "state_type":"0"} by tree []structify.structBranch{structify.structBranch{field:0, leaf:"", subTree:[]structify.structBranch{structify.structBranch{field:0, leaf:"", subTree:[]structify.structBranch{structify.structBranch{field:0, leaf:"", subTree:[]structify.structBranch{structify.structBranch{field:0, leaf:"id", subTree:[]structify.structBranch(nil)}}}}}}}, structify.structBranch{field:1, leaf:"", subTree:[]structify.structBranch{structify.structBranch{field:0, leaf:"environment_id", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:1, leaf:"endpoint_id", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:2, leaf:"object_type", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:3, leaf:"host_id", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:4, leaf:"service_id", subTree:[]structify.structBranch(nil)}}}, structify.structBranch{field:2, leaf:"event_time", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:3, leaf:"state_type", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:4, leaf:"soft_state", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:5, leaf:"hard_state", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:6, leaf:"previous_soft_state", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:7, leaf:"previous_hard_state", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:8, leaf:"check_attempt", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:9, leaf:"output", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:10, leaf:"long_output", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:11, leaf:"max_check_attempts", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:12, leaf:"check_source", subTree:[]structify.structBranch(nil)}, structify.structBranch{field:13, leaf:"scheduling_source", subTree:[]structify.structBranch(nil)}}
                                           github.com/icinga/icingadb/pkg/structify.MakeMapStructifier.func1
                                                   github.com/icinga/icingadb/pkg/structify/structify.go:42
                                           github.com/icinga/icingadb/pkg/icingadb/history.writeOneEntityStage.func1
                                                   github.com/icinga/icingadb/pkg/icingadb/history/sync.go:182
                                           github.com/icinga/icingadb/pkg/icingadb/history.writeMultiEntityStage.func1.1
                                                   github.com/icinga/icingadb/pkg/icingadb/history/sync.go:219
                                           golang.org/x/sync/errgroup.(*Group).Go.func1
                                                   golang.org/x/[email protected]/errgroup/errgroup.go:57
                                           runtime.goexit
                                                   runtime/asm_amd64.s:1571
                                           can't structify values map[string]interface {}{"check_attempt":"256", "check_source":"mgtmon204.t1r.afomcs.com", "endpoint_id":"6a097eb4476ba547053aff5ddab0f13b01c14359", "environment_id":"a833360d2117b56208017d792da840cab1261c25", "event_id":"937c4ea4c7b4d0be5f104b6ab1858470a6776f16", "event_time":"1696206177615", "event_type":"state_change", "hard_state":"0", "host_id":"2da752d7bb8530fda689f2d0b40f96556755986e", "id":"8e04d1c6c4b9d852e7a7849dbd8861f6fdff1bbb", "max_check_attempts":"1000", "object_type":"service", "output":"PROCS CRITICAL: 0 processes with args '/usr/sap/FPP/J19/exe/sapjvm_8/jre/bin/java -jar mlrclient.jar' ", "previous_hard_state":"0", "previous_soft_state":"0", "scheduling_source":"mgtmon204.t1r.afomcs.com", "service_id":"a7f19300ec4a236e78cdc58131c0e163e26910ed", "soft_state":"2", "state_type":"0"}
                                           github.com/icinga/icingadb/pkg/icingadb/history.writeOneEntityStage.func1
                                                   github.com/icinga/icingadb/pkg/icingadb/history/sync.go:184
                                           github.com/icinga/icingadb/pkg/icingadb/history.writeMultiEntityStage.func1.1
                                                   github.com/icinga/icingadb/pkg/icingadb/history/sync.go:219
                                           golang.org/x/sync/errgroup.(*Group).Go.func1
                                                   golang.org/x/[email protected]/errgroup/errgroup.go:57
                                           runtime.goexit
                                                   runtime/asm_amd64.s:1571
Oct 02 13:06:16 mgtmon202 systemd[1]: icingadb.service: Main process exited, code=exited, status=1/FAILURE
Oct 02 13:06:16 mgtmon202 systemd[1]: icingadb.service: Unit entered failed state.
Oct 02 13:06:16 mgtmon202 systemd[1]: icingadb.service: Failed with result 'exit-code'.
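
The interesting part seems to be the strconv.ParseUint failure: the queued history event carries check_attempt = 256, which no longer fits into the uint8 StateHistory#CheckAttempt field Icinga DB parses it into. A minimal sketch of that parse call (just an illustration of the error message, not the actual code in pkg/structify) reproduces it:

package main

import (
	"fmt"
	"strconv"
)

func main() {
	// check_attempt arrives from Redis as a string; parsing it into a
	// uint8 (bit size 8) fails for anything above 255.
	_, err := strconv.ParseUint("256", 10, 8)
	fmt.Println(err) // strconv.ParseUint: parsing "256": value out of range
}

Since the offending event stays queued in Redis, Icinga DB presumably hits the same value again on every restart, which would explain why removing the parameter from the Icinga 2 config alone doesn't help.
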
@Al2Klimov Al2Klimov transferred this issue from Icinga/icinga2 Oct 10, 2023
@Al2Klimov Al2Klimov self-assigned this Oct 10, 2023
@carcanye

Hi, I had the exact same problem. I even tried deleting the database and commenting out the responsible check in the Icinga config, but the same problem appeared when starting the service.

Finally, a FLUSHALL on Redis solved the problem for me and I could start the service.

A41susan commented Dec 13, 2023

Thanks, you just saved me from deleting everything and starting from scratch!
Here's what I did:
nc -v 127.0.0.1 <port>
AUTH <password>
flushall
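
For anyone who would rather do the flush from a small program instead of talking to Redis over nc, a rough go-redis sketch follows. The address localhost:6380 comes from the log output above; the password placeholder is hypothetical and has to be replaced with the one from your Icinga DB Redis configuration. Keep in mind that FLUSHALL discards everything Redis currently holds, not just the broken history entry.

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	// Address taken from the icingadb log above; the password is a placeholder.
	rdb := redis.NewClient(&redis.Options{
		Addr:     "localhost:6380",
		Password: "<password>",
	})
	defer rdb.Close()

	// FLUSHALL drops every key, including the stuck history event.
	if err := rdb.FlushAll(context.Background()).Err(); err != nil {
		fmt.Println("FLUSHALL failed:", err)
		return
	}
	fmt.Println("Redis flushed; icingadb can be started again")
}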

@lippserd lippserd reopened this Dec 13, 2023

bobapple commented Jan 4, 2024

ref/IP/48956
