Data too long for service_state.check_commandline #860

Open
mrdsam opened this issue Nov 20, 2024 · 4 comments
Labels: area/schema, bug (Something isn't working), crash
Milestone: 1.3.0

Comments

mrdsam commented Nov 20, 2024

Describe the bug

After upgrading to icingadb-1.2.0_2 (from packages, as usual), icingadb frequently crashes (every 10 to 30 minutes). The error message is always the same on both nodes (except for "<n>", which is always a different number):

Error 1406 (22001): Data too long for column 'check_commandline' at row <n>

See attached log (from start until crash) for details!
start-end-1.txt

To Reproduce

  • apply schema upgrade mysql/upgrades/1.2.0.sql
  • upgrade to icingadb-1.2.0_2 on both systems
  • icingadb restart on both systems

After that, the problems started. Anyway, I proceeded with

  • icingadb stop (both systems)
  • apply schema upgrade mysql/upgrades/optional/1.2.0-history.sql (took
    a while, but worked fine; see the sketch after this list)
  • icingadb start (both systems)
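
A minimal sketch of how such an upgrade file can be applied from the mysql client; the database name is an assumption, adjust it to your installation:

    -- Sketch: apply an Icinga DB schema upgrade from the mysql client.
    -- The database name is an assumption; the file path is relative to the
    -- directory holding the packaged schema files.
    USE icingadb;
    SOURCE mysql/upgrades/optional/1.2.0-history.sql;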

The problems didn't go away after the optional schema upgrade, as expected.

I then restarted everything: icinga2, redis, icingadb, the MariaDB-Cluster (rolling method, node by node). Problem persists.

Expected behavior

Icinga DB should not crash with a fatal error.

Your Environment

Icinga2 HA-Cluster (with many satellites) on FreeBSD 13.3, with redis and icingadb on each cluster node.

Database is a 3-Node MariaDB Galera-Cluster.

  • Icinga DB version: 1.2.0
  • Icinga 2 version: r2.14.3-1
  • Operating System and version: 13.3-RELEASE-p7
  • MariaDB: 10.11.9 / Galera 26.4.16

Additional context

Workaround

Here is what I did as a workaround, so I didn't have to roll back the
database to its old state:

  • Uninstalled icingadb-1.2.0_2
  • Installed icingadb-1.1.1_14 again
  • mysql: delete from icingadb_schema where version=5;
  • service icingadb restart

Since then, everything looks fine.
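
To double-check that the downgrade took effect, the schema version table can be inspected; a minimal sketch, with column names assumed from the Icinga DB MySQL schema:

    -- Sketch: list the applied schema versions after deleting the 1.2.0 entry (version 5).
    -- Column names are assumptions; adjust if your schema differs.
    SELECT version, timestamp FROM icingadb_schema ORDER BY version;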

No problem with postgresql?!

Interesting fact: I also have icingadb 1.2.0 running on some of the satellites; the problem does NOT occur there! The only difference is that I use Postgres as the database backend there, and each of them is a single instance.

oxzi (Member) commented Nov 20, 2024

Thanks for posting this issue.

One related change in 1.2.0 was enabling strict mode for MySQL in #699. With strict mode, MySQL fails when data is too big for a column instead of silently truncating it. A similar issue was solved in #792, addressing other columns.
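
For illustration, strict mode is the setting that turns silent truncation into a hard error; a minimal sketch of the session variable involved (the exact mode string Icinga DB sets is an assumption here):

    -- Sketch: with strict mode enabled, an INSERT with oversized data fails
    -- with error 1406 instead of being silently truncated to the column size.
    SET SESSION sql_mode = 'STRICT_TRANS_TABLES';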

However, based on your log, the data for service_state.check_commandline is too big. This column is of type TEXT, which holds at most 64 KiB.

Could you please check your configuration for command lines with such huge values?
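
One way to spot the offending objects is to look at the longest command lines that did make it into the database; a sketch, with table and column names taken from the Icinga DB MySQL schema (treat them as assumptions):

    -- Sketch: services with the longest recorded check command lines.
    SELECT s.name, CHAR_LENGTH(ss.check_commandline) AS cmd_len
    FROM service_state ss
    JOIN service s ON s.id = ss.service_id
    ORDER BY cmd_len DESC
    LIMIT 10;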

oxzi changed the title from "Icingadb crashes every ~ 10mins after upgrade to 1.2.0" to "Data too long for service_state.check_commandline" on Nov 20, 2024
mrdsam (Author) commented Nov 20, 2024

Thank you for your valuable input. Indeed "check_interfaces", which uses the performance data of the last check result to build its new command line, exceeds this limit on hosts with many interfaces. I removed those checks and now I get...

+-------------------------------------+
| max(char_length(check_commandline)) |
+-------------------------------------+
|                               28213 |
+-------------------------------------+

...which should be safe for now. I'll try 1.2.0 again ASAP and will report the outcome here.
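
For reference, a figure like the one above presumably comes from a query along these lines; the table name is an assumption:

    -- Sketch: longest check command line currently stored for services.
    SELECT MAX(CHAR_LENGTH(check_commandline)) FROM service_state;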

Nevertheless, wouldn't it be nicer to check / truncate the string before trying to write it to the database and issue a warning instead?

oxzi (Member) commented Nov 21, 2024

Indeed "check_interfaces", which uses the performance data of the last check result to build it's new command line, exceeds this limit on hosts with many interfaces.

That's quite interesting, as a total argument length exceeding 64K characters is a bit unexpected. Speaking of expectations, that's where the limit most likely came from: it has been there since the beginning, and Icinga 2's IDO used the same type.

However, longer arguments are technically possible, as nicely outlined on this webpage, which I have referred to before. For example, on Linux I get an ARG_MAX of 2,097,152, and on OpenBSD it is 524,288.

Now the question is how likely it is to have such huge values, exceeding the TEXT type capacity. As seen in this very issue, it may happen.

However, to fit both ARG_MAX values listed above, the type must be at least MEDIUMTEXT. Based on this table from the MySQL documentation, the required storage is the actual length of the string plus two or three bytes for TEXT or MEDIUMTEXT, respectively.

Raising the type should not be too expensive, but nonetheless I would like to have another opinion on that.
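
If the type were raised, the migration would presumably boil down to something like the following sketch; the column attributes and the matching host_state change are assumptions:

    -- Hypothetical sketch: widen check_commandline from TEXT to MEDIUMTEXT.
    -- host_state presumably carries the same column and would need the same change.
    ALTER TABLE service_state MODIFY COLUMN check_commandline MEDIUMTEXT;
    ALTER TABLE host_state MODIFY COLUMN check_commandline MEDIUMTEXT;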

Nevertheless, wouldn't it be nicer to check / truncate the string before trying to write it to the database and issue a warning instead?

Definitely, something should be done. Unless the type is increased to MEDIUMTEXT, I would vote for raising a warning, as truncating data is dangerous, imo.

mrdsam (Author) commented Nov 21, 2024

First, icingadb 1.2.0 is running without problems now, thanks @oxzi for the information about strict mode and the hint that my commands could be too long!

I would not take my case as an argument to increase the size of check_commandline, because monitoring all ports of a 48-port switch without filters or splitting them up is not good practice in the first place.

My most important concern is that the icingadb daemon should keep running in such cases. IMO the best solution would be for the icingadb check to raise a warning or critical state: things keep running, but you get notified that something is wrong.

oxzi added the bug (Something isn't working), crash, and area/schema labels on Nov 29, 2024
oxzi added this to the 1.3.0 milestone on Dec 5, 2024