Ops fixes #3616

Merged
merged 2 commits into from
Apr 12, 2024

dbutenhof
Member

Correct accounting and reporting of cache unpacking statistics.

Avoid reflecting SQLAlchemy `init_db` errors into conmon.

Cache report:
  25,702 datasets currently unpacked, consuming 85.1 TB
  5,492 datasets have unpack metrics and 2 have been unpacked more than once.
  The most unpacked dataset has been unpacked 2 times, uperf__2024.04.10T23.35.29
  The least recently used cache was referenced Jan 22, fio__2020.03.09T12.19.24
  The most recently used cache was referenced today, uperf__2024.04.10T23.35.29
  The smallest cache is 4.1 kB, fio__2020.03.03T21.51.55
  The biggest cache is 420.8 GB, uperf_rhel8.5_4.10.0.310.el8_intel_10gb_long_run_2021.06.08T17.54.04
  The worst compression ratio is -4201.758%, fio__2020.03.03T21.51.55
  The best compression ratio is 99.800%, fio__2020.02.27T14.03.02
  The fastest cache unpack is 0.025 seconds, pbench-user-benchmark__2020.05.01T09.13.21
  The slowest cache unpack is 66527.955 seconds, uperf_ethanolx_nps1_hton_biosdefault_tp_1x100GbE_CX5_8.1_147_2020.01.14T15.53.38
  The fastest cache unpack streaming rate is 538.635 Mb/second, pbench-user-benchmark_0400r-020cpt-0000d_ms-100ka-ytlsru-120s-mix-aws-perfci_2020.10.14T06.00.43
  The slowest cache unpack streaming rate is 0.287 Mb/second, pbench-user-benchmark__2020.04.29T10.13.07
  622 datasets have no unpacked size, 0 are missing reference timestamps, 0 have bad size metadata
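
The negative "worst compression ratio" in the report can look surprising. The following is a minimal sketch, assuming the ratio is defined as the percentage of space saved relative to the unpacked size; the PR does not show the server's actual formula, and the function name and byte counts here are hypothetical. Under that definition, a tarball larger than its unpacked contents (plausible for a very small dataset, where archive overhead dominates) yields a negative value.

```python
def compression_ratio(tarball_bytes: int, unpacked_bytes: int) -> float:
    """Percentage of space saved by compression.

    Assumed definition, not taken from the pbench source. The result is
    negative when the tarball is larger than the unpacked tree.
    """
    if unpacked_bytes <= 0:
        raise ValueError("unpacked size must be positive")
    return 100.0 * (unpacked_bytes - tarball_bytes) / unpacked_bytes


# Hypothetical sizes: a tarball one fifth the size of its contents.
good = compression_ratio(tarball_bytes=200, unpacked_bytes=1000)

# Hypothetical sizes: a tarball twice the size of its contents.
bad = compression_ratio(tarball_bytes=2000, unpacked_bytes=1000)
```

With these made-up inputs, `good` is positive and `bad` is negative, which is the same pattern as the best and worst ratios in the report.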

@dbutenhof dbutenhof added the Server and Operations (related to operation and monitoring of a service) labels Apr 11, 2024
@dbutenhof dbutenhof requested a review from webbnh April 11, 2024 16:48
@dbutenhof dbutenhof self-assigned this Apr 11, 2024
webbnh previously approved these changes Apr 11, 2024

@webbnh webbnh left a comment


Looks great.

I realized that even though I'd shortened the traceback, it would still be
displayed by Python on termination with an unhandled exception, and that would
still be relayed through conmon one line at a time. To avoid that, I added an
`except Exception` clause to print a simple message and exit.
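
The pattern described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `run_guarded` and `failing_init` are hypothetical names, and the message text is an assumption. The idea is that catching the exception at the top level replaces Python's multi-line traceback with one short line and a nonzero exit status.

```python
import sys
from typing import Callable


def run_guarded(init: Callable[[], None]) -> int:
    """Run init(); on failure print one short line and return nonzero."""
    try:
        init()
    except Exception as exc:
        # A single line on stderr instead of a full traceback, so a
        # line-by-line log relay has only one line to forward.
        print(f"Database initialization failed: {exc}", file=sys.stderr)
        return 1
    return 0


def failing_init() -> None:
    """Hypothetical stand-in for a database setup call that fails."""
    raise RuntimeError("connection refused")
```

A command-line entry point could then end with `sys.exit(run_guarded(failing_init))`, so the process terminates with status 1 rather than with an unhandled exception.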

In the process, I realized that while I'd written a new Click-based
`pbench-reindex`, the old `/opt/pbench-server/bin/pbench-reindex` script was
still there, and referenced in several config files. I removed these.

@webbnh webbnh left a comment


It's still great.

@dbutenhof dbutenhof merged commit 37be815 into distributed-system-analysis:main Apr 12, 2024
4 checks passed
@dbutenhof dbutenhof deleted the cleanup branch April 12, 2024 13:04