Add support for collecting DHCP metrics from Kea Control Agent #2937

Draft
wants to merge 77 commits into
base: master
from

Conversation

jorund1
Collaborator

@jorund1 jorund1 commented Jul 1, 2024

Implements #2931

Uses Python 3.9 type hints

                      HTTP                       IPC
KeaDhcpMetricSource <------> Kea Control Agent <=====> Kea DHCP4 server / Kea DHCP6 server

Defines the KeaDhcpMetricSource class and its superclass DhcpMetricSource. Their methods fetch_metrics and fetch_metrics_to_graphite fetch metrics from a Kea DHCP server controlled by a Kea Control Agent and send these metrics to a Graphite server. Example usage:

from nav.dhcp.kea_metrics import KeaDhcpMetricSource
import time

KEA_CTRL_AGENT_ADDR = "2001:db8::1"
KEA_CTRL_AGENT_PORT = 443

GRAPHITE_ADDR = "2001:db8::2"
GRAPHITE_PORT = 2003

# Collects metrics from the Kea DHCP4 server that the Kea Control
# Agent is configured to control
source = KeaDhcpMetricSource(
    address=KEA_CTRL_AGENT_ADDR,
    port=KEA_CTRL_AGENT_PORT,
    dhcp_version=4
)

while True:
    source.fetch_metrics_to_graphite(
        address=GRAPHITE_ADDR,
        port=GRAPHITE_PORT
    )
    time.sleep(600)

The other defined classes and functions are helpers and are not really meant to be used by other parts of NAV.
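
For context, the HTTP leg of the diagram above amounts to posting a JSON command to the Kea Control Agent. The following is a minimal sketch, assuming Kea's documented command format (`command`, `service` and optional `arguments` keys). The helper name `build_kea_command` is hypothetical and not part of this PR, and no network request is made here.

```python
import json
from typing import Any, Optional


def build_kea_command(
    command: str, dhcp_version: int, arguments: Optional[dict[str, Any]] = None
) -> str:
    """Return the JSON body for a Kea Control Agent command (hypothetical helper)."""
    if dhcp_version not in (4, 6):
        raise ValueError("dhcp_version must be 4 or 6")
    body: dict[str, Any] = {
        "command": command,
        # The Control Agent forwards the command to the listed services
        "service": [f"dhcp{dhcp_version}"],
    }
    if arguments is not None:
        body["arguments"] = arguments
    return json.dumps(body)


payload = build_kea_command("config-get", dhcp_version=4)
```

The resulting string would be sent as the body of an HTTP POST with a `Content-Type: application/json` header.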

Todo:

  • Fix all TODO code comments.
  • Add support in KeaDhcpConfig.from_json for storing the configuration hash supplied in the json obtained from config-get queries.
  • Full test coverage.
    • Test handling of config-hash-get queries in KeaDhcpMetricSource.fetch_and_set_dhcp_config and KeaDhcpMetricSource.fetch_dhcp_config_hash
    • Test handling of statistic-get queries in KeaDhcpMetricSource.fetch_metrics
    • Add more Kea DHCP server configuration example strings to test against, including DHCP6 configuration example strings.

@CLAassistant

CLAassistant commented Jul 1, 2024

CLA assistant check
All committers have signed the CLA.


codecov bot commented Jul 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.65%. Comparing base (8d039e0) to head (d3faa81).
Report is 16 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2937      +/-   ##
==========================================
+ Coverage   56.58%   56.65%   +0.07%     
==========================================
  Files         602      604       +2     
  Lines       43729    43890     +161     
  Branches       48       48              
==========================================
+ Hits        24744    24868     +124     
- Misses      18973    19010      +37     
  Partials       12       12              


@lunkwill42
Member

The tests are failing on Python 3.7. I just merged #2901 to get rid of Python 3.7 from the default test matrix, since NAV 5.10 is out and 5.11 will drop support for Python 3.7.

You may have to rebase this on the latest master branch commit to get the tests working on GitHub, @jorund1

@jorund1 jorund1 force-pushed the kea-ctrl-agent-metrics branch from d157f0b to 0640c0c Compare July 2, 2024 11:34
Member

@lunkwill42 lunkwill42 left a comment


Thank you, @jorund1 !

I have not read and understood the code in full detail, but I have some generic feedback.

Since this is your first contribution, I would like to mention that I am a proponent of the "step-down rule". Full quote:

We want the code to read like a top-down narrative. We want every function to be followed by those at the next level of abstraction so that we can read the program, descending one level of abstraction at a time as we read down the list of functions. I call this The Stepdown Rule.

To say this differently, we want to be able to read the program as though it were a set of TO paragraphs, each of which is describing the current level of abstraction and referencing subsequent TO paragraphs at the next level down.

To include the setups and teardowns, we include setups, then we include the test page content, and then we include the teardowns.

To include the setups, we include the suite setup if this is a suite, then we include the regular setup.

To include the suite setup, we search the parent hierarchy for the "SuiteSetUp" page and add an include statement with the path of that page.

To search the parent...

It turns out to be very difficult for programmers to learn to follow this rule and write functions that stay at a single level of abstraction. But learning this trick is also very important. It is the key to keeping functions short and making sure they do "one thing." Making the code read like a top-down set of TO paragraphs is an effective technique for keeping the abstraction level consistent.

In general, in kea_metrics_test.py, I would like to see the actual tests first in the code file, with fixtures and helpers further towards the bottom. The only exception is when the programming language makes this ordering impossible, which can happen in Python, since it's not a pre-compiled language.
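
To illustrate the ordering and its Python caveat: a function may call a helper defined further down the file, because the name is only looked up when the function runs. A name used at import time (for example as a decorator or a default argument) must already be defined. A minimal sketch with hypothetical names, not taken from kea_metrics_test.py:

```python
def test_total_addresses_should_be_summed():
    # The test comes first; make_subnets is only looked up when the test runs
    subnets = make_subnets()
    assert sum(subnet["total"] for subnet in subnets) == 30


def make_subnets():
    """Helper placed below the test that uses it (step-down order)."""
    return [{"id": 1, "total": 10}, {"id": 2, "total": 20}]


# By the time the test is called, the whole module has been loaded
test_total_addresses_should_be_summed()
```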

Other than that, I see you have written some code in order to validate JSON config data from Kea. At this point, we might consider pulling Pydantic into NAV as well; I think a lot of this code would disappear with it. We use Pydantic > 2 for validation of JSON data in several of the other projects our team maintains. It's not core to what you're really working on, but I at least urge you to have a look at the library :)

@lunkwill42 lunkwill42 requested a review from stveit July 2, 2024 13:15
@jorund1 jorund1 force-pushed the kea-ctrl-agent-metrics branch 2 times, most recently from 4c8f756 to 78607bf Compare July 30, 2024 10:20
Why:
  The main metrics one wishes to obtain from a dhcp server of a particular
  type (Kea DHCP, ISC DHCP, udhcpd, etc.) are the same across the
  board, and thus the methods that process these metrics (e.g. sending
  them to a graphite server, creating a canonical graphite path for a
  specific type of metric, etc) are better off being defined once in a
  superclass.
KeaDhcpMetricSource is an implementation of DhcpMetricSource which
collects the four metrics defined in the DhcpMetricKeys enum (number
of total, used, free and touched addresses) for each vlan of a Kea
DHCP server.
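
As an illustration of the shared processing mentioned above (building a canonical Graphite path and formatting a metric for sending), here is a minimal sketch. It assumes Graphite's plaintext format (`<path> <value> <timestamp>`); the enum `MetricKey`, the function `graphite_line` and the path layout are hypothetical and not the names used in this PR.

```python
from enum import Enum


class MetricKey(Enum):
    """Hypothetical stand-in for the metric keys described above."""

    TOTAL = "total"
    USED = "used"


def graphite_line(
    prefix: str, subnet_id: int, key: MetricKey, value: int, timestamp: int
) -> str:
    """Format one metric in Graphite's plaintext format."""
    # Assumed path layout: <prefix>.subnet<id>.<metric name>
    path = f"{prefix}.subnet{subnet_id}.{key.value}"
    return f"{path} {value} {timestamp}\n"


line = graphite_line("nav.dhcp.4", 1, MetricKey.TOTAL, 254, 1719835200)
```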
Why:
  I'll need to check more carefully how to obtain the number of free
  addresses in a subnet; it probably must be calculated, since Kea
  doesn't seem to supply it.
Need to see how to deal with shared networks that might also be
defined. Should be exactly the same as for subnets, since a
shared network really is just a uniquely named list of subnets.
why:
    This is build-up for an upcoming commit that implements caching
    of self.kea_dhcp_config in KeaDhcpMetricSource; every time we fetch
    a new kea_dhcp_config with fetch_dhcp_config(), we would like to
    store it as well, hence the name change of the function.

    The upcoming commit will then make use of Kea Control Agent's
    `config-hash-get` command (included in Kea versions >= 2.4.0) to
    check if we need to update the cached config or not whenever
    set_and_fetch_dhcp_config() is called.
What:
    In addition to updating function names etc., I've also updated the
    mocking of the requests.post requests in the test script so that
    we can give different responses based on the Kea Control Agent
    command the kea_dhcp_data script sends to the Kea Control Agent
    server with its requests.post calls
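
The caching idea described in this commit message can be sketched as follows. This is a hedged illustration only: `CachedConfig` and its callables are hypothetical, and stand in for whatever the PR uses to issue `config-hash-get` (cheap) and `config-get` (expensive) queries.

```python
from typing import Callable, Optional


class CachedConfig:
    """Re-fetch a config only when its hash changes (hypothetical sketch)."""

    def __init__(self, fetch_config: Callable[[], dict], fetch_hash: Callable[[], str]):
        self._fetch_config = fetch_config
        self._fetch_hash = fetch_hash
        self._config: Optional[dict] = None
        self._hash: Optional[str] = None

    def get(self) -> dict:
        current_hash = self._fetch_hash()  # cheap query, e.g. config-hash-get
        if self._config is None or current_hash != self._hash:
            self._config = self._fetch_config()  # expensive query, e.g. config-get
            self._hash = current_hash
        return self._config


config_fetches = []


def fake_fetch_config() -> dict:
    config_fetches.append("config-get")
    return {"Dhcp4": {}}


cache = CachedConfig(fetch_config=fake_fetch_config, fetch_hash=lambda: "abc123")
cache.get()
cache.get()  # hash unchanged, so the config is not fetched again
```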
For now, I'll just call this new function unwrap(), since all of
the uses of send_query() expect a list of one response and the first
thing that is done is always to unwrap the singleton response
list. unwrap() could be made to have generic typing, but for
simplicity it uses KeaResponse instead of a generic TypeVar('T') for
now.
Also fixes a typo on line 342 where an extra comma had sneaked
into the code
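
A minimal sketch of such an unwrap helper, under the assumption that a response list holds plain dictionaries; the PR itself uses a KeaResponse type, which is replaced by `dict` here to keep the example self-contained:

```python
from typing import Any


def unwrap(responses: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the only response in a singleton list, or raise ValueError."""
    if len(responses) != 1:
        raise ValueError(f"expected exactly one response, got {len(responses)}")
    return responses[0]


response = unwrap([{"result": 0, "arguments": {"hash": "abc123"}}])
```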
What:
    Before, we only included subnets defined in the subnet[4,6]
    section of the Kea DHCP config obtained through the "config-get"
    query in the KeaDhcpConfig.subnets list. Now we also include the
    subnets defined in the shared-networks section of the Kea DHCP
    config, and thus we include all subnets that could possibly be
    configured for a Kea DHCP server, which means that we can now
    fetch metrics from all defined subnets.
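
The change described above can be sketched as follows, assuming the Kea configuration layout where subnets live both in the top-level `subnet4`/`subnet6` list and inside each entry of `shared-networks`. The function name `all_subnets` and the sample config are hypothetical.

```python
from typing import Any


def all_subnets(config: dict[str, Any], dhcp_version: int) -> list[dict[str, Any]]:
    """Collect subnets from both the top level and every shared network."""
    section = config[f"Dhcp{dhcp_version}"]
    subnet_key = f"subnet{dhcp_version}"
    subnets = list(section.get(subnet_key, []))
    for network in section.get("shared-networks", []):
        subnets.extend(network.get(subnet_key, []))
    return subnets


config = {
    "Dhcp4": {
        "subnet4": [{"id": 1, "subnet": "192.0.2.0/25"}],
        "shared-networks": [
            {"name": "floor-1", "subnet4": [{"id": 2, "subnet": "192.0.2.128/25"}]}
        ],
    }
}
subnet_ids = [subnet["id"] for subnet in all_subnets(config, 4)]
```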
Why:
    Datetime instances are the most precise, and can easily be
    converted to unix timestamps if need be. Datetime instances also
    make it easy to work with timezone differences, which we sadly
    seem to have to care about since the Kea Control Agent doesn't
    provide timezone data along with its timestamps.
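
A minimal sketch of the conversion, assuming timestamps of the form `YYYY-MM-DD HH:MM:SS.ffffff` without timezone information. The function name `parse_kea_timestamp` is hypothetical, and the timezone to attach (UTC by default here) is an assumption the caller has to get right.

```python
from datetime import datetime, timezone


def parse_kea_timestamp(text: str, tz: timezone = timezone.utc) -> datetime:
    """Parse a timezone-less timestamp and attach an assumed timezone."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=tz)


stamp = parse_kea_timestamp("2024-07-01 12:00:00.000000")
unix_time = stamp.timestamp()  # seconds since the epoch, as a float
```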
Why:
    If we don't obtain the config, we do not have enough information
    to start fetching subnet metrics; in this case there is no way to move forward
    and there's no hope of obtaining some useful data by continuing
    the call.
what:
    Prior to this commit, the key for a specific metric (i.e. the name
    of that metric used by NAV) had the same naming convention as
    dhcpd-pools(1), e.g. "cur" was the name used for the "amount of
    addresses currently assigned to dhcp clients on this subnet" and
    "max" was the name used for the "total amount of addresses
    controlled by this subnet". DhcpMetricKey.CUR and
    DhcpMetricKey.MAX, however, are not very descriptive, so I changed
    the key names to DhcpMetricKey.ASSIGNED and
    DhcpMetricKey.TOTAL, respectively. DhcpMetricKey.TOUCH was removed
    altogether because it seems to me like this is not a common metric
    to be reported by dhcp servers (dhcpd-pools(1) uses "touch" to
    mean the number of assigned addresses that have timed out but that
    are not yet marked as re-assignable by the dhcp-server).
what:
    Modify the docstrings so that they all follow the same pattern;
    i.e. they all begin by describing what they return.
why:
    In the future, one might want to include sensitive information,
    such as passwords or tokens in requests, and a response from Kea
    might contain secrets, especially with regards to "config-get"
    responses, where a config might contain passwords.
what:
    Before this change, our mocking functionality only allowed for
    setting the string of the response object that is returned by
    requests.post() and requests.Session().post(), by using

    responsequeue.add("<command-name>", "<returned-string>")
    responsequeue.autofill("dhcp<4 or 6>", config_to_return, statistics_to_return)

    This commit adds an extra parameter to both of these
    functions, the `attrs` parameter:

    responsequeue.add("<command-name>", "<returned-string>", attrs={})
    responsequeue.autofill("dhcp<4 or 6>", config_to_return, statistics_to_return, attrs={})

    If `attrs` = {"myattr": "myval"}, then the requests.Response()
    object returned by any call to requests.post() or
    requests.Session().post() will have the attribute "myattr":

    attrs = {"myattr": "myval"}
    responsequeue.add("<command-name>", "<returned-string>", attrs)
    response = requests.post(...)
    assert response.myattr == "myval"
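
The mechanism can be sketched without the requests library. The `ResponseQueue` class below is a hypothetical stand-in for the test helper in this PR, using `types.SimpleNamespace` as a fake response object; the real helper patches requests.post() instead of exposing a `pop` method.

```python
from collections import defaultdict, deque
from types import SimpleNamespace


class ResponseQueue:
    """Hypothetical stand-in for the test helper described above."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def add(self, command, text, attrs=None):
        # Extra attributes are set directly on the fake response object;
        # attrs must not contain the key "text"
        response = SimpleNamespace(text=text, **(attrs or {}))
        self._queues[command].append(response)

    def pop(self, command):
        # Responses for a command are returned in the order they were added
        return self._queues[command].popleft()


queue = ResponseQueue()
queue.add("config-get", "{}", attrs={"status_code": 500})
response = queue.pop("config-get")
```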
(this is what the graphite exporting function expects)
@jorund1 jorund1 force-pushed the kea-ctrl-agent-metrics branch from 78607bf to 76da903 Compare September 20, 2024 08:46
@lunkwill42
Member

It's been way too long since I looked at this, but I surmise that it really just adds a generic framework for fetching DHCP metrics, with a Kea-specific implementation.

There seems to be no way to drive this new functionality (i.e. a command line program), and no documentation on how to use it. Those things would be of utmost importance for this to make it into a NAV release.

I think I would rather have a live code-review of the whole thing so I can understand it better, and since you're going on vacation today, I don't think we'll be able to commit to that until January at the earliest.

This means this will miss the deadline for the 5.12 release, but we'll likely make 5.13 by March next year, which will be its next chance :)

4 participants