maint: adding proper smoke test(s) #310
Conversation
echoserver is separate for now because data flows without it (at least locally).
debug is for pyroscope, so it's removed for now; not sure we need it here.
I've made a few changes, and can get telemetry to the collector and the file if I set the endpoint based on the IP address of the collector pod, but haven't yet figured out how to set it based on the name. The latest change uses the helm chart again with a local image - that's currently commented out in the Makefile but should probably be uncommented when this is ready to go so it builds in CI. After running, I also temporarily moved the
For some reason, the agent pod won't resolve the DNS name for the collector's k8s service. (Maybe because it's a daemonset pod on the host network, with different DNS available to it?) So we need to look up the collector service's IP and use that in the agent's Helm install values. That's done in a Make function, get_collector_ip, that gets called inline to set an env var for envsubst.

BONUS! Conditionally build the docker image if it doesn't exist when smoke testing. This necessitated breaking up the steps of smoke into separate targets that all get set as prereqs of smoke.

Co-authored-by: Jamie Danielson <[email protected]>
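The IP-lookup workaround and conditional image build described above could be sketched roughly like this in the Makefile; the namespace, service name, image tag, and values file path are assumptions for illustration, not necessarily the repo's actual names:

```make
# Hypothetical sketch: resolve the collector Service's cluster IP at install
# time, since the host-network agent pod can't resolve the service DNS name.
get_collector_ip = $(shell kubectl get service collector \
	-n collector -o jsonpath='{.spec.clusterIP}')

.PHONY: smokey_agent_install
smokey_agent_install:
	COLLECTOR_IP=$(call get_collector_ip) \
		envsubst < smoke-tests/agent-values.yaml.tpl | \
		helm install hny-network-agent ./chart -f -

# Build the local image only if it isn't already present.
.PHONY: docker-image
docker-image:
	docker image inspect hny/network-agent:local >/dev/null 2>&1 || \
		docker build -t hny/network-agent:local .
```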
Use a smoke job with echo for a more precise test. The current output also includes all the live/ready probes in k8s, which can be difficult to pin down. Using the echoserver command with a specific 404 code makes it easier to pinpoint a span. The smoke target now includes the smoke job, copying the collector log output, and running the bats test. The copy of the collector log output is gitignored.
WIP to limit the test to only the echoserver 404 span. All scopes have the same hny-network-agent name, so it is difficult to target only a specific resourceSpan for this request. For now there is a traces-temp-test.json file being used that contains only the resource span for the echo server (copied from the log output). Tests check for base requirements like span ID and http attributes, but also include custom stuff like resource attributes and headers.
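The span-isolation step described above can be sketched as a stand-alone filter over the collector's file output; the file names and JSON shape here are simplified illustrations, not the real exporter format:

```shell
# Build a sample of collector file-exporter output (one entry per line),
# then keep only the echo server's resource span, analogous to the
# hand-copied traces-temp-test.json described above.
cat > /tmp/collector-output.jsonl <<'EOF'
{"resource":{"service.name":"kubelet-probe"},"span":"GET /healthz"}
{"resource":{"service.name":"echoserver"},"span":"POST / 404"}
EOF

grep '"service.name":"echoserver"' /tmp/collector-output.jsonl \
  > /tmp/traces-temp-test.jsonl
cat /tmp/traces-temp-test.jsonl
```

In the real setup this filtering happens upstream in the collector's filter processor, so the file under test already contains only the span of interest.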
Current state as of last push: running; ✓ Agent includes service.name in resource attributes; 12 tests, 0 failures
* include only echoserver namespace
* exclude service to limit to 1 span for test (pod)
The smoke test builds the docker image, so remove the extra build step. Also, we don't need to use QEMU here because we're not building both arches. Finally, docker-build was a bit confusing as a job name, especially when looking at the workflow screen. That is now called build-and-test.
Note: I renamed the PR workflow job from docker-build to build-and-test, which seems to result in this "expected" status check that will never resolve. If that causes us problems, I can revert that name change.
I'm not against. We'll need to update the branch protection rules for
So because Robb opened this PR he can't approve it. I can approve it but uhh I'm a little biased as I picked it up and did the later work on it 😅 . I'm approving for the green checks - feel free to merge if y'all are good with it or let me know further feedback to be addressed!
We've all worked on this, so we can collectively approve & merge. Good work @robbkidd and @JamieDanielson 🎉
Which problem is this PR solving?
Description of the changes
Set up a smoke test, details included in the new make targets below:

* `smokey_cluster_create`: Create kind cluster
* `smokey_collector_install`: Install collector via helm with new collector values, including a filter processor that only keeps echoserver spans from pods with a response code of 405
* `smokey_agent_install`: Install agent via helm with new agent values, including usage of a local image (installed if not available in the shell) and export to the collector
* `smokey_echo_job`: Install echoserver and run a job that sends a POST request expecting a 405 response
* `smokey_copy_output`: After a short wait, copy output from the newly created file saved with telemetry from the collector
* `smokey_verify_output`: Make assertions via the new `verify.bats` bats test (similar to previous tests on other distros and in go auto-instrumentation) that the output contains the expected span and attributes
* `smoke`: Do all of the things above
* `unsmoke`: Tear it all down when done

Also updated PR, main, and release workflows to run the new smoke test.
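The `smokey_verify_output` assertions might look roughly like this; the fixture path, jq query, and attribute key are assumptions for illustration, not the repo's actual `verify.bats`:

```bash
#!/usr/bin/env bats
# Hypothetical sketch of a verify.bats-style check; traces.json and the
# attribute names are assumptions, not the repo's real fixtures.

@test "output contains the echoserver 405 span" {
  run jq -r '.resourceSpans[].scopeSpans[].spans[].attributes[]
             | select(.key == "http.response.status_code")
             | .value.intValue' traces.json
  [ "$status" -eq 0 ]
  [[ "$output" == *"405"* ]]
}
```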
How to verify that this has the expected result
Run `make smoke` and get a bunch of passing tests - also check out the (git-ignored) json file that contains the remaining span that's being tested. Run `make unsmoke` to spin it down.