Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ring: Members of the ring stay unhealthy after leaving the ring. #117

Open
kavirajk opened this issue Jan 23, 2022 · 11 comments
Open

ring: Members of the ring stay unhealthy after leaving the ring. #117

kavirajk opened this issue Jan 23, 2022 · 11 comments

Comments

@kavirajk
Copy link
Contributor

kavirajk commented Jan 23, 2022

Problem

Recently we started seeing lots of "unhealthy" instances on the distributor ring in Loki clusters during normal rollout. These "unhealthy" members are the ones who already left the ring but they failed to unregister themselves from the underlying KV store (consul in this case).

Did some investigation and would like record the behavior here.

Investigation

So what happens when a member of the ring leaves?

It needs to unregister() itself from the underlying KV store. And this unregister basically means removing its instance-id from the shared key value called collectors/distributor. (or any other key based on the components)

This key value collector/distributor is been shared among all the members of the ring. And every member needs to update it before successfully leaving the ring.

This shared access to the collectors/distributor is provided by something called CAS access. It provides some sort of synchronization mechanism for the key value shared by multiple members of the ring.

The way CAS (Check-And-Set) works is based on current ModifyIndex, a metadata about the key value you are tyring to set(or update).

This index is monotonically increasing counter that get updated every time any member update this particular kv.

This CAS operation is similar to PUT operation except, it takes additional param called cas, an integer representing ModifyIndex of the key value you are trying to update.

Two cases.

  1. if cas is set to 0 (or leave it empty) then the kv is updated only if it doesn't exist before.
  2. but if cas is set to non-zero value, then the kv is updated ony if that value of the cas matches with current ModifyIndex of the kv.

e.g; (I ran it in our clusters)

curl -s http://127.0.0.1:8500/v1/kv/collectors/distributor | jq .[0].ModifyIndex
8149486

Here modifyindex is 8149486. If I try to cas with lesser value, then it returns false.

curl -s --request PUT --data 10 http://127.0.0.1:8500/v1/kv/collectors/distributor?cas=100 && echo
false

This is what happening in our distributor ring. Some "LEAVE"ing distributor, try max of 10 times to set the value with ModifyIndex that they got previously. But meanwhile some other distributor would have updated it, making that ModifyIndex out of date. Thus making these distributors ran out of retries.

Proposal

The ideal solution is bit tricky. As we need some kind of synchronisation with same key:value collectors/distributors shared by all the members of the ring. which is what ModifyIndex gives us anyway.

So tweaking the MaxCasRetry (and/or CasRetryDelay) configs would be good one IMHO. Unfortunately, dskit currently doesn't accept to tweak this retry arguments. But I think we can make it configurable. And setting retry to same as number of replicas always resulted in no unhealthy members (tested it with even 100 distributor replicas)

So I think it's better

  1. Make the flags configurable
  2. Document this behavior in the ring and kv packages.

NOTES

  1. In case of cas returning false (because of unmatched ModifyIndex), it will be treated as non-error case.
    e.g this handled in the dskit code here

  2. The default retry of 10 times may work for ingester (it has expensive and time-consuming gracefull shutdown process). So most likely, one of the CAS retry would succeed. But not the case for distributor, it can shutdown almost instantly and thus high chance lots of distributors leaving the ring, accessing the shared key value at the same time.

  3. While investigating, I also found the error returned by unregister operation will be lost if ring is wrapped in BasicService and the service doesn't have any listeners.

  4. CAS doc from consul website.

Any thoughts, suggestions or criticisms are welcome!

@cyriltovena
Copy link
Contributor

I think the retries and delay should be configurable but I would hope for better solution, although I don't have an idea.

/cc @pstibrany ?

@bboreham
Copy link
Contributor

Does having a lot of pods shut down at the same time not cause other problems, like difficulty updating routing for the Service?

How many are we talking, in what time?

How come the replacement pods can insert themselves in the ring and become ready at the same rate?

@kavirajk
Copy link
Contributor Author

Does having a lot of pods shut down at the same time not cause other problems, like difficulty updating routing for the Service?

No, not really (AFAIS). Only problem we saw this retry loop completed without being successful.

How many are we talking, in what time?

We saw 22 unhealthy members in cluster with 50 distributor replicas. Happens during normal rollout (no OOM or any unexpected crashloopback)

How come the replacement pods can insert themselves in the ring and become ready at the same rate?

good point. Not sure. will check how member register themselves and come back to you.

@bboreham
Copy link
Contributor

Do you have minReadySeconds set on the manifest?

@kavirajk
Copy link
Contributor Author

How come the replacement pods can insert themselves in the ring and become ready at the same rate?

I think that's because we have initRing called in the main loop of the ring itself? (so say even if you remove entries from underlying consul or consul restarted, these members try register during the loop cycle)

But not the same case with unregister().

@bboreham
Copy link
Contributor

If the program doesn’t become ready until registered, and if you have minReadySeconds set, then the rollout will slow down until the additions do succeed.

I’m still interested to know in what time those 22 pods failed to update. Or perhaps: what is the overall rate of CAS operations during the rollout?

@cyriltovena
Copy link
Contributor

I’m still interested to know in what time those 22 pods failed to update. Or perhaps: what is the overall rate of CAS operations during the rollout?

@kavirajk can you share some metrics on that.

If the program doesn’t become ready until registered, and if you have minReadySeconds set, then the rollout will slow down until the additions do succeed.

Yeah I think tweaking the deployment could be useful to slow this down and avoid it. We don't need it to go super fast.
Can we try this too.

@kavirajk
Copy link
Contributor Author

I guess different tradeoff here. Tried following cases to get some metrics about CAS rate.

  1. In normal rollout, the CAS operations rate reaches almost 700 reqs per sec. (with 100 replicas).

ZqUuCy4

Used MaxSurge and MaxUnAvailable as 25%.

This resulted in around 10 uhealthy members. Rollout is complete under 2 mins.

  1. With following k8 configs
    MaxSurge: 5
    MaxUnavailable: 1

The CAS rate is much lower (30 reqs per sec)
Screenshot_2022-01-24_14-16-37

The problem is rollout is very slow it took more than 10 minutes for the rollout to complete. (But no unhealthy members)

@kavirajk
Copy link
Contributor Author

I guess we can play with these k8 configs to reach good default values for distributors, but looks like we always have to pay some rollout time as a cost.

@kavirajk
Copy link
Contributor Author

Here is a another one.
Keep k8 configs same (maxSurge and maxUnavailable to 25%)
But just increase the retry from 10 -> 100.

Rollout tooks same ~2mins, No Unhealthy. But CAS rate increased almost 2x (1500 reqs per sec) at peak.

Screenshot_2022-01-24_14-31-17

@kavirajk
Copy link
Contributor Author

On the one side we want no unhealthy members (easy of operation)
Other side we want not to overload consul with high CAS rate
Also we need to be careful about Rollout time.

These are the contradicting goal, and we need to find some sweet spot with some sane config values in k8 and CAS retries.

Finally I will also see how having backoff + jitter can affect the CAS retries (currently our consul KV client doesn't support any backoff)

kavirajk added a commit to grafana/loki that referenced this issue Jan 24, 2022
This fixes couple of issues.
1. By default these configs are 25% in k8, meaning during rollout 25% of pods are allowed to shutdown immediately.
2. Due to (1), during graceful shutdown process, 25% of all the pods access consul to `unregister()` from shared key value.

(2) makes CAS rate of underlying KV store high (leads to lots of retry and failing) sometimes failing to unregister leaving the ring "unhealthy"

Also this PR make these configs consistent across all k8 workloads.

More details: grafana/dskit#117
kavirajk added a commit to grafana/loki that referenced this issue Jan 25, 2022
#5227)

* Add `MaxSurge` and `MaxUnavailable` strategy to all loki k8 workloads.

This fixes couple of issues.
1. By default these configs are 25% in k8, meaning during rollout 25% of pods are allowed to shutdown immediately.
2. Due to (1), during graceful shutdown process, 25% of all the pods access consul to `unregister()` from shared key value.

(2) makes CAS rate of underlying KV store high (leads to lots of retry and failing) sometimes failing to unregister leaving the ring "unhealthy"

Also this PR make these configs consistent across all k8 workloads.

More details: grafana/dskit#117

* Remove it from statefulset workloads

Signed-off-by: Kaviraj <[email protected]>
charleskorn added a commit that referenced this issue Aug 4, 2023
* Exempt 503 (Service Unavailable) from error logging (#97)

Part of cortexproject/cortex#810

* Add gRPC streaming middleware. (#95)

* Add gRPC streaming middleware.

Adds a series of gRPC stream middleware for logging, user header propagation and tracing. This is laying groundwork for using gRPC streaming between ingesters and queriers in Cortex.

* Update the vendored versions.
* Pin kuberesolver because the API has changed.

* Always put code under github.com/weaveworks/common;

Dependancies between packages won't work if the code is checked out in github.com/tomwilkie, for instance.  Also, common is a very common name for a repo, so its not uncommon for the repo to be called something like weaveworks-common.

Signed-off-by: Tom Wilkie <[email protected]>

* When trace reporting is enabled, don't enable trace sampling unless asked (#102)

* Don't enable trace sampling by default

* Don't export new unused member, simplify docstring

* Abstract logging for the middleware (#96)

* Move dedupe code to dedupe.go

* Add unified logging interface.

* Adapt middleware/, user/, signals/ and server/ to new logging interfaces.

Allow user to specify logging implementation in server, if non is specified default to logrus.
As part of this we need log level as server config, and for convenience we expose the server logger.

* Update vendoring

* Expose the various level types.

Signed-off-by: Tom Wilkie <[email protected]>

* Don't shutdown the gRPC server the minute a signal is received. (#99)

gRPC behaviour should be the same as HTTP: Run() exits when a signal is received, and Stop() stops the handling of requests.

And as we no longer call GracefulShutdown in Run, call it in Stop - one should always try to shutdown gracefully.

Signed-off-by: Tom Wilkie <[email protected]>

* Add user and org ID to span if known (#100)

* Propagate tracing context over httpgrpc (#105)

* Propagate tracing context over httpgrpc

* Log warnings for failed trace inject/extract

* Don't use full URL as metric label on unmatched path (#108)

Random people on the Internet can send in arbitrary paths, so collapse
them all into "other" when reporting metrics.

* Raise default server timeouts from 5s to 30s (#109)

Primary motivation is to allow a standard Go profiling request to
complete, for which I only need the write timeout, but I raised the
other 5-second timeouts too since it isn't that unusual to have
operations or data uploads take longer.

* Return errors when http/grpc servers exit. (#92)

* Return errors when http/grpc servers exit.

There was a case where the HTTP server died but the process exited. We
need to exit when the servers exit.

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Expose jaeger metrics (#114)

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Improve logging of http errors (#115)

* Improve http request logging

When logging an http request, e.g. on 500 error, put all the headers
and response on one line.

Also strip cookies and csrf tokens on security grounds.

* Add a test to generate an http error to check formatting

* Update httpgrpc to match cortexproject/cortex#910 (#117)

* Refactor: extract functions HTTPRequest(), WriteResponse(), WriteError()

* Regenerate httpgrpc.pb.go with protoc-gen-gogo

from https://github.com/gogo/protobuf

Change taken from
cortexproject/cortex@f807690
where no explanation was given.

* Expose the HTTP server from the server struct. (#118)

* Expose the HTTP server from the server struct.

This allows users to inject HTTP requests and have the logged, instrumented and traced correctly.

* Fix #120: buffer the errChan, as one of the servers could return an error before Run starts listening on errChan.

Signed-off-by: Tom Wilkie <[email protected]>

* Add HTTP tracing middleware (#119)

* Print traceID with request (#123)

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Also set global logging when initializing logrus (#122)

* Refactor

- s/logusEntry/logrusEntry

* Also set global logging when initializing logrus

This PR sets the global logging instance when calling `logging.Setup(level)`.
Previously, only the standard logrus logger was set up and if anyone used
the new funcs such as `logging.Info()` it was discarded.

* Don't trace http requests when using httpgrpc. (#124)

* Don't trace http requests when using httpgrpc.

* Demonstrate how the same can be acheived with middleware.Tracer.

* Print logrus output in logfmt format (#121)

* Print logrus output in logfmt format

This changes the way logrous outputs log messages in case there is no
TTY attached.

Changes from

        ERRO: 2018/08/23 20:28:10.075952 Sample log message extra=foo

to

        time="2018-08-23T20:28:10Z" level=info msg="Sample log message" extra="foo"

which follows the `logfmt` key-value format by also providing `time`,
`level`, and `msg` in adition to the extra fields.

The PR removes the custom `textFormatter` to have the default
`logrus.TextFormatter` in place which takes care of detecting whether a
terminal is attached or not and switches between text and logfmt output
automatically.
It appears the custom formatter was to prevent differences in rendering
depending on color/non-color output which I don't necessarily see as an
issue.

* dep ensure to fix tests

* Update more golang/protobuf to gogo/protobuf (#127)

* Update 'dep' to 0.5.0

* Update more golang/protobuf to gogo/protobuf

* add constraints for downstream users

* Label http request handler tracing spans with the matched route name (#126)

* Pass options to tracing middleware

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Use the route matcher instead

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Fix dropped namespace in url parsing (#128)

* Update grpc-opentracing to latest version at new home (#129)

* go-grpc-middleware has moved (#130)

* Update to latest kuberesolver (#131)

* Update to latest kuberesolver and use a load balancing policy

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Log which ports the server is listening on. (#143)

Signed-off-by: Tom Wilkie <[email protected]>

* server: add PathPrefix to config for all routes

This can be used to serve an API from a prefix (e.g. /prometheus/)
without having to re-write every HTTP Handler.

* Allow server.Config to be unmarshaled from YAML.

Signed-off-by: Tom Wilkie <[email protected]>

* Dont flag context.Canceled as an error in tracing

Cancellations are always caused by something else, so having them show
up as errors in the tracing UI is distracting.

* Allow overriding max message sizes and concurrent streams in the server.

NB This use the gRPC defaults as the flag defaults.

Signed-off-by: Tom Wilkie <[email protected]>

* exclude Authorization headers from log messages.

* server: use -server.path-prefix for metrics and debug endpoints

Previously calling .Subrouter() would overwrite the instrumentation
handlers and they'd be ignored when PathPrefix was specified. This
meant one could never get back metrics or debug endpoints.

* Skip logging context cancelled warning msgs

Context cancellations are controlled by clients and logging this message can
cause more more confusion as mentioned here:
cortexproject/cortex#1279

Signed-off-by: Neeraj Poddar <[email protected]>

* Allow log levels to be marshalled to YAML.  Test it works.

Signed-off-by: Tom Wilkie <[email protected]>

* Fix lint warning

* Expose RegisterInstrumentation setting

Fix #156

Signed-off-by: Xiang Dai <[email protected]>

* Update prometheus/client_golang and prometheus/procfs

Signed-off-by: Ganesh Vernekar <[email protected]>

* Allow user to specify HTTP and gRPC bind addresses (not just ports.)

This allows us to use 'localhost' in tests, and prevents an 'Allow connections from...' dialog on MacOS when running unit tests.

Signed-off-by: Tom Wilkie <[email protected]>

* Change naming from host to address

Signed-off-by: Thore Kruess <[email protected]>

* Revert changes to server_test.go

* Extend tests to check server works with flag defaults

(with exception of http port, which we override for the convenience of
people running the tests as an ordinary user who can't bind to port 80)

* limit the number of connections to server

Signed-off-by: Jacob Lisi <[email protected]>

* Tell Go runtime we have stopped listening to signals

If we exit the handler loop, e.g. after receiving SIGTERM, then we
don't want Go to attempt further delivery of signals.

* Fix kuberesolver URLs to have three slashes

* Add more test cases prompted by review feedback

* Trace Websocket requests

Update the OpenTracing library to a version which supports Websockets,
and remove the workaround for when it didn't.

* Update gogo proto to 1.3.0

Signed-off-by: Cyril Tovena <[email protected]>

* add flags to configure keep alive configs (#174)

* add flags & yaml to configure keep alive configs

Signed-off-by: Jacob Lisi <[email protected]>

* Fixed gRPC server keepalive flags registration

Signed-off-by: Marco Pracucci <[email protected]>

* Refactor: extract common error decoding

* Re-do protogen for fake_server with current gRPC

* Extend server tests with canceled gRPC functions

* Extend cancelation checking to include gRPC errors

* Extend error instrumentation tests to http

* Instrument canceled gRPC calls with a unique label

* Return error in configuring jaeger

* add log format option

* Add tls support to http server

Signed-off-by: Annanay <[email protected]>

* Use require package in tests, correct TLS fields in server config

Signed-off-by: Annanay <[email protected]>

* Load certs only if config options are specified

Signed-off-by: Annanay <[email protected]>

* Add metrics namespace to server

Signed-off-by: Annanay <[email protected]>

* use same system as level for format

* Add tls support to grpc server, add dummy certs

Signed-off-by: Annanay <[email protected]>

* Addressed @tomwilkie's review comments

Signed-off-by: Annanay <[email protected]>

* Fix broken tests

Signed-off-by: Annanay <[email protected]>

* Add TODO note about using copied function

Signed-off-by: Annanay <[email protected]>

* Use ConfigToTLSConfig util function from node_exporter

Signed-off-by: Annanay <[email protected]>

* Log errors when write fails, not 200

See discussion here: cortexproject/cortex#2483 (comment)

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Address @bboreham's comments

Signed-off-by: Annanay <[email protected]>

* fix function documentation

* Move shebang to first line, add commit hash to link

Signed-off-by: Annanay <[email protected]>

* Add test and warn only if not context cancelled

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Added option to disable signal handling by Server. (#191)

* Make signal-handling by Server configurable.

* Fixed unrelated lint warnings.

* Log X-Forwarded-For for every request

Signed-off-by: Michel Hollands <[email protected]>

* Address review comments and add Forwarded and X-Real-IP support

Signed-off-by: Michel Hollands <[email protected]>

* Remove making standard middleware optional

Signed-off-by: Michel Hollands <[email protected]>

* Remove unused fields as well

Signed-off-by: Michel Hollands <[email protected]>

* Add option to turn logging of source IPs on

Signed-off-by: Michel Hollands <[email protected]>

* Add sourceIPs tag in tracing

Signed-off-by: Michel Hollands <[email protected]>

* Add custom regex for extracting IP from header

Signed-off-by: Michel Hollands <[email protected]>

* Rename file so it's clearer what it does

Signed-off-by: Michel Hollands <[email protected]>

* Use better names

Signed-off-by: Michel Hollands <[email protected]>

* Do no use log prefix in config var name

Signed-off-by: Michel Hollands <[email protected]>

* Address review comments

Signed-off-by: Michel Hollands <[email protected]>

* Use same way of access struct

Signed-off-by: Michel Hollands <[email protected]>

* Address review comments

Signed-off-by: Michel Hollands <[email protected]>

* Replace interceptor with httpsnoop library.

This library supports additional interfaces other than http.Hijacker.

Signed-off-by: Peter Štibraný <[email protected]>

* Metrics for HTTP and gRPC message sizes and inflight requests.

Signed-off-by: Peter Štibraný <[email protected]>

* Added metrics test.

Signed-off-by: Peter Štibraný <[email protected]>

* s/Kuberneretes/Kubernetes/

Signed-off-by: Jack Baldry <[email protected]>

* Make HTTP middleware optional (#194)

* Make HTTP middleware optional
* Allow the Router to be set by caller

Signed-off-by: Michel Hollands <[email protected]>

* Add exemplar support to middleware

Signed-off-by: beorn7 <[email protected]>

* tracing: Allow specifying JAEGER_ENDPOINT instead of sampling server or local agent port

closes #203

* Expose grpc keepalive enforcement policy options.

Signed-off-by: Peter Štibraný <[email protected]>

* s/faviourite/preferred

Changed the wording to avoid US/UK English concerns.

* Set the content-length of httpgrpc request.

http.NewRequest can't set it while using a `io.NopCloser`.

Signed-off-by: Cyril Tovena <[email protected]>

* Replace `ioutil.NopCloser` with a custom one.

This way we can easily get direct access to the underlaying `bytes.Buffer`. This means we don't need to copy the data anymore.

Signed-off-by: Cyril Tovena <[email protected]>

* Fix panic in server i13n matching NotFoundHandler

When router can't find the route but it has the NotFoundHandler set, the
RouteMatch doesn't have the Route set but it has MatchErr set to
ErrNotFound.

Added a check for RouteMatch.Route to be not nil before accessing it,
and added a special "notfound" route name to be used when
NotFoundHandler is used.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Unindent getRouteName by using early returns

Signed-off-by: Oleg Zaytsev <[email protected]>

* support custom registerer

* Add metric "tcp_connections" for number of TCP connections to server.

Counts the number of accepted connections and the number of closed
connections, by intercepting the listener for each port (http/grpc).

Signed-off-by: Steve Simpson <[email protected]>

* add tcpv4 support

* tracing: add options to NewFromEnv

Signed-off-by: Dave Henderson <[email protected]>

* Refactor: move ExtractTraceID functions to tracing package

Signed-off-by: Bryan Boreham <[email protected]>

* Add exemplars to 'instrument' package

Whenever a duration is observed and we have a trace ID, store it
as a Prometheus exemplar.

Signed-off-by: Bryan Boreham <[email protected]>

* refactor: re-use ObserveWithExemplar function

Signed-off-by: Bryan Boreham <[email protected]>

* Add exemplars to gRPC instrumentation

We need to call the tracing interceptor before the metric interceptor
so that the latter can extract the trace ID.

Signed-off-by: Bryan Boreham <[email protected]>

* Allow blank network in server.New

There are a lot of tests which create a `server.Config{}` and don't
populate the network fields; allow this by setting `DefaultNetwork` if
blank.

Also pick up a bugfix where `HTTPListenNetwork` was used for gRPC.

Signed-off-by: Bryan Boreham <[email protected]>

* gRPC stream middleware: move metrics after tracing

So that we can pass the trace ID as an exemplar.

Signed-off-by: Bryan Boreham <[email protected]>

* Migrating from go-kit/kit/log to the slimmer go-kit/log

Signed-off-by: Danny Kopping <[email protected]>

* Add benchmark for Debugf()

Extract function to wrap with level, caller, timestamp.

Signed-off-by: Bryan Boreham <[email protected]>

* Filter by log level before anything else

This saves a lot of work walking the stack to find the caller, and a bit
more to format the timestamp.

Recommended at go-kit/log#14 (comment)

Signed-off-by: Bryan Boreham <[email protected]>

* Avoid Sprintf when log line is filtered out

Defer the Sprintf until after the level check, by using an auxilliary struct

Signed-off-by: Bryan Boreham <[email protected]>

* Revert "logging: Add ability to deduplicate log messages (#48)"

This reverts commit 0aecff0.

It has never de-duplicated a single line for us in production.

* fix master build

- run `go mod tidy`
- change logging test to use `go-kit/log`

* replace node_exporter https package with exporter-toolkit

The node_exporter/https package has been moved to exporter-toolkit as of
v1.1.0. weaveworks/common was still using this package, which was
preventing downstream importers (i.e., grafana/agent) from updating
their dependency on node_exporter.

This change also required bumping kuberesolver to v2.4.0 (the next
version after v2.1.0) which uses a newer version of the gRPC library
where type names have changed.

kuberesolver diff: sercand/kuberesolver@v2.1.0...v2.4.0

* Added tcp_connections_limit metric

Signed-off-by: Marco Pracucci <[email protected]>

* Add a tag with the user-agent to traces

This is useful to understand which kind of client is behaving in a
certain way.

Signed-off-by: Christian Simon <[email protected]>

* undo whitespace changes to server/server_test.go

* Merged both tag setters into a single MWSpanObserver

Signed-off-by: Christian Simon <[email protected]>

* middleware: Add gRPC instrumentation utilities from dskit

Signed-off-by: Arve Knudsen <[email protected]>

* middleware: Rename client instrumentation functions

Signed-off-by: Arve Knudsen <[email protected]>

* middleware: Move client instrumentation functions to grpc_instrumentation.go

Signed-off-by: Arve Knudsen <[email protected]>

* middleware: Slight cleanup

Signed-off-by: Arve Knudsen <[email protected]>

* add flag to log requests at info level

* fix comment for LogRequestsAtInfoLevel

Co-authored-by: Victor Cinaglia <[email protected]>

* add yaml tag & add LogRequestsAtInfoLevel to RegisterFlags

* add method to set log level for logging with request

* Add user and org labels to observed exemplars

Exemplars provide information about traces, and traces usually reference
to the user that served the request, however, searching the user in the
trace is usually a tedious task, and it would be nice to be able to
overwiew whether the slow requests (exemplars) from your panel
corresponds to the same user.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Revert "add method to set log level for logging with request"

This reverts commit 3422262.

* call logWithRequest once

* middleware: Simplify as suggested by @pracucci

Signed-off-by: Arve Knudsen <[email protected]>

* fix typo in registering LogRequestAtInfoLevel flag

* Removed debug log on successfuly gRPC requests from GRPCServerLog

Signed-off-by: Marco Pracucci <[email protected]>

* Make debug logging on successful gRPC requests from GRPCServerLog optional

* Update server/server.go

Co-authored-by: Marco Pracucci <[email protected]>

* Apply changes according to code review

* server: Remove advanced TLS config parameters

Remove advanced TLS config parameters stemming from
github.com/prometheus/exporter-toolkit/web, that were introduced in commit
56b8fa7. Motivation for their removal being
that users would most likely not want to change them, and they add corresponding
configuration parameters to the Grafana Mimir project, that we don't want. We
also think they're not interesting to the Grafana Tempo and Loki projects.

Signed-off-by: Arve Knudsen <[email protected]>

* Revert "Add user and org labels to observed exemplars"

* server: Expose `http` and `grpc` listen addresses.

The main rationale is for "choosing" random unbinded port for testing.

It's one of the Go's idiom to pass port number `0` to enforce Go's net's package to bind to some random port that is free.

```
func main() {
    httpListener, err := net.Listen("tcp", "0.0.0.0:0")
    if err != nil {
        panic(err)
    }

    grpcListener, err := net.Listen("tcp", "0.0.0.0:0")
    if err != nil {
        panic(err)
    }

    h := http.Server{}
    go func() {
        log.Println("http serving at", httpListener.Addr())
        h.Serve(httpListener)
    }()

    g := grpc.Server{}
    go func() {
        log.Println("grpc serving at", grpcListener.Addr())
        g.Serve(grpcListener)
    }()

    time.Sleep(20 * time.Second)
}
```

And that will bind random "unassigned" port.

```
2022/08/08 16:51:49 grpc serving at [::]:35359
2022/08/08 16:51:49 http serving at [::]:44671
```

The problem is, currently there is no way to know what those ports are.

This PR exposes those addresses so that, we can test server easily and pass on the listen addresses to all it's clients.
Signed-off-by: Kaviraj <[email protected]>

* Bugfix server tcp metrics to use the configured registry or the default Prometheus registry

Server TCP related metrics tcp_connections and tcp_connections_limit are currently always registered with the default prometheus registry: https://github.com/prometheus/client_golang/blob/main/prometheus/registry.go#L57
As a result, it can cause an attempt to register duplicate metrics collector failing with a panic.

This commit updates these metrics to use the configured Registry, if any, otherwise, the default Prometheus Registry.
log and gatherer initializations were also moved for consistency.

* Fix incorrect import of obsolete github.com/go-kit/kit/log package

Signed-off-by: Dave Henderson <[email protected]>

* httpgrpc/server: Update NewClient to not use WithBalancerName

On projects consuming this module as a library, gRPC might be set at
newer versions where the deprecated #WithBalancerName has been removed
already. Given that the version used by this module already offers a
forward-compatible method, this commit uses that instead of the
deprecated #WithBalancerName.

This is a simplified version of #240, without touching the gRPC versions.

Fixes #239

Signed-off-by: Juraci Paixão Kröhling <[email protected]>

* Fix 'make protos'

Use a 3rd-party image `namely/protoc:1.22_1`, which generates identical
output for two of the protos in this repo.

The third one, httpgrpc, I have not managed to match exactly, but the
result is very close.

Signed-off-by: Bryan Boreham <[email protected]>

* Implement `http.Flusher` interface on Log middleware (#257)

This adds `Flush()` to the `middleware.badResponseLoggingWriter`, making it implement `http.Flusher` if the wrapped `http.ResponseWriter` does.

* Allow config of TLS cipher suites and min version (#256)

* Server: allow config of TLS cipher suites and min version

Add a single parameter for each, not split across HTTP and gRPC.

Required change upstream - prometheus/exporter-toolkit#110.

Downstream projects rely on CLI parameters to generate docstrings,
so we add `--server.tls-cipher-suites` and `--server.tls-min-version`.
Both CLI and yaml require comma-separated lists of cipher suites,
which is different to the yaml array format supported by prometheus/exporter-toolkit.

The names accepted are from Go, listed here: https://pkg.go.dev/crypto/tls#pkg-constants

Signed-off-by: Bryan Boreham <[email protected]>

* Tweak TLS server option help text

Signed-off-by: Bryan Boreham <[email protected]>

* Reproducable httpgrpc.pb.go build

Use gogo protobuf options explicitly to generate the same Equal, String,
GoString etc functions in the protobug implementation.

Add tools.go to download the gogo.proto dependency.
Add vendor directory as include for protoc to load the dependency.
Update protoc version otherwise there is a diff in the generated files,
most notably
const _ = proto.GoGoProtoPackageIsVersion3
is replaced with
const _ = proto.GoGoProtoPackageIsVersion2

Signed-off-by: György Krajcsovits <[email protected]>

* Make gogo.proto import vendoring agnostic

prometheus/alertmanager imports gogo.proto as gogoproto/gogo.proto
This is because it does not use vendoring and needs to look up gogoproto
in the module cache, which results in a path like
/home/user/go/pkg/mod/github.com/gogo/[email protected]
To avoid having @v1.3.2 in the proto definition it then sets this
as an include path and import only gogoproto/gogo.proto.

Signed-off-by: György Krajcsovits <[email protected]>

* Updated prometheus/exporter-toolkit to v0.8.1

* Fixed tests

* Use ChainInterceptor from upstream grpc

And eliminate dependency on `grpc-ecosystem/go-grpc-middleware`.
Functionality is identical; the feature was added upstream in v1.21 for
client side and v1.28 for server side.

Signed-off-by: Bryan Boreham <[email protected]>

* Small testcode lint fixes (#269)

* Use built in string conversion of recoder.Body
* Defer close after error check

VS Code complains otherwise.

Signed-off-by: György Krajcsovits <[email protected]>

* server test: don't compare error for exact equality

DeepEqual is brittle against changes in implementation.

Signed-off-by: Bryan Boreham <[email protected]>

* server: remove unused fields from Client struct

Signed-off-by: Bryan Boreham <[email protected]>

* replace deprecated 'grpc.WithInsecure'

Signed-off-by: Bryan Boreham <[email protected]>

* middleware: fix lint complaint about buf.String

Signed-off-by: Bryan Boreham <[email protected]>

* miscelaneous lint fixes

Checking errors, etc.

Signed-off-by: Bryan Boreham <[email protected]>

* server: check error before closing connection

lint was complaining

Signed-off-by: Bryan Boreham <[email protected]>

* Log middleware now accepts a list of headers to exclude

Previously the log middleware would exclude printing only the headers
"Cookie", "Authorization" and "X-Csrf-Token". I believe the users of
this library may want to ommit more headers than those.

To do so I've changed how the middleware is initialize to optionally add
more headers to the list.

Also I've added a unit test to make sure the behavior works as expected.

Also I've added an optional configuration parameter to the server to set
LogRequestHeaders from there. This parameter was already an option in
the log middleware but was not exposed in the server.

Signed-off-by: Jesus Vazquez <[email protected]>

* Enable native histograms for the request_duration_seconds histogram

This is an opt-in via setting Config.MetricsNativeHistogramFactor so
that nobody will expose native histograms by accident. Even if native
histograms are enabled, the conventional histograms are still
maintained and presented as before to scrapers incapabale of scraping
native histograms.

This commit also includes an update of prometheus/client_golang to
v1.14.0 as that is the earliest release supporting native histograms.

Native histograms were tested previously in the sparse-histograms
branch without any problems. This commit obsoletes the
sparse-histograms branch.

The histograms request_message_bytes and response_message_bytes are
left as conventional histograms for now. Maybe they would work well as
native histograms with a bucket factor of 2, as high resolution is not
the aim here. But we can decide about that once we have gathered more
experience with the native histograms for request_duration_seconds.

The settings for limiting the bucket count
(i.e. NativeHistogramMaxBucketNumber = 100 and
NativeHistogramMinResetDuration = 1h) are both conservative and most
likely good enough for the expected use cases. We can make them
configurable later, if users feel the need. The idea here is to not
add too many knobs prematurely and thereby avoid confusion.

Signed-off-by: beorn7 <[email protected]>

* update grpc

* Add Unwrap method to http.ResponseWriter implementations. (#283)

Also update httpsnoop to version 1.0.3, which supports Unwrap method as well.

* Add description to command line flags (#287)

Add description to log-request command line flags

* Add support to route both GRPC and HTTP over the HTTP server (#288)

* add support for grpc on the http port

Signed-off-by: Joe Elliott <[email protected]>

* Make DisableRequestSuccessLog configurable. (#284)

- NewLogMiddleware is a public method. Adding a new parameter would make this PR a breaking change one.
- However, the behavior is the same: whatever is configured for the existing DisableRequestSuccessLog,
it will be used by the log middleware.
- Test option to not log successful requests.

* Allow users to manually register metrics

* Clean up some linter warnings

* grpc: use errors.Is to check if error is Canceled

This improves behaviour when one error wraps another.

Signed-off-by: Bryan Boreham <[email protected]>

* Make gRPC logging optional via a custom interface (#299)

* middleware: add OptionalLogging interface with method ShouldLog

E.g. if the error is caused by overload, then we don't want to log it
because that uses more resource.

* Add test for gRPC logging, patterned after the one for http logging.

Signed-off-by: Bryan Boreham <[email protected]>

* Add provenance and license comments to files migrated from weaveworks/common, update package names

* Update imports to match new package paths

* Fix linting issues.

* Fix package paths in protobuf descriptors and regenerate with gogo/protobuf

* Fix linting issues in migrated code.

* Remove dependency on deprecated package

* Replace use of `golang.org/x/net/context` with `context`

* Replace use of `prometheus` package with `promauto` where possible

* Add changelog entry.

---------

Signed-off-by: Tom Wilkie <[email protected]>
Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Neeraj Poddar <[email protected]>
Signed-off-by: Xiang Dai <[email protected]>
Signed-off-by: Ganesh Vernekar <[email protected]>
Signed-off-by: Thore Kruess <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Annanay <[email protected]>
Signed-off-by: Michel Hollands <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Jack Baldry <[email protected]>
Signed-off-by: beorn7 <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Steve Simpson <[email protected]>
Signed-off-by: Dave Henderson <[email protected]>
Signed-off-by: Bryan Boreham <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Christian Simon <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: Jesus Vazquez <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Co-authored-by: Julius Volz <[email protected]>
Co-authored-by: Tom Wilkie <[email protected]>
Co-authored-by: Marcus Cobden <[email protected]>
Co-authored-by: Bryan Boreham <[email protected]>
Co-authored-by: Goutham Veeramachaneni <[email protected]>
Co-authored-by: Roland Schilter <[email protected]>
Co-authored-by: Adam Shannon <[email protected]>
Co-authored-by: Tom Wilkie <[email protected]>
Co-authored-by: Anthony Woods <[email protected]>
Co-authored-by: Neeraj Poddar <[email protected]>
Co-authored-by: Bryan Boreham <[email protected]>
Co-authored-by: Xiang Dai <[email protected]>
Co-authored-by: Ganesh Vernekar <[email protected]>
Co-authored-by: Thore <[email protected]>
Co-authored-by: Jacob Lisi <[email protected]>
Co-authored-by: Cyril Tovena <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Aditya C S <[email protected]>
Co-authored-by: Sean Liao <[email protected]>
Co-authored-by: Annanay <[email protected]>
Co-authored-by: Goutham Veeramachaneni <[email protected]>
Co-authored-by: Roli Schilter <[email protected]>
Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Michel Hollands <[email protected]>
Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Jack Baldry <[email protected]>
Co-authored-by: Daniel Holbach <[email protected]>
Co-authored-by: Michel Hollands <[email protected]>
Co-authored-by: beorn7 <[email protected]>
Co-authored-by: Chance Zibolski <[email protected]>
Co-authored-by: Oleg Zaytsev <[email protected]>
Co-authored-by: Robert Fratto <[email protected]>
Co-authored-by: Steve Simpson <[email protected]>
Co-authored-by: 3Xpl0it3r <[email protected]>
Co-authored-by: Dave Henderson <[email protected]>
Co-authored-by: Danny Kopping <[email protected]>
Co-authored-by: Christian Simon <[email protected]>
Co-authored-by: Arve Knudsen <[email protected]>
Co-authored-by: colin-stuart <[email protected]>
Co-authored-by: Victor Cinaglia <[email protected]>
Co-authored-by: Jeanette Tan <[email protected]>
Co-authored-by: zenador <[email protected]>
Co-authored-by: Kaviraj <[email protected]>
Co-authored-by: Susana Ferreira <[email protected]>
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
Co-authored-by: Stefan Prodan <[email protected]>
Co-authored-by: Justin Lei <[email protected]>
Co-authored-by: György Krajcsovits <[email protected]>
Co-authored-by: Sebastian Rabenhorst <[email protected]>
Co-authored-by: George Krajcsovits <[email protected]>
Co-authored-by: Jesus Vazquez <[email protected]>
Co-authored-by: Alan Protasio <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>
Co-authored-by: Dylan Guedes <[email protected]>
Co-authored-by: Piotr Gwizdala <[email protected]>
charleskorn added a commit that referenced this issue Aug 11, 2023
* Use ps.Map for Counters and Sets, remove Metadata in favour of Latest.

Also
- Add more complicated report.json for benchmark
- Break up report/topology.go
- Implement our own DeepEqual for ps.Map

* Memoise & cache the result of renderers, so we don't recalculate views multiple times.

* Fix tests

* refactoring deepequal to satisfy linter

* add test for render/detailed/parents and fixed bug

* Decouple Scope lifecycle from Weave lifecycle

- Run the Weave integrations regardless of if weave is detected.
- Make everything backoff and not spam the logs.
- Add miekg dns to vendor.
- Have the app periodically register with weaveDNS, and the probe do lookups there.
- Decide what the local networks are at runtime, not once at startup.
- Correctly resolve app ids, fixes #825

* adding direction to connections from conntrack

* Remove report.EdgeMetadata.MaxConnCountTCP, as we don't display it anywhere
* Remove hostname metadata from local end of connection. We should be using the hostnodeid

* Remove address topology

* Fix spelling mistakes in the codebase.

* Refactoring rendering to remove RenderableNode

Squash of:
- use detailed.Summaries to render topology nodes
- ban merging nodes of different topologies (they should be mapped)
- need to prune parents when mapping node types
- render container images by id if they have no name
- remove separate render ids and prune parents in NewDerived*
- don't render metrics/metadata for groups of nodes
- fixing up tests
- removing pending unit tests (for mapping.go, for now)
- updating experimental dir for RenderableNode removal

* Review Feedback

Squash of:
- including children in topologies_test.go
- report.Node.Prune should prune children also
- rewrote ShortLivedInternetConnections test to express its intent
- adding tests for detail Summary rendering

* Update docker client, to get better state strings in the UI

* Remove load5 and load15

* not really useful
* take up lots of real estate

Fixes #1267

* Adding support for plugins, with basic example of iowait, and ebpf

Squash of:
* Include plugins in the report
* show plugin list in the UI
* moving metric and metadata templates into the probe reports
* update js for prime -> priority
* added retry to plugin handshake
* added iowait plugin
* review feedback
* plugin documentation

* adding a test for pod node rendering

* Add local_networks to weave Overlay nodes, so we can track weave without an exposed weave ip (#1313)

* Removing report.Node.WithID (#1315)

* removing usage of report.Node.WithID

* report.Topology.AddNode can use the node's ID field

* move counting sublabel definition to the topologies

* Reorganise the render/ package

* Fixing grouped node count for filtered children nodes

Squash of:

* We have to keep all the container hostnames until the end so we can
  count how many we've filtered

* Adding tests for ContainerHostnameRenderer and PodServiceRenderer with
  filters

* Because we filter on image name we need the image name before
  filtering

* Alternative approach to passing decorators.

* Refactor out some of the decorator capture

* Don't memoise decorated calls to Render

* Fixing filtered counts on containers topology

  Tricky, because we need the filters to be silent sometimes (when they're
  in the middle), but not when they're at the top, so we take the "top"
  filter's stats. However, this means we have to compose all
  user-specified filters into a single Filter layer, so we can get all
  stats.

  There are no more Silent filters, as all filters are silent (unless they
  are at the top).

  Additionally, I clarified some of the filters as their usage/terminology
  was inconsistent and confused. Now Filter(IsFoo, ...) *keeps* only nodes
  where IsFoo is true.

* Index Pods by UID and join with containers based on this.

* fixing up tests

* Index services by UID, and refactor out common k8s metadata handling

* Deployment and ReplicaSet views for k8s

* Review Feedback

* Use image name without version as id. (#1531)

* Use image name without version as id.

* Review feedback

* Fix lint in all the build-tools scripts

* Squashed 'tools/' changes from 7a66090..8c6170d

8c6170d Fix lint in all the build-tools scripts
239935c shell-lint tool
d9ab133 Script for finding files with a given type
1b64e46 Add Weave Cloud client
f2e40b4 Time out commands after three minutes
2da55ce Don't spell-check compressed files
e9749a5 Make scheduler aware of test parallelisation

git-subtree-dir: tools
git-subtree-split: 8c6170d292d34923d23d5eef6b6ac5e4af6b7188

* Speed up test by using git ls-files

* Remove spurious debugging code from test

* Test directories need ./ prefixes, obviously.

* Squashed 'tools/' changes from 0620e58..e9e7e6b

e9e7e6b Merge pull request #26 from weaveworks/this-time-for-sure
df494d6 Remove dependencies
c045d16 Properly exclude vendor from lint
2cfcf08 Add blacklist to wcloud client
ca6ebfb Merge pull request #25 from weaveworks/fix-brokenness
bfb1747 Test directories need ./ prefixes, obviously.
5b9b314 Merge pull request #24 from weaveworks/find-files
8786427 Remove spurious debugging code from test
8b7ec6e Speed up test by using git ls-files
cf53dc1 Exclude vendor from shell linting
b2ab380 Fix field name
c86fd3d Add notification config for wcloud
f643920 Merge pull request #23 from weaveworks/only-lint-git-files
47a0152 Only lint git files
50d47f9 Merge pull request #22 from weaveworks/shell-lint

git-subtree-dir: tools
git-subtree-split: e9e7e6b0f042eb38bd2d45591f3f5c8364bdc7ce

* Remove Metric Add() method

* Helps reduce garbage (MakeMetric() now takes a slice and there's a shorter version MakeSingletonMetric())
* Fixes bug computing Max (Min) in samples since using MakeMetric()
  was causing a default Max/Min of zero.
* Simplifies code a bit

* Remove Metric WithFirst() method

It was only used in tests and wasn't really necessary

* Append namespace to endpoint scope for loopback connections

* Remove redundant kubernetes ID

* Make dumper a bit more verbose

So it displays differences behind interface that would otherwise go
unnoticed (like string vs []byte).

* Sync changes done directly in scope/tools

Applies tools changes from:

commit 7f46b90e272f00275ce47a29e0cef0c62bbc4e38
Author: Krzesimir Nowak <[email protected]>
Date:   Thu Jul 21 11:55:08 2016 +0200

    Make LatestMap "generic"

    This commit makes the LatestMap type a sort of a base class, that
    should not be used directly. This also adds a generator for LatestMap
    "concrete" types with a specific data type in the LatestEntry.

commit 0f1cb82084435622b2b1c78bd36884cf90f1ab25
Author: Krzesimir Nowak <[email protected]>
Date:   Thu Jul 28 14:26:08 2016 +0200

    Allow testing only a subset of directories

    This can be done by calling TESTDIRS="./report ./probe" make tests

commit 97eb8d033dc07dee338a43c170e809731ca3c9e4
Author: Krzesimir Nowak <[email protected]>
Date:   Thu Aug 4 11:44:21 2016 +0200

    Do not spell check Makefiles and JSON files

    Spell check does not handle JSON files very well. It finds misspelled
    words in hexadecimal hashes (like misspelled "dead" in "123daed456").

    Ignore Makefiles too - there is not so much free text to spell check
    and the spell check complains about words it was told by Makefile to

* Squashed 'tools/' changes from e9e7e6b..b990f48

b990f48 Merge pull request #42 from kinvolk/lorenzo/fix-git-diff
224a145 Check if SHA1 exists before calling `git diff`
1c3000d Add auto_apply config for wcloud
0ebf5c0 Fix wcloud -serivice
4fe078a Merge pull request #39 from weaveworks/fix-wrong-subtree-use
3f4934d Remove generate_latest_map
48beb60 Sync changes done directly in scope/tools
45dcdd5 Merge pull request #37 from weaveworks/fix-mflag-missing
b895344 Use mflag package from weaveworks fork until we find a better solution
e030008 Merge pull request #36 from weaveworks/wcloud-service-flags
9cbab40 Add wcloud Makefile
ef55901 Review feedback, and build wcloud in circle.
3fe92f5 Add wcloud deploy --service flag
3527b56 Merge pull request #34 from weaveworks/repo-branch
92cd0b8 [wcloud] Add support for repo_branch option
9f760ab Allow wcloud users to override username
38037f8 Merge pull request #33 from weaveworks/wcloud-templates
7acfbd7 Propagate the local username
e6876d1 Add template fields to wcloud config.
f1bb537 Merge pull request #30 from weaveworks/mike/shell-lint/dont-error-if-empty
e60f5df Merge pull request #31 from weaveworks/mike/fix-shell-lint-errors
e8e2b69 integrations: Fix a shellcheck linter error
a781575 shell-lint: Don't fail if no shell scripts found
db5efc0 Merge pull request #28 from weaveworks/mike/add-image-tag
5312c40 Import image-tag script into build tools so it can be shared
7e850f8 Fix logs path
dda9785 Update deploy api
f2f4e5b Fix the wcloud client
3925eb6 Merge pull request #27 from weaveworks/wcloud-events
77355b9 Lint
d9a1c6c Add wcloud events, update flags and error nicely when there is no config

git-subtree-dir: tools
git-subtree-split: b990f488bdc7909b62d9245bc4b4caf58f5ae7ea

* Add Weave peers view

* Extend metadata in details panel for Weave Net nodes

* Added container filters as CLI arguments

gofmt load_container_filters.go

removed the environment variable for container label filters

Added the --app.container-label-filter command line argument, and load_container_filters.go now uses the results from that

Changed init() to InitializeTopologies()

Changed init() to InitializeTopologies() so that it can be called after the container filters are loaded from the command line argument. init() executes before main() in prog/main.go, so the flag parsing isn't finished before init() is called

Applied lint fixes

fixed lint issues

brought back the init function for api_topologies.go

Addressed many of the PR comments, except escaping colons

Renamed IsDesired to HasLabel in render/filters.go

Allows for the user to escape colons

added registry function for modifying the container filters

created a separate function that parses the container filter flags

simplified registry.addContainerFilters()

addressed review comments

switched API Topology Description IDs to constants

addressed review comments

joined constants

added test functions

addressed most of the review comments

Changed containerLabelFilters to an array of APItopologyOptions, placing the parsing in the Set() function. Removed parsing from HasLabel in render/filters.go

refactored code

added test that applies to the container filtering by labels

applied golint

made Registry items private and added a MakeRegistry() function

fixed usage of topologyRegistry.RendererForTopology

Added container label filters by exclusion

minor update to report_fixture

Modified container labels test to use existing report

I added labels to the existing containers in the fixed report for testing.

refactored code

refactored code

further code refactoring

addressed @ijsnellf's review comments

unexported Registry, and reduced duplicate code

addressed @ijsnellf's review comments

Addressed review comments

Addressed final review comments

* linter: fix punctuation and capitalization

Symptoms:
> app/control_router.go:44:38: error strings should not be capitalized or end with punctuation or a newline

This is blocking the build on:
https://circleci.com/gh/kinvolk/scope/363

* Squashed 'tools/' content from commit fd875e2

git-subtree-dir: tools
git-subtree-split: fd875e27c5379d443574bcf20f24a52a594871ca

* Remove uncommon things

* Move packages out of common directory

* Don't use scope's reflect

* poll is secretly scope specific

* Squashed 'tools/' changes from fd875e2..03cc598

03cc598 Don't lint generated protobuf code.
2b55c2d Merge pull request #66 from weaveworks/reduce-test-timeout
d4e163c Make timeout a flag
49a8609 Reduce test timeout
8fa15cb Merge pull request #63 from weaveworks/test-defaults
b783528 Tweak test script so it can be run on a mca
a3b18bf Merge pull request #65 from weaveworks/fix-integration-tests
ecb5602 Fix integration tests
f9dcbf6 ... without tab (clearly not my day)
a6215c3 Add break I forgot
0e6832d Remove incorrectly added tab
eb26c68 Merge pull request #64 from weaveworks/remove-test-package-linting
f088e83 Review feedback
2c6e83e Remove test package linting
2b3a1bb Merge pull request #62 from weaveworks/revert-61-test-defaults
8c3883a Revert "Make no-go-get the default, and don't assume -tags netgo"
e75c226 Fix bug in GC of firewall rules.
e49754e Merge pull request #51 from weaveworks/gc-firewall-rules
191f487 Add flag to enale/disable firewall rules' GC.
567905c Add GC of firewall rules for weave-net-tests to scheduler.
03119e1 Fix typo in GC of firewall rules.
bbe3844 Fix regular expression for firewall rules.
c5c23ce Pre-change refactoring: splitted gc_project function into smaller methods for better readability.
ed5529f GC firewall rules
ed8e757 Merge pull request #61 from weaveworks/test-defaults
57856e6 Merge pull request #56 from weaveworks/remove-wcloud
dd5f3e6 Add -p flag to test, run test in parallel
62f6f94 Make no-go-get the default, and don't assume -tags netgo
8946588 Merge pull request #60 from weaveworks/2647-gc-weave-net-tests
4085df9 Scheduler now also garbage-collects VMs from weave-net-tests.
4b7d5c6 Merge pull request #59 from weaveworks/57-fix-lint-properly
b7f0e69 Merge pull request #58 from weaveworks/fix-lint
794702c Pin version of shfmt
ab1b11d Fix lint
d1a5e46 Remove wcloud cli tool
81d80f3 Merge pull request #55 from weaveworks/lint-tf
05ad5f2 Review feedback
4c0d046 Use hclfmt to lint terraform.

git-subtree-dir: tools
git-subtree-split: 03cc5989769d93aa03f8aed3784ddfdb1fffc1c6

* Squashed 'tools/' changes from 03cc598..9857568

9857568 Use mflag and mflagext package from weaveworks/common.
9799112 Quote bash variable.
10a36b3 Merge pull request #67 from weaveworks/shfmt-ignore
a59884f Add support for .lintignore.

git-subtree-dir: tools
git-subtree-split: 98575686df9c946c076bbfd7bc50c62825f1f030

* Add SetEnv() to Cmd mocking interface

* test: Do not block when reading stderr pipe

* Squashed 'tools/' changes from 9857568..41c5622

41c5622 Merge pull request #90 from weaveworks/build-golang-service-conf
e8ebdd5 broaden imagetag regex to fix haskell build image
ba3fbfa Merge pull request #89 from weaveworks/build-golang-service-conf
e506f1b Fix up test script for updated shfmt
9216db8 Add stuff for service-conf build to build-goland image
66a9a93 Merge pull request #88 from weaveworks/haskell-image
cb3e3a2 shfmt
74a5239 Haskell build image
4ccd42b Trying circle quay login
b2c295f Merge branch 'common-build'
0ac746f Trim quay prefix in circle script
c405b31 Merge pull request #87 from weaveworks/common-build
9672d7c Push build images to quay as they have sane robot accounts
a2bf112 Review feedback
fef9b7d Add protobuf tools
10a77ea Update readme
254f266 Don't need the image name in
ffb59fc Adding a weaveworks/build-golang image with tags
b817368 Update min Weave Net docker version
cf87ca3 Merge pull request #86 from weaveworks/lock-kubeadm-version
3ae6919 Add example of custom SSH private key to tf_ssh's usage.
cf8bd8a Add example of custom SSH private key to tf_ansi's usage.
c7d3370 Lock kubeadm's Kubernetes version.
faaaa6f Merge pull request #84 from weaveworks/centos-rhel
ef552e7 Select weave-kube YAML URL based on K8S version.
b4c1198 Upgrade default kubernetes_version to 1.6.1.
b82805e Use a fixed version of kubeadm.
f33888b Factorise and make kubeconfig option optional.
f7b8b89 Install EPEL repo for CentOS.
615917a Fix error in decrypting AWS access key and secret.
86f97b4 Add CentOS 7 AMI and username for AWS via Terraform.
eafd810 Add tf_ansi example with Ansible variables.
2b05787 Skip setup of Docker over TCP for CentOS/RHEL.
84c420b Add docker-ce role for CentOS/RHEL.
00a820c Add setup_weave-net_debug.yml playbook for user issues' debugging.
3eae480 Upgrade default kubernetes_version to 1.5.4.
753921c Allow injection of Docker installation role.
e1ff90d Fix kubectl taint command for 1.5.
b989e97 Fix typo in kubectl taint for single node K8S cluster.
541f58d Remove 'install_recommends: no' for ethtool.
c3f9711 Make Ansible role docker-from-get.docker.com work on RHEL/CentOS.
038c0ae Add frequently used OS images, for convenience.
d30649f Add --insecure-registry to docker.conf
1dd9218 shfmt -i 4 -w push-images
6de96ac Add option to not push docker hub images
310f53d Add push-images script from cortex
8641381 Add port 6443 to kubeadm join commands for K8S 1.6+.
50bf0bc Force type of K8S token to string.
08ab1c0 Remove trailing whitespaces.
ae9efb8 Enable testing against K8S release candidates.
9e32194 Secure GCP servers for Scope: open port 80.
a22536a Secure GCP servers for Scope.
89c3a29 Merge pull request #78 from weaveworks/lint-merge-rebase-issue-in-docs
73ad56d Add linter function to avoid bad merge/rebase artefact
52d695c Merge pull request #77 from kinvolk/schu/fix-relative-weave-path
77aed01 Merge pull request #73 from weaveworks/mike/sched/fix-unicode-issue
7c080f4 integration/sanity_check: disable SC1090
d6d360a integration/gce.sh: update gcloud command
e8def2c provisioning/setup: fix shellcheck SC2140
cc02224 integration/config: fix weave path
9c0d6a5 Fix config_management/README.md
334708c Merge pull request #75 from kinvolk/alban/external-build-1
da2505d gce.sh: template: print creation date
e676854 integration tests: fix user account
8530836 host nameing: add repo name
b556c0a gce.sh: fix deletion of gce instances
2ecd1c2 integration: fix GCE --zones/--zone parameter
3e863df sched: Fix unicode encoding issues
51785b5 Use rm -f and set current dir using BASH_SOURCE.
f5c6d68 Merge pull request #71 from kinvolk/schu/fix-linter-warnings
0269628 Document requirement for `lint_sh`
9a3f09e Fix linter warnings
efcf9d2 Merge pull request #53 from weaveworks/2647-testing-mvp
d31ea57 Weave Kube playbook now works with multiple nodes.
27868dd Add GCP firewall rule for FastDP crypto.
edc8bb3 Differentiated name of dev and test playbooks, to avoid confusion.
efa3df7 Moved utility Ansible Yaml to library directory.
fcd2769 Add shorthands to run Ansible playbooks against Terraform-provisioned virtual machines.
f7946fb Add shorthands to SSH into Terraform-provisioned virtual machines.
aad5c6f Mention Terraform and Ansible in README.md.
dddabf0 Add Terraform output required for templates' creation.
dcc7d02 Add Ansible configuration playbooks for development environments.
f86481c Add Ansible configuration playbooks for Docker, K8S and Weave-Net.
efedd25 Git-ignore Ansible retry files.
765c4ca Add helper functions to setup Terraform programmatically.
801dd1d Add Terraform cloud provisioning scripts.
b8017e1 Install hclfmt on CircleCI.
4815e19 Git-ignore Terraform state files.
0aaebc7 Add script to generate cartesian product of dependencies of cross-version testing.
007d90a Add script to list OS images from GCP, AWS and DO.
ca65cc0 Add script to list relevant versions of Go, Docker and Kubernetes.
aa66f44 Scripts now source dependencies using absolute path (previously breaking make depending on current directory).
7865e86 Add -p option to parallelise lint.
36c1835 Merge pull request #69 from weaveworks/mflag

git-subtree-dir: tools
git-subtree-split: 41c562219dcc03a70dfdd9b1353cd4cd4f1cab46

* fs: add ReadDirCount (#30)

* fs: add ReadDirCount

ReadDirNames is expensive and better avoided when we don't care about
the names.

* fs: add tests and benchmarks for ReadDirNames and ReadDirCount

$ go test -bench Benchmark*
BenchmarkReadDirNames-4   	   10000	    155566 ns/op
BenchmarkReadDirCount-4   	   10000	    124618 ns/op

* Fix lint error

* Add ConfigFromURL helper function

extracted from weaveworks/cortex

* Allow the AWS backend to get credentials from its environment. (#111)

Signed-off-by: Tom Wilkie <[email protected]>

* Default diff-printer to hide details (#103)

This new setting will make it print the difference between String()
output only.  This is more usable for developers in 99% of cases where
the difference can be seen.

* Squashed 'tools/' changes from 41c5622..d6cc704

d6cc704 Fix comment
7139116 Revert "Push comments to the left so they don't appear in scripts"
e47e58f Push comments to the left so they don't appear in scripts
3945fce Remove nonexistent env var GIT_TAG
cd62992 Merge pull request #156 from weaveworks/drop-quay
af0eb51 Merge pull request #157 from weaveworks/fix-image-tag-prefix-length
0b9aee4 Fix image-tag object name prefix length to 8 chars.
813c28f Move from CircleCI 1.0 to 2.0
425cf4e Move from quay.io to Dockerhub
87ccf4f Merge pull request #155 from weaveworks/go-1-12
c31bc28 Update lint script to work with Go 1.12
ed8e380 Update to Go 1.12.1
ec369f5 Merge pull request #153 from dholbach/drop-email
ef7418d weave-users mailing list is closed: https://groups.google.com/a/weave.works/forum/#!topic/weave-users/0QXWGOPdBfY
6954a57 Merge pull request #144 from weaveworks/golang-1.11.1
9649eed Upgrade build image from golang:1.10.0-strech to 1.11.1-strech
59263a7 Merge pull request #141 from weaveworks/update-context
e235c9b Merge pull request #143 from weaveworks/gc-wks-test-vms
c865b4c scheduler: please lint/flake8
da61568 scheduler: please lint/yapf
ce9d78e scheduler: do not cache discovery doc
e4b7873 scheduler: add comment about GCP projects' IAM roles needed to list/delete instances and firewall rules
ff7ec8e scheduler: add comment about CircleCI projects' access via the API
2477d98 scheduler: deploy command now sets the current datetime as the version
5fcd880 scheduler: pass CircleCI API token in for private projects
6b8c323 scheduler: more details in case of failure to get running builds from CircleCI
0871aff scheduler: downgrade google-api-python-client from 1.7.4 to 1.6.7
b631e7f scheduler: add GC of WKS test VMs and firewall rules
a923a32 scheduler: document setup and deployment
013f508 scheduler: lock dependencies' versions
6965a4a Merge pull request #142 from weaveworks/fix-build
23298c6 Fix golint expects import golang.org/x/lint/golint
482f4cd Context is now part of the Go standard library
2bbc9a0 Merge pull request #140 from weaveworks/sched-http-retry
c3726de Add retries to sched util http calls
2cc7b5a Merge pull request #139 from meghalidhoble/master
fd9b0a7 Change : Modified the lint tools to skip the shfmt check if not installed. Why the change : For ppc64le the specific version of shfmt is not available, hence skipped completely the installation of shfmt tool. Thus this change made.
bc645c7 Merge pull request #138 from dholbach/add-license-file
a642e02 license: add Apache 2.0 license text
9bf5956 Merge pull request #109 from hallum/master
d971d82 Merge pull request #134 from weaveworks/2018-07-03-gcloud-regepx
32e7aa2 Merge pull request #137 from weaveworks/gcp-fw-allow-kube-apiserver
bbb6735 Allow CI to access k8s API server on GCP instances
764d46c Merge pull request #135 from weaveworks/2018-07-04-docker-ansible-playbook
ecc2a4e Merge pull request #136 from weaveworks/2018-07-05-gcp-private-ips
209b7fb tools: Add private_ips to the terraform output
369a655 tools: Add an ansible playbook that just installs docker
a643e27 tools: Use --filter instead of --regexp with gcloud
b8eca88 Merge pull request #128 from weaveworks/actually-say-whats-wrong
379ce2b Merge pull request #133 from weaveworks/fix-decrypt
3b906b5 Fix incompatibility with recent versions of OpenSSL
f091ab4 Merge pull request #132 from weaveworks/add-opencontainers-labels-to-dockerfiles
248def1 Inject git revision in Dockerfiles
64f2c28 Add org.opencontainers.image.* labels to Dockerfiles
ea96d8e add information about how to get help (#129)
f066ccd Make yapf diff failure look like an error
34d81d7 Merge pull request #127 from weaveworks/golang-1.10.0-stretch
89a0b4f Use golang:1.10.0-stretch image.
ca69607 Merge pull request #126 from weaveworks/disable-apt-daily-test
f5dc5d5 Create "setup-apt" role
7fab441 Rename bazel to bazel-rules (#125)
ccc8316 Revert "Gocyclo should return error code if issues detected" (#124)
1fe184f Bazel rules for building gogo protobufs (#123)
b917bb8 Merge pull request #122 from weaveworks/fix-scope-gc
c029ce0 Add regex to match scope VMs
0d4824b Merge pull request #121 from weaveworks/provisioning-readme-terraform
5a82d64 Move terraform instructions to tf section
d285d78 Merge pull request #120 from weaveworks/gocyclo-return-value
76b94a4 Do not spawn subshell when reading cyclo output
93b3c0d Use golang:1.9.2-stretch image
d40728f Gocyclo should return error code if issues detected
c4ac1c3 Merge pull request #114 from weaveworks/tune-spell-check
8980656 Only check files
12ebc73 Don't spell-check pki files
578904a Special-case spell-check the same way we do code checks
e772ed5 Special-case on mime type and extension using just patterns
ae82b50 Merge pull request #117 from weaveworks/test-verbose
8943473 Propagate verbose flag to 'go test'.
7c79b43 Merge pull request #113 from weaveworks/update-shfmt-instructions
258ef01 Merge pull request #115 from weaveworks/extra-linting
e690202 Use tools in built image to lint itself
126eb56 Add shellcheck to bring linting in line with scope
63ad68f Don't run lint on files under .git
51d908a Update shfmt instructions
e91cb0d Merge pull request #112 from weaveworks/add-python-lint-tools
0c87554 Add yapf and flake8 to golang build image
35679ee Merge pull request #110 from weaveworks/parallel-push-errors
3ae41b6 Remove unneeded if block
51ff31a Exit on first error
0faad9f Check for errors when pushing images in parallel
d87cd02 Add arg flag override for destination socks host:port in pacfile.
74dc626 Merge pull request #108 from weaveworks/disable-apt-daily
b4f1d91 Merge pull request #107 from weaveworks/docker-17-update
7436aa1 Override apt daily job to not run immediately on boot
7980f15 Merge pull request #106 from weaveworks/document-docker-install-role
f741e53 Bump to Docker 17.06 from CE repo
61796a1 Update Docker CE Debian repo details
0d86f5e Allow for Docker package to be named docker-ce
065c68d Document selection of Docker installation role.
3809053 Just --porcelain; it defaults to v1
11400ea Merge pull request #105 from weaveworks/remove-weaveplugin-remnants
b8b4d64 remove weaveplugin remnants
35099c9 Merge pull request #104 from weaveworks/pull-docker-py
cdd48fc Pull docker-py to speed tests/builds up.
e1c6c24 Merge pull request #103 from weaveworks/test-build-tags
d5d71e0 Add -tags option so callers can pass in build tags
8949b2b Merge pull request #98 from weaveworks/git-status-tag
ac30687 Merge pull request #100 from weaveworks/python_linting
4b125b5 Pin yapf & flake8 versions
7efb485 Lint python linting function
444755b Swap diff direction to reflect changes required
c5b2434 Install flake8 & yapf
5600eac Lint python in build-tools repo
0b02ca9 Add python linting
c011c0d Merge pull request #79 from kinvolk/schu/python-shebang
6577d07 Merge pull request #99 from weaveworks/shfmt-version
00ce0dc Use git status instead of diff to add 'WIP' tag
411fd13 Use shfmt v1.3.0 instead of latest from master.
0d6d4da Run shfmt 1.3 on the code.
5cdba32 Add sudo
c322ca8 circle.yml: Install shfmt binary.
e59c225 Install shfmt 1.3 binary.
30706e6 Install pyhcl in the build container.
960d222 Merge pull request #97 from kinvolk/alban/update-shfmt-3
1d535c7 shellcheck: fix escaping issue
5542498 Merge pull request #96 from kinvolk/alban/update-shfmt-2
32f7cc5 shfmt: fix coding style
09f72af lint: print the diff in case of error
571c7d7 Merge pull request #95 from kinvolk/alban/update-shfmt
bead6ed Update for latest shfmt
b08dc4d Update for latest shfmt (#94)
2ed8aaa Add no-race argument to test script (#92)
80dd78e Merge pull request #91 from weaveworks/upgrade-go-1.8.1
08dcd0d Please ./lint as shfmt changed its rules between 1.0.0 and 1.3.0.
a8bc9ab Upgrade default Go version to 1.8.1.
31d069d Change Python shebang to `#!/usr/bin/env python`

git-subtree-dir: tools
git-subtree-split: d6cc704a2892e8d85aa8fa4d201c1a404f02dfa4

* use https if provided as scheme

* Allow region to be read from environment variable

* Fixed aws.ConfigFromURL() when URL contains @ but no user/pass

Signed-off-by: Marco Pracucci <[email protected]>

* Add provenance and license comments to files migrated from weaveworks/common, update package names

* Update imports to match new package paths.

* Remove exec, fs, test/fs and test/exec packages not used by Loki.

* Add changelog entry.

---------

Signed-off-by: Tom Wilkie <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Co-authored-by: Tom Wilkie <[email protected]>
Co-authored-by: Tom Wilkie <[email protected]>
Co-authored-by: Paul Bellamy <[email protected]>
Co-authored-by: Simon <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: Alfonso Acosta <[email protected]>
Co-authored-by: Ilya Dmitrichenko <[email protected]>
Co-authored-by: Adam Harrison <[email protected]>
Co-authored-by: Jonathan Lange <[email protected]>
Co-authored-by: Alfonso Acosta <[email protected]>
Co-authored-by: Mike Lang <[email protected]>
Co-authored-by: Matthias Radestock <[email protected]>
Co-authored-by: Krzesimir Nowak <[email protected]>
Co-authored-by: lukemarsden <[email protected]>
Co-authored-by: Lorenzo Manacorda <[email protected]>
Co-authored-by: Bowen Li <[email protected]>
Co-authored-by: Jordan Pellizzari <[email protected]>
Co-authored-by: CarltonSemple <[email protected]>
Co-authored-by: Alban Crequy <[email protected]>
Co-authored-by: Filip Barl <[email protected]>
Co-authored-by: Bryan Boreham <[email protected]>
Co-authored-by: Bryan Boreham <[email protected]>
Co-authored-by: Thor <[email protected]>
Co-authored-by: Aditya C S <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants