Mike Perham edited this page Jun 6, 2024 · 3 revisions

Faktory Enterprise can emit real-time metrics to Statsd for monitoring and alerting.

Statsd

In /etc/faktory/conf.d/statsd.toml, add the following section to enable statsd metrics:

[statsd]
  # required, location of the statsd server
  location = "hostname:port"

  # Prepend all metric names with this value, defaults to 'faktory.'
  # If you have multiple Faktory servers for multiple apps reporting to
  # the same statsd server you can use a multi-level namespace, 
  # e.g. "app1.faktory.", "app2.faktory." or use a tag below.
  #namespace = "faktory."

  # optional, DataDog-style tags to send with each metric.
  # keep in mind that every tag is sent with every metric so keep tags short.
  #tags = ["env:production", "region:us-east-1a"]

  # Statsd client will buffer metrics for 100ms or until this size is reached.
  # The default value of 15 tries to avoid UDP packet sizes larger than 1500 bytes.
  # If your network supports jumbo UDP packets, you can increase this to ~50.
  #bufferSize = 15
  
  # optional, also calculate queue latency for this set of queues
  queueLatency = ["critical", "default"]

Tags must conform to DataDog's specifications:

  • tag must match "\A[a-zA-Z][\w\-\:\.\/]*\z"
  • host, device, source, and service are reserved words
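The two rules above can be checked before a tag goes into statsd.toml. The following is a minimal sketch, not part of Faktory: `is_valid_tag` is a hypothetical helper, Ruby's `\z` anchor is written as Python's `\Z`, and it assumes the reserved words apply to the tag key (the part before the first colon).

```python
import re

# Pattern from the rule above; Python spells the end-of-string anchor \Z.
# re.ASCII limits \w to [a-zA-Z0-9_] (assumed to match the intent of the rule).
TAG_PATTERN = re.compile(r"\A[a-zA-Z][\w\-:./]*\Z", re.ASCII)

# Assumption: a reserved word is rejected when it is used as the tag key.
RESERVED_KEYS = {"host", "device", "source", "service"}


def is_valid_tag(tag: str) -> bool:
    """Return True if the tag matches the pattern and avoids reserved keys."""
    if TAG_PATTERN.match(tag) is None:
        return False
    key = tag.split(":", 1)[0]
    return key not in RESERVED_KEYS
```

With this sketch, "env:production" and "region:us-east-1a" pass, while "host:web1" (reserved key) and "1env:prod" (does not start with a letter) are rejected.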

Metrics

Global

Faktory sends global metrics, similar to those seen in the Web UI, every 30 seconds.

| Name | Type | Description |
| --- | --- | --- |
| processed | Gauge | Total number of jobs processed (success = processed - failures) |
| failures | Gauge | Total count of failed job executions |
| scheduled | Gauge | Current number of scheduled jobs |
| retries | Gauge | Current number of jobs to be retried |
| dead | Gauge | Current number of Dead jobs |
| busy | Gauge | Current number of jobs being processed |
| ops.connections | Gauge | Faktory client network connections |
| ops.commands | Gauge | Client commands processed by Faktory |
| ops.memory | Gauge | Faktory RAM usage, in bytes |
| ops.redis.connections | Gauge | Redis network connections |
| ops.redis.memory | Gauge | Redis RAM usage, in bytes |
| enqueued | Gauge | Total count of all jobs within queues |
| enqueued.{name} | Gauge | Size of {name} queue |

Global metrics are tagged only with the set of tags configured in TOML above. Note that queues hold jobs that are ready to execute now. Scheduled, Retries, and Dead jobs are not enqueued.
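To illustrate how the namespace and the configured tags combine on a global metric, here is a minimal sketch that builds a DogStatsD-style gauge datagram. The helper `format_gauge` is hypothetical, and the exact bytes Faktory puts on the wire are an assumption; only the general `name:value|g|#tag,tag` shape comes from the DogStatsD format.

```python
def format_gauge(name, value, namespace="faktory.", tags=()):
    """Build one DogStatsD-style gauge line (illustrative only)."""
    # The namespace is prepended verbatim, so it should end with a dot.
    line = f"{namespace}{name}:{value}|g"
    if tags:
        # Every configured tag rides along with every metric.
        line += "|#" + ",".join(tags)
    return line


example = format_gauge(
    "enqueued.default", 42, tags=["env:production", "region:us-east-1a"]
)
```

Here `example` is the string "faktory.enqueued.default:42|g|#env:production,region:us-east-1a". Because the tag list is repeated on each line, long tags add up quickly, which is why short tags are recommended.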

Latency

Queue latency is more expensive to compute: Faktory has to peek at the first job, parse the JSON, and check the enqueued_at element. Because of this additional expense, Faktory only gathers queue latency for queues that you opt into:

# statsd.toml
[statsd]
  queueLatency = ["default", "bulk"]

| Name | Type | Description | Tags |
| --- | --- | --- | --- |
| latency.{queue} | Gauge | The time in seconds between now and when the first job in queue was enqueued | n/a |
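The latency calculation described above (peek at the first job, parse its JSON, read enqueued_at) can be sketched as follows. This is not Faktory's implementation: `queue_latency` and `parse_rfc3339` are hypothetical names, and the sketch assumes enqueued_at is an RFC 3339 UTC timestamp, possibly with nanosecond precision.

```python
import json
import re
from datetime import datetime, timezone


def parse_rfc3339(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp, trimming digits beyond microseconds."""
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    # datetime stores at most microseconds, so keep six fractional digits.
    ts = re.sub(r"(\.\d{6})\d+", r"\1", ts)
    return datetime.fromisoformat(ts)


def queue_latency(first_job_json: str, now: datetime) -> float:
    """Seconds between `now` and when the first job in the queue was enqueued."""
    job = json.loads(first_job_json)
    enqueued_at = parse_rfc3339(job["enqueued_at"])
    # Clamp at zero in case of clock skew between producer and server.
    return max(0.0, (now - enqueued_at).total_seconds())


# Hypothetical job payload; only enqueued_at matters for this calculation.
job = '{"jid": "abc123", "jobtype": "SomeJob", "enqueued_at": "2024-06-06T12:00:00.500000000Z"}'
now = datetime(2024, 6, 6, 12, 0, 5, 500000, tzinfo=timezone.utc)
latency = queue_latency(job, now)
```

In this example the job was enqueued five seconds before `now`, so `latency` is 5.0. The JSON parse on every sample is the extra cost that makes latency an opt-in metric.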

Job Execution

Basic job execution metrics are sent in real-time as jobs are processed.

| Name | Type | Description | Tags |
| --- | --- | --- | --- |
| jobs.count | Counter | Total job execution count, increments upon ACK/FAIL | "queue:{queue}", "jobtype:{type}" |
| jobs.failed | Counter | Total job failure count, increments upon FAIL | "queue:{queue}", "jobtype:{type}" |
| jobs.perform | Gauge (time) | Time between FETCH and ACK | "queue:{queue}", "jobtype:{type}" |
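Because these metrics carry queue and jobtype tags, a consumer can break failures down per queue. The sketch below parses DogStatsD-style lines and tallies jobs.failed by queue tag. The helper names, the sample lines, and the job types `SomeJob` and `OtherJob` are hypothetical; the wire format Faktory actually emits is assumed, not confirmed by this page.

```python
from collections import Counter


def parse_datagram(line):
    """Split 'name:value|type|#k:v,...' into (name, value, type, tags)."""
    fields = line.split("|")
    name, _, raw_value = fields[0].rpartition(":")
    tags = {}
    for field in fields[2:]:
        if field.startswith("#"):
            for tag in field[1:].split(","):
                key, _, val = tag.partition(":")
                tags[key] = val
    return name, float(raw_value), fields[1], tags


def failures_by_queue(lines, namespace="faktory."):
    """Sum jobs.failed counter increments, grouped by the queue tag."""
    totals = Counter()
    for line in lines:
        name, value, kind, tags = parse_datagram(line)
        if kind == "c" and name == namespace + "jobs.failed":
            totals[tags.get("queue", "unknown")] += value
    return totals


sample = [
    "faktory.jobs.count:1|c|#queue:default,jobtype:SomeJob",
    "faktory.jobs.failed:1|c|#queue:default,jobtype:SomeJob",
    "faktory.jobs.failed:1|c|#env:production,queue:default,jobtype:SomeJob",
    "faktory.jobs.failed:1|c|#queue:bulk,jobtype:OtherJob",
]
totals = failures_by_queue(sample)
```

For the sample lines, `totals` records two failures for the default queue and one for bulk. In practice a statsd backend performs this aggregation; the sketch only shows what the tags make possible.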