
prometheus_data_dump-9 Add more filters (#10)
issue #9
pokornyIt authored Dec 9, 2022
1 parent 71134e0 commit ab5c941
Showing 9 changed files with 543 additions and 311 deletions.
65 changes: 47 additions & 18 deletions README.md
@@ -4,25 +4,28 @@
[![GitHub tag (latest SemVer)](https://img.shields.io/github/v/tag/pokornyit/prometheus_data_dump?label=latest)](https://github.com/pokornyIt/prometheus_data_dump/releases/latest)

# Prometheus Data Dump

Project designed to export data from the Prometheus database.
The exports are intended for further processing in other systems that do not support
direct integration with Prometheus as a data source.

Each Prometheus metric is exported to a separate file.
The data is exported for a defined number of days back and can be limited to selected instances or labels.
Existing exported data is overwritten by the new export.
A special file "metrics-meta.json" is also exported; it contains a description of the individual metrics.

# Program start

To start, the program requires a few configuration parameters,
mainly the address of the Prometheus server from which the data will be exported.

## Configuration file

The program reads its settings from a YAML configuration file:

```yaml
server: prometheus.server
port: 9090
path: ./dump
days: 2
from: "2021-02-01 10:30"
# … lines collapsed in the diff view (@@ -32,32 +35,55 @@ storeDirect: true)
sources:
  - instance: 'localhost.+'
    includeGo: false
labels:
  - label: 'node'
    value: 'my.node.+'
    excludeMetrics: '^node_.*'
```

- **server** - FQDN or IP address of the Prometheus server
- **port** - API port, default 9090, valid range 1025 - 65535 (not required)
- **path** - Path where the exported data is stored
- **days** - Number of days to export (1-60) (not required)
- **from** - Start date and time of the exported data
- **to** - End date and time of the exported data
- **step** - Time slice step in seconds (5 - 3600), default 10 (not required)
- **storeDirect** - Store the exported data directly in **path**, or create a new subdirectory inside **path**. The
  subdirectory name is *yyyyMMdd-HH:mm*
- **sources** - Array that limits the data to the listed instances
  - **instance** - Instance name (regex, `.+` means all) whose data is exported
  - **includeGo** - Include metrics whose names start with '*go_*'. The default (false) excludes these metrics
- **labels** - Array of label/value pairs whose data is loaded into the subdirectory *labels*
  - **label** - Label name; must not be `instance` and must not contain wildcards (`+`)
  - **value** - Label value (regex)
  - **excludeMetrics** - Exclude metric names matching this Go regex; an invalid regex is ignored (not required)

The entries in **labels** are combined with OR: a metric is selected if it matches any one of the definitions.
The **excludeMetrics** patterns are applied to all metrics selected through the defined labels.
If both **from** and **to** are defined, the **days** value is ignored.
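
The label selection described above can be sketched in Go as follows. This is a minimal, self-contained illustration under assumptions, not the project's code: the `labelFilter` type and the `buildSelectors`/`excluded` helpers are hypothetical, and the sketch assumes, as the README states, that a metric whose name matches an **excludeMetrics** pattern is dropped. Each label entry becomes its own series selector; the Prometheus series API returns the union of all selectors passed to it, which is what gives the OR behaviour.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelFilter is a hypothetical stand-in for one entry of the "labels" list.
type labelFilter struct {
	Label          string // label name, e.g. "node"
	Value          string // regex for the label value, e.g. "my.node.+"
	ExcludeMetrics string // optional regex for metric names to drop
}

// buildSelectors turns every label entry into a separate series selector.
// Passing several selectors to the series API selects their union (OR).
// %q quotes the value with Go escaping rules, which PromQL strings follow too.
func buildSelectors(filters []labelFilter) []string {
	selectors := make([]string, 0, len(filters))
	for _, f := range filters {
		selectors = append(selectors, fmt.Sprintf("{%s=~%q}", f.Label, f.Value))
	}
	return selectors
}

// excluded reports whether metricName matches any excludeMetrics pattern.
// Empty or invalid patterns are skipped, mirroring the README.
func excluded(filters []labelFilter, metricName string) bool {
	for _, f := range filters {
		if f.ExcludeMetrics == "" {
			continue
		}
		re, err := regexp.Compile(f.ExcludeMetrics)
		if err != nil {
			continue // invalid regex: ignored
		}
		if re.MatchString(metricName) {
			return true
		}
	}
	return false
}

func main() {
	filters := []labelFilter{
		{Label: "node", Value: "my.node.+", ExcludeMetrics: "^node_.*"},
		{Label: "job", Value: "api"},
	}
	fmt.Println(buildSelectors(filters))
	fmt.Println(excluded(filters, "node_load1"), excluded(filters, "up"))
}
```

Compiling the patterns once per call keeps the sketch short; a real exporter would compile them once up front, as `collectLabelsSeriesList` does.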

Valid formats for the **from** and **to** values:

- YYYY-mm-dd HH:MM
- YYYY-mm-dd HH:MM:ss
- YYYY-mm-ddTHH:MM
- YYYY-mm-ddTHH:MM:ss
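
Written as Go time layouts, the four accepted formats and the rule that **from**/**to** take precedence over **days** could look like the sketch below. It is an illustration under assumptions, not the program's parser: `parseDateTime`, `exportRange` and `subdirName` are hypothetical helpers, times without a zone are read as UTC, and rejecting a **from** that is not before **to** is an extra check added for the example. `subdirName` shows how the *yyyyMMdd-HH:mm* subdirectory name (used when **storeDirect** is false) maps to a Go layout.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// acceptedLayouts are the README's four formats written as Go reference layouts.
var acceptedLayouts = []string{
	"2006-01-02 15:04",    // YYYY-mm-dd HH:MM
	"2006-01-02 15:04:05", // YYYY-mm-dd HH:MM:ss
	"2006-01-02T15:04",    // YYYY-mm-ddTHH:MM
	"2006-01-02T15:04:05", // YYYY-mm-ddTHH:MM:ss
}

// parseDateTime tries each layout in turn; time.Parse returns UTC when no zone is given.
func parseDateTime(s string) (time.Time, error) {
	for _, layout := range acceptedLayouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("unsupported date format %q", s)
}

// exportRange returns the start and end of the export: from/to win when both
// are set, otherwise the range is the given number of days back from now.
func exportRange(from, to string, days int, now time.Time) (time.Time, time.Time, error) {
	if from == "" || to == "" {
		return now.AddDate(0, 0, -days), now, nil
	}
	start, err := parseDateTime(from)
	if err != nil {
		return time.Time{}, time.Time{}, err
	}
	end, err := parseDateTime(to)
	if err != nil {
		return time.Time{}, time.Time{}, err
	}
	if !start.Before(end) {
		// extra check assumed for this sketch
		return time.Time{}, time.Time{}, errors.New("from must be before to")
	}
	return start, end, nil
}

// subdirName formats a timestamp as yyyyMMdd-HH:mm.
func subdirName(t time.Time) string {
	return t.Format("20060102-15:04")
}

func main() {
	start, end, err := exportRange("2021-02-18 10:00", "2021-02-19T12:00:30", 2, time.Now())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(start, end, subdirName(start))
}
```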

## Command line parameters

- **--config.show** (**-v**) - show the actual configuration and exit
- **--config.file=cfg.yml** (**-c**) - configuration file, default is cfg.yml (the file must exist)
- **--path=./dump** (**-p**) - override the path defined in the config file
- **--server=IP** (**-s**) - FQDN or IP address of the Prometheus server
- **--port=11102** (**-n**) - API port, default is 9090; valid range 1025 - 65535
- **--from=date** (**-f**) - start date and time of the exported data, overrides the value in the config file
- **--to=date** (**-t**) - end date and time of the exported data, overrides the value in the config file
- **--back=days** (**-b**) - number of days to export back from now, overrides the value in the config file
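
Values given on the command line override those from the configuration file. The sketch below shows that precedence with Go's standard `flag` package rather than the kingpin library the program actually uses; the `settings` type and `applyOverrides` function are hypothetical and cover only three of the flags. As in the program, a port of 0 (the flag default) leaves the configured port unchanged.

```go
package main

import (
	"flag"
	"fmt"
)

// settings is a hypothetical subset of the configuration file values.
type settings struct {
	Server string
	Port   uint
	Path   string
}

// applyOverrides parses args and replaces only the values that were set.
// The flag package accepts both -name and --name, with "=value" or a separate argument.
func applyOverrides(cfg *settings, args []string) error {
	fs := flag.NewFlagSet("prometheus_data_dump", flag.ContinueOnError)
	server := fs.String("server", "", "Prometheus server FQDN or IP address")
	port := fs.Uint("port", 0, "Prometheus server API port")
	path := fs.String("path", "", "path where the exported data is stored")
	if err := fs.Parse(args); err != nil {
		return err
	}
	if *server != "" {
		cfg.Server = *server
	}
	if *port > 0 {
		cfg.Port = *port
	}
	if *path != "" {
		cfg.Path = *path
	}
	return nil
}

func main() {
	cfg := settings{Server: "prometheus.server", Port: 9090, Path: "./dump"}
	if err := applyOverrides(&cfg, []string{"--server=c01.server.com", "--path", "/tmp/dump"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Using zero values as "not set" keeps the merge simple; the trade-off is that a flag cannot explicitly reset a value back to zero or empty.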

## Example program start

Run the program with all configuration taken from the config file "all-in.yml":
```shell
./prometheus_data_dump --config.file=all-in.yml

# … lines collapsed in the diff view (@@ -66,23 +92,26 @@)
```

Show the actual configuration:

```shell
./prometheus_data_dump --config.file=all-in.yml --config.show

# short version
./prometheus_data_dump -c all-in.yml -v
```

Run the program, overriding values from the configuration file:

```shell
./prometheus_data_dump --config.file "all-in.yml" --path "/tmp/dump" --from "2021-02-18 10:00" --to "2021-02-19 12:00" --server=c01.server.com --log.level=debug

# short version
./prometheus_data_dump -c all-in.yml -p "/tmp/dump" -f "2021-02-18 10:00" -t "2021-02-19 12:00" -s c01.server.com --log.level=debug
```


# Contribute

We welcome any contributions. Please fork the project on GitHub and open Pull Requests for any proposed changes.

Please note that we will not merge any changes that encourage insecure behaviour. If in doubt, please open an Issue first
to discuss your proposal.
47 changes: 47 additions & 0 deletions api-reader.go
@@ -54,6 +54,53 @@ func collectSeriesList(v1api v1.API, sources Sources, dateRange v1.Range) (label
```go
	return labels, nil
}

func collectLabelsSeriesList(v1api v1.API, lbl []Labels, dateRange v1.Range) (labels []model.LabelSet, err error) {
	_ = level.Debug(logger).Log("msg", "entry collect series data for instance labels")
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	var filter []string
	var re []*regexp.Regexp
	msg := ""
	separator := ""
	for _, l := range lbl {
		// one series selector per label entry; the API returns their union (OR)
		instances := fmt.Sprintf("{%s=~\"%s\"}", l.Label, l.Value)
		filter = append(filter, instances)
		_ = level.Debug(logger).Log("msg", fmt.Sprintf("instance filter: %s", instances))
		msg = fmt.Sprintf("%s%s%s=~\"%s\"", msg, separator, l.Label, l.Value)
		separator = ", "
		if len(l.ExcludeMetrics) > 0 {
			// an invalid regex is skipped, as documented in the README
			r, err := regexp.Compile(l.ExcludeMetrics)
			if err == nil {
				re = append(re, r)
			}
		}
	}

	dataSet, warnings, err := v1api.Series(ctx, filter, dateRange.Start, dateRange.End)
	if err != nil {
		_ = level.Error(logger).Log("msg", "problem query Prometheus API", "error", err)
		return nil, err
	}
	if len(warnings) > 0 {
		// log the returned warnings, not err (which is nil here)
		_ = level.Warn(logger).Log("msg", "Prometheus API return warning", "warn", fmt.Sprint(warnings))
	}
	labels = []model.LabelSet{}
	for _, set := range dataSet {
		// drop series whose metric name matches any excludeMetrics regex
		allowed := true
		for _, r := range re {
			if r.MatchString(string(set[LabelName])) {
				allowed = false
				break
			}
		}
		if allowed {
			labels = append(labels, set)
		}
	}
	_ = level.Debug(logger).Log("msg", fmt.Sprintf("collect %d from %d series for instance %s", len(labels), len(dataSet), msg))
	return labels, nil
}

func readQueryRange(api v1.API, labelSet model.LabelSet, timeRange v1.Range) (data model.Value, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
```
99 changes: 80 additions & 19 deletions config.go
@@ -10,48 +10,66 @@ import (
```go
	"os"
	"path/filepath"
	"regexp"
	"strings"
	"time"
)

// const connectionTimeout = 10
const (
	TimeFormat      = "2006-01-02 15:04"
	DefaultDataPath = "./dump"
	DefaultPort     = 9090
	MinPort         = 1025
	MaxPort         = 65535
)

type Configuration struct {
	Server      string    `yaml:"server" json:"server"`                               // FQDN or IP address of server
	Port        uint      `yaml:"port,omitempty" json:"port,omitempty"`               // API port, DefaultPort if not defined
	Path        string    `yaml:"path" json:"path"`                                   // path to store directory
	Days        int       `yaml:"days,omitempty" json:"days,omitempty"`               // days back to dump
	From        string    `yaml:"from,omitempty" json:"from,omitempty"`               // date from in yyyy-mm-dd HH:MM format
	To          string    `yaml:"to,omitempty" json:"to,omitempty"`                   // date to in yyyy-mm-dd HH:MM format
	Step        int       `yaml:"step,omitempty" json:"step,omitempty"`               // step data dump 5 - 3600 sec
	StoreDirect bool      `yaml:"storeDirect,omitempty" json:"storeDirect,omitempty"` // don't create subdirectory with store time
	Sources     []Sources `yaml:"sources,omitempty" json:"sources,omitempty"`         // list of collected sources
	Labels      []Labels  `yaml:"labels,omitempty" json:"labels,omitempty"`           // label/value pairs for select data
}

type Sources struct {
	Instance  string `yaml:"instance,omitempty" json:"instance,omitempty"`   // instance names use wildcards, .+ means all
	IncludeGo bool   `yaml:"includeGo,omitempty" json:"includeGo,omitempty"` // include standard go_ metrics (__name__) when true
}

type Labels struct {
	Label          string `yaml:"label" json:"label"`                                       // label name
	Value          string `yaml:"value" json:"value"`                                       // label value, wildcards such as .+ allowed
	ExcludeMetrics string `yaml:"excludeMetrics,omitempty" json:"excludeMetrics,omitempty"` // exclude metric names matching this Go regex; if omitted, get all
}

//type Sources map[string]string // define sources

var (
	showConfig    = kingpin.Flag("config.show", "Show actual configuration and ends").Short('v').Default("false").Bool()
	configFile    = kingpin.Flag("config.file", "Configuration file default is \"cfg.yml\".").Short('c').PlaceHolder("cfg.yml").Default("cfg.yml").ExistingFile()
	directoryData = kingpin.Flag("path", "Path where store export json data").Short('p').PlaceHolder("path").Default(DefaultDataPath).String()
	server        = kingpin.Flag("server", "Prometheus server FQDN or IP address").Short('s').PlaceHolder("server").Default("").String()
	port          = kingpin.Flag("port", fmt.Sprintf("Prometheus server API port (number between %d and %d)", MinPort, MaxPort)).
			Short('n').PlaceHolder("9090").Default("0").Uint()
	from   = kingpin.Flag("from", "Start datetime for export data").Short('f').PlaceHolder("yyyy-mm-dd HH:MM").Default("").String()
	to     = kingpin.Flag("to", "End datetime for export data").Short('t').PlaceHolder("yyyy-mm-dd HH:MM").Default("").String()
	back   = kingpin.Flag("back", "Export data back from now").Short('b').Default("0").Int()
	config = &Configuration{
		Server:      "",
		Port:        DefaultPort,
		Path:        DefaultDataPath,
		Days:        1,
		Step:        10,
		From:        "",
		To:          "",
		StoreDirect: false,
		Sources:     []Sources{},
		Labels:      []Labels{},
	}
	configFrom time.Time = time.Date(1970, 01, 01, 0, 0, 0, 0, time.UTC)
	configTo   time.Time = time.Date(1970, 01, 01, 0, 0, 0, 0, time.UTC)
```
@@ -98,6 +116,9 @@ func (c *Configuration) overWriteFromLine() {
```go
	if len(*server) > 0 {
		c.Server = *server
	}
	if *port > 0 {
		c.Port = *port
	}
	if len(*directoryData) > 0 && *directoryData != DefaultDataPath {
		c.Path = *directoryData
	}
```
@@ -120,6 +141,9 @@ func (c *Configuration) validate() error {
```go
			return errors.New("defined Prometheus server address isn't valid FQDN or IP address")
		}
	}
	// accepted range is MinPort..MaxPort inclusive (1025 - 65535)
	if c.Port < MinPort || c.Port > MaxPort {
		return fmt.Errorf("defined port %d is out of range %d - %d", c.Port, MinPort, MaxPort)
	}
	if len(c.Path) < 1 {
		c.Path = DefaultDataPath
	}
```
@@ -174,6 +198,14 @@ func (c *Configuration) validate() error {
```go
	if len(c.Sources) < 1 {
		return errors.New("not define any sources")
	}
	// validate every label entry, including when only one is defined
	for _, label := range c.Labels {
		if err := label.validate(); err != nil {
			return err
		}
	}
	return nil
}
```

@@ -197,7 +229,7 @@ func (c *Configuration) LoadFile(filename string) error {
```go
func (c *Configuration) print() string {
	a := fmt.Sprintf("\r\n%s\r\nActual configuration:\r\n", applicationName)
	a = fmt.Sprintf("%sServer: [%s:%d]\r\n", a, c.Server, c.Port)
	a = fmt.Sprintf("%sData path: [%s]\r\n", a, c.Path)
	if c.useRange() {
		a = fmt.Sprintf("%sFrom: [%s]\r\n", a, configFrom.Format(time.RFC3339))
```
@@ -213,11 +245,19 @@ func (c *Configuration) print() string {
```go
			a = fmt.Sprintf("%s [%s]\r\n", a, source.print())
		}
	}
	if len(c.Labels) == 0 {
		a = fmt.Sprintf("%sLabels: [N/A]\r\n", a)
	} else {
		a = fmt.Sprintf("%sLabels:\r\n", a)
		for _, l := range c.Labels {
			a = fmt.Sprintf("%s [%s]\r\n", a, l.print())
		}
	}
	return a
}

func (c *Configuration) serverAddress() string {
	return fmt.Sprintf("http://%s:%d", c.Server, c.Port)
}

func (c *Configuration) useRange() bool {
```
@@ -228,3 +268,24 @@ func (s *Sources) print() string {
```go
	a := fmt.Sprintf("%s (%t)", s.Instance, s.IncludeGo)
	return a
}

func (l *Labels) print() string {
	a := fmt.Sprintf("%s=~\"%s\" (%t)", l.Label, l.Value, len(l.ExcludeMetrics) > 0)
	return a
}

// validate checks one label entry: the name is required, must not be "instance"
// (instances are selected through sources) and must not contain wildcards;
// the value is required.
func (l *Labels) validate() error {
	if len(l.Label) == 0 {
		return errors.New("label must be defined")
	}
	if strings.EqualFold(l.Label, "instance") {
		return errors.New("label can't be \"instance\"")
	}
	if strings.Contains(l.Label, "+") {
		return errors.New("label must be defined without wildcards")
	}
	if len(l.Value) == 0 {
		return fmt.Errorf("value for label [%s] must be defined", l.Label)
	}
	return nil
}
```
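
`serverAddress` builds the URL with `fmt.Sprintf("http://%s:%d", ...)`, which yields an invalid URL when **server** is an IPv6 literal such as `::1`. A possible alternative, sketched here as a hypothetical standalone function rather than the project's code, uses `net.JoinHostPort`, which adds the square brackets IPv6 hosts need, and `net/url` to assemble the URL.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// serverAddress is a hypothetical variant of Configuration.serverAddress.
// net.JoinHostPort wraps IPv6 literals in brackets, e.g. "[::1]:9090".
func serverAddress(server string, port uint) string {
	u := url.URL{
		Scheme: "http",
		Host:   net.JoinHostPort(server, strconv.FormatUint(uint64(port), 10)),
	}
	return u.String()
}

func main() {
	fmt.Println(serverAddress("prometheus.server", 9090))
	fmt.Println(serverAddress("::1", 11102))
}
```

For host names and IPv4 addresses the result is the same as the current `Sprintf` version, so the change would only affect IPv6 configurations.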
13 changes: 9 additions & 4 deletions config_test.go
@@ -50,6 +50,7 @@ func TestConfig_overWriteFromLine(t *testing.T) {
```go
func TestConfig_validate(t *testing.T) {
	type fields struct {
		Server  string
		Port    uint
		Path    string
		Days    int
		Sources []Sources
```
@@ -60,18 +61,22 @@ func TestConfig_validate(t *testing.T) {
```go
		fields  fields
		wantErr bool
	}{
		{"all valid", fields{"server.local", 11102, "./", 2, []Sources{{"a", true}}, 10}, false},
		{"no server", fields{"", 9090, "./", 2, nil, 10}, true},
		{"no path", fields{"", 9090, "", 2, nil, 10}, true},
		{"wrong day", fields{"", 9090, "./", -2, nil, 10}, true},
		// only the port differs from "all valid", so these cases test the port check alone
		{"port under 1025", fields{"server.local", 1024, "./", 2, []Sources{{"a", true}}, 10}, true},
		{"port above 65535", fields{"server.local", 65536, "./", 2, []Sources{{"a", true}}, 10}, true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			c := &Configuration{
				Server:  tt.fields.Server,
				Port:    tt.fields.Port,
				Path:    tt.fields.Path,
				Days:    tt.fields.Days,
				Sources: tt.fields.Sources,
				Labels:  []Labels{},
				Step:    tt.fields.Step,
			}
			if err := c.validate(); (err != nil) != tt.wantErr {
```