Commit

reduce to one rule tree (#731)
ppcad authored Dec 20, 2024
1 parent 4166d95 commit 73dbdf6
Showing 362 changed files with 1,105 additions and 1,909 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
* removed the configuration `tld_lists` in `domain_resolver`, `domain_label_extractor` and `pseudonymizer` as
the list is now fixed inside the packaged logprep
* remove SQL feature from `generic_adder`, fields can only be added from rule config or from file
* use a single rule tree instead of a generic and a specific rule tree
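
For existing configurations, the change above amounts to folding the two former lists into one `rules` list. The following is a minimal sketch of such a migration, not part of logprep: the helper name `merge_rule_lists` is hypothetical, and placing the former specific rules before the generic ones is an assumption taken from the ordering used in the examples of this commit. It only rewrites the configuration keys; it does not move or rename rule directories.

```python
from copy import deepcopy


def merge_rule_lists(processor_config: dict) -> dict:
    """Return a copy of a processor config that uses a single `rules` list.

    Hypothetical helper for illustration only, not a logprep API.
    Assumption: former specific rules are listed before generic rules.
    """
    migrated = deepcopy(processor_config)
    # Remove the two legacy keys; a missing key is treated as an empty list.
    specific = migrated.pop("specific_rules", [])
    generic = migrated.pop("generic_rules", [])
    # Keep entries that are already in `rules` and append the legacy ones.
    merged = list(migrated.get("rules", []))
    for source in [*specific, *generic]:
        if source not in merged:  # skip exact duplicates such as "/dev"
            merged.append(source)
    migrated["rules"] = merged
    return migrated


old_config = {
    "type": "dropper",
    "specific_rules": ["rules/03_dropper/specific/"],
    "generic_rules": ["rules/03_dropper/generic/"],
}
new_config = merge_rule_lists(old_config)
print(new_config)
```

The input dictionary is left untouched, so the old and new configuration can be compared before the new one is written back.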

### Features

25 changes: 8 additions & 17 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,7 @@ and secondly they specify how to process the message.
For example which fields should be deleted or to which IP-address the geolocation should be
retrieved.

For performance reasons on startup all rules per processor are aggregated to a generic and a specific rule tree, respectively.
For performance reasons on startup all rules per processor are aggregated to a rule tree.
Instead of evaluating all rules independently for each log message the message is checked against
the rule tree.
Each node in the rule tree represents a condition that has to be met,
Expand Down Expand Up @@ -131,11 +131,6 @@ This configuration will lead to the prioritization of `tags` and `message` in th
}
```

Instead of writing very specific rules that apply to single log messages, it is also possible
to define generic rules that apply to multiple messages.
It is possible to define a set of generic and specific rules for each processor, resulting
in two rule trees.

### Connectors

Connectors are responsible for reading the input and writing the result to a desired output.
Expand Down Expand Up @@ -169,24 +164,20 @@ timeout: 0.1
pipeline:
- dissector:
type: dissector
specific_rules:
rules:
- https://your-api/dissector/
generic_rules:
- rules/01_dissector/generic/
- rules/01_dissector/rules/
- geoip_enricher:
type: geoip_enricher
specific_rules:
rules:
- https://your-api/geoip/
generic_rules:
- rules/02_geoip_enricher/generic/
- rules/02_geoip_enricher/rules/
tree_config: artifacts/tree_config.json
db_path: artifacts/GeoDB.mmdb
- dropper:
type: dropper
specific_rules:
- rules/03_dropper/specific/
generic_rules:
- rules/03_dropper/generic/
rules:
- rules/03_dropper/rules/

input:
mykafka:
Expand All @@ -213,7 +204,7 @@ output:
```
The following yaml represents a dropper rule which according to the previous configuration
should be in the `rules/03_dropper/generic/` directory.
should be in the `rules/03_dropper/rules/` directory.

```yaml
filter: "message"
598 changes: 299 additions & 299 deletions doc/source/development/architecture/diagramms/pipeline.drawio

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -124,8 +124,7 @@
"processor_config = {\n",
" \"mycalculator\":{ \n",
" \"type\": \"calculator\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [str(rule_path), \"/dev\"],\n",
" }\n",
" }"
]
Expand Down Expand Up @@ -223,4 +222,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -136,8 +136,7 @@
"processor_config = {\n",
" \"myconcatenator\":{ \n",
" \"type\": \"concatenator\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [str(rule_path), \"/dev\"],\n",
" }\n",
" }"
]
Expand Down Expand Up @@ -235,4 +234,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -135,8 +135,7 @@
"processor_config = {\n",
" \"thealmightydissector\":{ \n",
" \"type\": \"dissector\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [str(rule_path), \"/dev\"],\n",
" }\n",
" }"
]
Expand Down Expand Up @@ -234,4 +233,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -112,8 +112,7 @@
"processor_config = {\n",
" \"the_field_manager\": {\n",
" \"type\": \"field_manager\",\n",
" \"specific_rules\": [\"/dev\"],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [\"/dev\"],\n",
" }\n",
"}\n"
]
Expand Down Expand Up @@ -176,9 +175,9 @@
],
"source": [
"for rule in rules:\n",
" processor._specific_tree.add_rule(rule)\n",
" processor._rule_tree.add_rule(rule)\n",
" \n",
"processor._specific_rules"
"processor._rules"
]
},
{
Expand Down Expand Up @@ -288,4 +287,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -97,8 +97,7 @@
"processor_config = {\n",
" \"almighty generic adder\":{ \n",
" \"type\": \"generic_adder\",\n",
" \"specific_rules\": [{\"filter\": \"*\", \"generic_adder\": {\"extend_target_list\": True, \"add\": {\"message.tags\": \"New\"}} }],\n",
" \"generic_rules\": [],\n",
" \"rules\": [{\"filter\": \"*\", \"generic_adder\": {\"extend_target_list\": True, \"add\": {\"message.tags\": \"New\"}} }],\n",
" }\n",
" }"
]
Expand Down Expand Up @@ -196,4 +195,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -162,8 +162,7 @@
"processor_config = {\n",
" \"geoip_enricher\": {\n",
" \"type\": \"geoip_enricher\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [str(rule_path), \"/dev\"],\n",
" \"db_path\": \"<INSERT_PATH_TO_GEOIP_DATABASE>\"\n",
" }\n",
"}\n"
Expand Down Expand Up @@ -191,7 +190,7 @@
"Cell \u001b[0;32mIn[24], line 5\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mlogprep\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39mfactory\u001b[39;00m \u001b[39mimport\u001b[39;00m Factory\n\u001b[1;32m 4\u001b[0m mock_logger \u001b[39m=\u001b[39m mock\u001b[39m.\u001b[39mMagicMock()\n\u001b[0;32m----> 5\u001b[0m geoip_enricher \u001b[39m=\u001b[39m Factory\u001b[39m.\u001b[39;49mcreate(processor_config, mock_logger)\n\u001b[1;32m 6\u001b[0m geoip_enricher\n",
"File \u001b[0;32m~/external_work/Logprep/doc/source/development/notebooks/processor_examples/../../../../../logprep/factory.py:36\u001b[0m, in \u001b[0;36mFactory.create\u001b[0;34m(cls, configuration, logger)\u001b[0m\n\u001b[1;32m 34\u001b[0m metric_labels \u001b[39m=\u001b[39m configuration[connector_name]\u001b[39m.\u001b[39mpop(\u001b[39m\"\u001b[39m\u001b[39mmetric_labels\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 35\u001b[0m connector \u001b[39m=\u001b[39m Configuration\u001b[39m.\u001b[39mget_class(connector_name, connector_configuration_dict)\n\u001b[0;32m---> 36\u001b[0m connector_configuration \u001b[39m=\u001b[39m Configuration\u001b[39m.\u001b[39;49mcreate(\n\u001b[1;32m 37\u001b[0m connector_name, connector_configuration_dict\n\u001b[1;32m 38\u001b[0m )\n\u001b[1;32m 39\u001b[0m connector_configuration\u001b[39m.\u001b[39mmetric_labels \u001b[39m=\u001b[39m copy\u001b[39m.\u001b[39mdeepcopy(metric_labels)\n\u001b[1;32m 40\u001b[0m \u001b[39mreturn\u001b[39;00m connector(connector_name, connector_configuration, logger)\n",
"File \u001b[0;32m~/external_work/Logprep/doc/source/development/notebooks/processor_examples/../../../../../logprep/configuration.py:34\u001b[0m, in \u001b[0;36mConfiguration.create\u001b[0;34m(cls, name, config_)\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[39m\"\"\"factory method to create component configuration\u001b[39;00m\n\u001b[1;32m 20\u001b[0m \n\u001b[1;32m 21\u001b[0m \u001b[39mParameters\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 31\u001b[0m \u001b[39m the pipeline component configuration\u001b[39;00m\n\u001b[1;32m 32\u001b[0m \u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m 33\u001b[0m class_ \u001b[39m=\u001b[39m \u001b[39mcls\u001b[39m\u001b[39m.\u001b[39mget_class(name, config_)\n\u001b[0;32m---> 34\u001b[0m \u001b[39mreturn\u001b[39;00m class_\u001b[39m.\u001b[39;49mConfig(\u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mconfig_)\n",
"File \u001b[0;32m<attrs generated init logprep.processor.geoip_enricher.processor.GeoipEnricher.Config>:13\u001b[0m, in \u001b[0;36m__init__\u001b[0;34m(self, type, specific_rules, generic_rules, tree_config, db_path)\u001b[0m\n\u001b[1;32m 11\u001b[0m __attr_validator_generic_rules(\u001b[39mself\u001b[39m, __attr_generic_rules, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mgeneric_rules)\n\u001b[1;32m 12\u001b[0m __attr_validator_tree_config(\u001b[39mself\u001b[39m, __attr_tree_config, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mtree_config)\n\u001b[0;32m---> 13\u001b[0m __attr_validator_db_path(\u001b[39mself\u001b[39;49m, __attr_db_path, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mdb_path)\n",
"File \u001b[0;32m<attrs generated init logprep.processor.geoip_enricher.processor.GeoipEnricher.Config>:13\u001b[0m, in \u001b[0;36m__init__\u001b[0;34m(self, type, rules, tree_config, db_path)\u001b[0m\n\u001b[1;32m 11\u001b[0m __attr_validator_generic_rules(\u001b[39mself\u001b[39m, __attr_generic_rules, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mgeneric_rules)\n\u001b[1;32m 12\u001b[0m __attr_validator_tree_config(\u001b[39mself\u001b[39m, __attr_tree_config, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mtree_config)\n\u001b[0;32m---> 13\u001b[0m __attr_validator_db_path(\u001b[39mself\u001b[39;49m, __attr_db_path, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mdb_path)\n",
"File \u001b[0;32m~/external_work/Logprep/doc/source/development/notebooks/processor_examples/../../../../../logprep/util/validators.py:53\u001b[0m, in \u001b[0;36murl_validator\u001b[0;34m(_, attribute, value)\u001b[0m\n\u001b[1;32m 51\u001b[0m \u001b[39mraise\u001b[39;00m InvalidConfigurationError(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mattribute\u001b[39m.\u001b[39mname\u001b[39m}\u001b[39;00m\u001b[39m has no schema, net location and path\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 52\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m parsed_url\u001b[39m.\u001b[39mscheme \u001b[39mand\u001b[39;00m \u001b[39mnot\u001b[39;00m parsed_url\u001b[39m.\u001b[39mnetloc \u001b[39mand\u001b[39;00m parsed_url\u001b[39m.\u001b[39mpath:\n\u001b[0;32m---> 53\u001b[0m file_validator(_, attribute, value)\n\u001b[1;32m 54\u001b[0m \u001b[39mif\u001b[39;00m parsed_url\u001b[39m.\u001b[39mscheme \u001b[39m==\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mfile\u001b[39m\u001b[39m\"\u001b[39m:\n\u001b[1;32m 55\u001b[0m \u001b[39mif\u001b[39;00m parsed_url\u001b[39m.\u001b[39mparams \u001b[39mor\u001b[39;00m parsed_url\u001b[39m.\u001b[39mquery \u001b[39mor\u001b[39;00m parsed_url\u001b[39m.\u001b[39mfragment:\n",
"File \u001b[0;32m~/external_work/Logprep/doc/source/development/notebooks/processor_examples/../../../../../logprep/util/validators.py:23\u001b[0m, in \u001b[0;36mfile_validator\u001b[0;34m(_, attribute, value)\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[39mraise\u001b[39;00m InvalidConfigurationError(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mattribute\u001b[39m.\u001b[39mname\u001b[39m}\u001b[39;00m\u001b[39m is not a str\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 22\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m os\u001b[39m.\u001b[39mpath\u001b[39m.\u001b[39mexists(value):\n\u001b[0;32m---> 23\u001b[0m \u001b[39mraise\u001b[39;00m InvalidConfigurationError(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mattribute\u001b[39m.\u001b[39mname\u001b[39m}\u001b[39;00m\u001b[39m file \u001b[39m\u001b[39m'\u001b[39m\u001b[39m{\u001b[39;00mvalue\u001b[39m}\u001b[39;00m\u001b[39m'\u001b[39m\u001b[39m does not exist\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 24\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m os\u001b[39m.\u001b[39mpath\u001b[39m.\u001b[39misfile(value):\n\u001b[1;32m 25\u001b[0m \u001b[39mraise\u001b[39;00m InvalidConfigurationError(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mattribute\u001b[39m.\u001b[39mname\u001b[39m}\u001b[39;00m\u001b[39m \u001b[39m\u001b[39m'\u001b[39m\u001b[39m{\u001b[39;00mvalue\u001b[39m}\u001b[39;00m\u001b[39m'\u001b[39m\u001b[39m is not a file\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[0;31mInvalidConfigurationError\u001b[0m: db_path file 'tests/testdata/mock_external/MockGeoLite2-City.mmdb' does not exist"
Expand Down Expand Up @@ -267,4 +266,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -122,8 +122,7 @@
"processor_config = {\n",
" \"mygrokker\":{ \n",
" \"type\": \"grokker\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [str(rule_path), \"/dev\"],\n",
" }\n",
" }"
]
Expand Down Expand Up @@ -215,4 +214,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -195,8 +195,7 @@
"processor_config = {\n",
" \"the_ip_informer_name\":{ \n",
" \"type\": \"ip_informer\",\n",
" \"specific_rules\": [],\n",
" \"generic_rules\": [],\n",
" \"rules\": [],\n",
" }\n",
" }"
]
Expand Down Expand Up @@ -247,7 +246,7 @@
"metadata": {},
"outputs": [],
"source": [
"ip_informer._specific_tree.add_rule(rule)"
"ip_informer._rule_tree.add_rule(rule)"
]
},
{
Expand Down Expand Up @@ -403,4 +402,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -208,8 +208,7 @@
"processor_config = {\n",
" \"almighty_keychecker\": {\n",
" \"type\": \"key_checker\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [str(rule_path), \"/dev\"],\n",
" }\n",
"}\n"
]
Expand Down Expand Up @@ -336,4 +335,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -69,8 +69,7 @@
"processor_config = {\n",
" \"myconcatenator\":{ \n",
" \"type\": \"concatenator\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [str(rule_path), \"/dev\"],\n",
" }\n",
" }\n",
"\n",
Original file line number Diff line number Diff line change
Expand Up @@ -134,8 +134,7 @@
"processor_config = {\n",
" \"cmdbrequests\":{ \n",
" \"type\": \"requester\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [],\n",
" \"rules\": [str(rule_path)],\n",
" }\n",
" }"
]
Expand Down Expand Up @@ -243,4 +242,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -103,8 +103,7 @@
"processor_config = {\n",
" \"allmighty_string_splitter\": {\n",
" \"type\": \"string_splitter\",\n",
" \"specific_rules\": [\"/dev\"],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [\"/dev\"],\n",
" }\n",
"}\n"
]
Expand Down Expand Up @@ -167,9 +166,9 @@
],
"source": [
"for rule in rules:\n",
" processor._specific_tree.add_rule(rule)\n",
" processor._rule_tree.add_rule(rule)\n",
" \n",
"processor._specific_rules"
"processor.rules"
]
},
{
Expand Down Expand Up @@ -266,4 +265,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -119,8 +119,7 @@
"processor_config = {\n",
" \"my_timestampdiffer\":{ \n",
" \"type\": \"timestamp_differ\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [str(rule_path), \"/dev\"],\n",
" }\n",
" }"
]
Expand Down Expand Up @@ -192,4 +191,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -123,8 +123,7 @@
"processor_config = {\n",
" \"my_timestamper\":{ \n",
" \"type\": \"timestamper\",\n",
" \"specific_rules\": [str(rule_path)],\n",
" \"generic_rules\": [\"/dev\"],\n",
" \"rules\": [str(rule_path), \"/dev\"],\n",
" }\n",
" }"
]
Expand Down Expand Up @@ -196,4 +195,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
10 changes: 4 additions & 6 deletions doc/source/development/processor_how_to.rst
Original file line number Diff line number Diff line change
Expand Up @@ -49,10 +49,9 @@ This :py:class:`Config` class has to inherit from :py:class:`Processor.Config` a
- newprocessorname:
type: new_processor
specific_rules:
- tests/testdata/rules/specific/
generic_rules:
- tests/testdata/rules/generic/
rules:
- tests/testdata/rules_1/
- tests/testdata/rules_2/
new_config_parameter: config_value
"""
Expand Down Expand Up @@ -170,8 +169,7 @@ the general implementation of a new processor seen in :ref:`implementing_a_new_p
self.processor_attribute = []
self.metrics = self.NewProcessorMetrics(
labels=self.metric_labels,
generic_rule_tree=self._generic_tree.metrics,
specific_rule_tree=self._specific_tree.metrics,
rule_tree=self._rule_tree.metrics,
)
def _apply_rules(self, event, rule):
14 changes: 4 additions & 10 deletions doc/source/development/programaticly_start_logprep.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,8 @@ An example with input connector and preprocessors could look like this:
{
"predetector": {
"type": "pre_detector",
"specific_rules": [
"examples/exampledata/rules/pre_detector/specific"
],
"generic_rules": [
"examples/exampledata/rules/pre_detector/generic"
"rules": [
"examples/exampledata/rules/pre_detector/rules"
],
"pre_detector_topic": "output_topic"
}
Expand Down Expand Up @@ -60,11 +57,8 @@ An example without input connector and preprocessors could look like this:
{
"predetector": {
"type": "pre_detector",
"specific_rules": [
"examples/exampledata/rules/pre_detector/specific"
],
"generic_rules": [
"examples/exampledata/rules/pre_detector/generic"
"rules": [
"examples/exampledata/rules/pre_detector/rules"
],
"pre_detector_topic": "output_topic"
}