feat: Implement basic linting #282

wxsBSD · 2025-01-09T14:53:27Z

Apologies as this PR is a little bigger than I wanted, partially due to fixing a bug in config file handling and partially because it implements two linter checks (required metadata and rule name regexp).

Config file handling improvements

Prior to this change if you had an invalid config file in your home dir we would just silently replace it with the default config, which is unexpected from a user standpoint (invalid configs should produce an error and refuse to run). To deal with this we now pass errors back up in the main command instead of using .unwrap_or_default().

Also, rather than have each sub-command implement config file handling I moved it into main, to remove duplicate code. Also, because we are now propagating errors if you have an invalid config in ${HOME}/.yara-x.toml I improved error messages so it is clear what file failed to parse:

Error: Unable to parse /Users/wxs/.yara-x.toml: invalid type: found signed int `0`, expected a boolean for key "default.fmt.patterns.align_values" in TOML source string

Linting checks

To implement linting checks for required metadata and rule name format I chose to implement them in the compiler directly. I initially had an implementation with a stand-alone Linter type that worked on the AST from the parser. While this worked it turned out that getting it to use the existing Report/ReportBuilder mechanism was exceedingly intrusive as reporting is coupled too closely to the compiler (or at least I couldn't figure out how to make it work easily). Because of this I just went with implementing it directly in the compiler, which I think is also a good idea because we now get warnings anywhere the compiler is used, which can be quite beneficial for enforcing standards.

To use these linting checks I added a section in the config file and exposed the options for it there. See the config guide changes for details, but the TLDR is you can now specify a table (TOML term for a dictionary) where the key is the metadata identifier that is required and the value is the type that is required. Any rule that does not meet these requirements will get warnings from the compiler.

The other check is a regex that must match the rule name. Again, this is specified in the config file and warnings are issued by the compiler for any rule that does not match this regex.

The default value for both of these options is off so by default the compiler does not enforce any particular standards but you can opt in to it for linting if you want. The only place that currently sets these options is the yr check command.

If you're happy with these changes I'd like to also expose them in the various bindings so users can get lint warnings in the language of their choice.

This implements a basic linter into the compiler that can check for required metadata fields and types. I chose to implement it into the compiler so that we can get these checks for free across any tool/binding that uses the compiler directly. Also, it was easier to get nicer reporting by building it into the compiler. This is an optional feature that you can turn on in the compiler using the "required_metadata" method on the compiler. If you do this the compiler will issue warnings if any metadata is missing or if the type is incorrect. I've provided all the basic types (string, int, bool, float) and also some convenience types like md5, sha1, sha256 and hash to make it a bit more flexible for users. Extending these with more convenience types is an easy exercise too.

Prior to this change if you had an invalid config file (ie: using the wrong type for an option) we would just silently replace it with the default config, which made debugging way more difficult than it should have been. We now pass errors back up to the caller if the error is a because the config file can not be parsed. Errors now look like this: ``` Error: invalid type: found signed int `0`, expected a boolean for key "default.fmt.patterns.align_values" in .yara-x.toml TOML file ``` Also, because we now display errors like the above it doesn't make sense to have each command have a -C option. If the config file in ${HOME}/.yara-x.toml is improperly formatted we would try to parse it and display an error even before we called the subcommand. This meant if you had an invalid config file in the default location and used "yr fmt -C /tmp/foo ..." you would see the error from your default config file despite the -C argument. To deal with this the -C argument was removed from each command and moved into the main one. It now loads the default config if, and only if, you do not specify one with the -C argument.

Also a bunch of other smaller changes that got lumped in here.

Move file io out of config helper function and generate better error messages when the file can't be parsed.

docs/YARA-X Config Guide.md

rule_name_regexp now returns a Result. This makes error handling in the case of an invalid (or too large) regex easier.

Now that I have a working python environment I can finally fix up these tests.

…doesn't exist In that case the program was failing instead of using the default configuration. Now this case works correctly, while at the same time it raises an error if `--config` is specified with a non-existing path.

Instead of adding multiple new methods to the `Compiler` object for each type of check, add a single `add_linter` method that takes a type that implements the `Linter` trait. Each check is implemented as a new type that implements this type and put under `yara_x::linters`. With this change the compiler's API is not bloated with multiple methods and we can introduce more linters in the future with more complicated APIs.

plusvic · 2025-01-29T15:07:08Z

I wasn't convinced about the proposed API for a number of reasons:

A new method was added to the Compiler for each type of check. If we add some other checks in the future this will bloat the compiler's API with multiple methods that are not really related to the compiler's main feature which is compiling.
As each check is implemented directly in the compiler, the compiler's code is also more complicated, not only its API.
Passing a BTreeMap type as part of the API exposes an implementation detail.
The yara_x crate didn't look like a good place to put the MetaValueType type.

I've refactored the API to make it more flexible, adding a single method to the compiler (Compiler::add_linter). This method receives any type that implements the Linter trait. New types linters::RuleNameMatches and linters::RequiredMetadata implement this trait. This allows adding many more checks in the future without bloating the Compiler object. It also allows creating more complex linters with ergonomic APIs.

wxsBSD added 4 commits January 7, 2025 15:43

Implement rule name regex.

5616581

Also a bunch of other smaller changes that got lumped in here.

Config handling fixes.

e3a50e1

Move file io out of config helper function and generate better error messages when the file can't be parsed.

wxsBSD changed the title ~~Implement basic linting~~ feat: Implement basic linting Jan 9, 2025

xorhex reviewed Jan 9, 2025

View reviewed changes

docs/YARA-X Config Guide.md Outdated Show resolved Hide resolved

wxsBSD and others added 19 commits January 9, 2025 11:27

Fix typo in docs. Noticed by @xorhex.

ef04f71

API changes and add linting to python API.

92dd0c7

rule_name_regexp now returns a Result. This makes error handling in the case of an invalid (or too large) regex easier.

Merge branch 'main' into check_lint

e38db27

Merge branch 'main' into check_lint

5e9be53

Merge branch 'main' into check_lint

129282e

Fix incorrect merge.

a32ee86

Merge branch 'main' into check_lint

9ffb2aa

Fix broken pytests.

612558b

Now that I have a working python environment I can finally fix up these tests.

Add typing information.

844a15f

Merge branch 'main' into check_lint

1bab92c

refactor: simplify the logic for reading the config file.

5bb398e

refactor: rename fields in FormatConfig for better naming consistency.

d65b14f

fix: take into account that the home directory can be an empty string.

50c6edc

fix: remove println

db35a3c

fix: show config file errors with the same style than every other error.

2a9184b

fix: clippy warnings

8306115

fix: wrong condition introduced in previous commit.

0bb2482

tests: add test case.

c909252

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Implement basic linting #282

feat: Implement basic linting #282

wxsBSD commented Jan 9, 2025

plusvic commented Jan 29, 2025

feat: Implement basic linting #282

Are you sure you want to change the base?

feat: Implement basic linting #282

Conversation

wxsBSD commented Jan 9, 2025

Config file handling improvements

Linting checks

plusvic commented Jan 29, 2025