Do not obfuscate outputs by a single regex set #64

Glutexo · 2018-07-18T13:30:46Z

Currently, command outputs are processed by a sed script to obfuscate passwords. The replacement is done by a set of regular expressions, without any parsing or consideration of the output format. (Recently addressed by #63.)

This is definitely not good practice. The specs run many commands, and each command has its own output format, which does not have to be plain text. Passwords are not the only sensitive information worth obfuscating. Passwords, tokens and similar values can consist of different character sets, and they can have different boundaries depending on the output format. It is not possible to write a universal regular expression that matches and replaces such information in any output safely without also yielding many false positives. False positives can ultimately leave the data unprocessable.

I suggest making the obfuscation spec-specific. A spec would have to know the format of the file or command output and parse it at least to the extent that all sensitive data can be distinguished. Obfuscation can then take place with respect to the format or encoding of the payload.
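A minimal sketch of what spec-specific obfuscation could look like, using only the Python standard library. The spec names, the `SENSITIVE_KEYS` set, the mask string and the `obfuscate` registry are all hypothetical and are not part of the existing code base; the point is only that each spec is paired with a routine that understands its format.

```python
import csv
import io
import json

MASK = "********"
# Assumed key names; a real spec would declare its own.
SENSITIVE_KEYS = {"password", "token", "secret"}


def mask_json(text):
    """Parse JSON, mask values stored under sensitive keys, re-serialize."""
    def walk(node):
        if isinstance(node, dict):
            return {
                key: MASK if key.lower() in SENSITIVE_KEYS else walk(value)
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return json.dumps(walk(json.loads(text)), ensure_ascii=False)


def mask_csv(text, column, delimiter=";"):
    """Mask one column of delimited text; the csv module handles quoting."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) > column:
            row[column] = MASK
        writer.writerow(row)
    return out.getvalue()


# Hypothetical spec names mapped to the obfuscator that understands their format.
OBFUSCATORS = {
    "example_json_spec": mask_json,
    "example_csv_spec": lambda text: mask_csv(text, column=2),
}


def obfuscate(spec_name, text):
    """Refuse to pass through output whose format is unknown."""
    if spec_name not in OBFUSCATORS:
        raise LookupError(f"no obfuscator registered for spec {spec_name!r}")
    return OBFUSCATORS[spec_name](text)
```

Raising on an unregistered spec is a design choice of this sketch: output in an unknown format is rejected rather than uploaded unfiltered. Note that `mask_csv` always terminates each row with a newline, even if the input lacked one.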

Passwords can often contain Unicode characters, whitespace, etc., so finding their boundaries often requires escaping or format-specific rules. This is difficult in itself, and regular expressions that are affected by various locale specifics can very quickly lead to buggy, unmaintainable code.
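The boundary problem can be illustrated with a password that contains a double quote, a space and a non-ASCII character. The regular expression below is a hypothetical stand-in for a naive pattern, not the one used by the project.

```python
import json
import re

# A password containing a double quote, whitespace and a non-ASCII character.
payload = json.dumps({"password": 'pa"ss wörd'}, ensure_ascii=False)
# payload is: {"password": "pa\"ss wörd"}

# Hypothetical naive pattern: the value is "everything up to the next quote".
NAIVE = re.compile(r'("password":\s*")[^"]*(")')
masked = NAIVE.sub(r"\1********\2", payload)
# masked is: {"password": "********"ss wörd"}
# The tail of the password leaks and the document is no longer valid JSON.


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# The JSON parser, by contrast, finds the real boundary of the value.
decoded = json.loads(payload)["password"]
```

Here the pattern stops at the escaped quote inside the value, so part of the password survives and the result cannot be parsed again, while `json.loads` recovers the full value.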

@kylape pointed out that soscleaner can be used for this. Its actual job is to clean sensitive data from sosreports. Another option would be to write a solution from scratch.

I suppose that this has to be done before uploading the data to the API. That means some parsing logic would be duplicated on the client, which could ultimately lead to the idea of sending parsed, structured data instead of the actual payloads. If such an extreme is not desirable, parsing on the client side can stay simple, going no further than a few regular expressions and format conversions.

Some examples:

If a password is in URI query format, plain-text matching can easily consume the whole rest of the query: username=user&password=pass&remember=1 becomes username=user&password=********. Moreover, * is a reserved character in [RFC 3986], so when it is used as data in the mask it should be percent-encoded as %2A.
If a password is in JSON, XML or CSV format, it is not obfuscated at all. {"password":"pass"}, <auth password="pass" /> or 1;user;pass;something all remain untouched. All three formats require their own escaping of special characters.
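The URI query case can be reproduced with a short sketch. The greedy pattern is a hypothetical example of plain-text matching, and `SENSITIVE_PARAMS` is an assumed list of parameter names; the format-aware variant relies on `urllib.parse` from the standard library.

```python
import re
from urllib.parse import parse_qsl, urlencode

MASK = "********"
SENSITIVE_PARAMS = {"password"}  # assumed parameter names

query = "username=user&password=pass&remember=1"

# Plain-text matching: the greedy pattern swallows the rest of the query.
naive = re.sub(r"password=.*", "password=" + MASK, query)
# naive is: username=user&password=********


def mask_query(text):
    """Split the query into pairs, mask sensitive values, re-encode."""
    pairs = parse_qsl(text, keep_blank_values=True)
    masked = [(k, MASK if k in SENSITIVE_PARAMS else v) for k, v in pairs]
    return urlencode(masked)


safe = mask_query(query)
# safe is: username=user&password=%2A%2A%2A%2A%2A%2A%2A%2A&remember=1
```

The parsed variant keeps the `remember` parameter and percent-encodes the mask. One caveat: `urlencode` re-encodes every value, so the encoding of untouched parameters may be normalized (for example, `%20` becomes `+`).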

Generally speaking, sensitive data can appear anywhere and in any format. A naïve approach to filtering would inevitably lead to leaks, false positives, data loss and corruption.
