Merge pull request #13 from logmanager-oss/implement-custom-anonymization-mappings
implement custom anonymization mappings
Showing 12 changed files with 264 additions and 127 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,15 +26,17 @@ Usage of ./logveil: | |
-d value | ||
Path to directory with anonymizing data | ||
-i value | ||
Path to input file containing logs to be anonymized | ||
Path to input file containing logs to be anonymized (mandatory - if you don't specify input, code will fail) | ||
-o value | ||
Path to output file (default: Stdout) | ||
-c value | ||
Path to input file with custom anonymization mapping | ||
-v | ||
Enable verbose logging | ||
-e | ||
Change input file type to LM export (default: LM Backup) | ||
-p | ||
Disable proof wrtier (default: Enabled) | ||
Disable proof writer (default: Enabled) | ||
-h | ||
Help for logveil | ||
``` | ||
|
@@ -61,72 +63,102 @@ Usage of ./logveil: | |
|
||
`./logveil -d example_anon_data/ -e -i lm_export.csv -p -v` | ||
|
||
### How it works | ||
6. Read log data from LM Export file (CSV), output anonymization result to standard output (STDOUT) and load custom mapping from custom_mapping.txt | ||
|
||
**This is only a simplified example and does not match 1:1 with how anonymization is actually implemented** | ||
`./logveil -d example_anon_data/ -e -i lm_export.csv -c custom_mapping.txt` | ||
|
||
Consider below log line. It is formatted in a common `key:value` format. | ||
|
||
## Anonymization functionality | ||
|
||
There are three ways LogVeil anonymizes data: | ||
|
||
### Custom anonymization mappings | ||
|
||
You can provide custom anonymization mappings for LogVeil to use. They will take precedence over any other anonymization functionality. | ||
|
||
Custom mappings can be enabled by using flag `-c <file_path>` and must have the following format: | ||
|
||
`<original_value>:<new_value>` | ||
|
||
Each custom mapping must be separated by new line. For example: | ||
|
||
`test_custom_replacement:test_custom_replacement123`\ | ||
`replace_this:with_that`\ | ||
`test123:test1234` | ||
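The mapping format described above could be parsed roughly as follows. This is a minimal sketch, not LogVeil's actual code: the function name `parseCustomMappings` is hypothetical, and splitting on the first `:` is an assumption, since the README does not say how values that themselves contain a colon are handled.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseCustomMappings is a hypothetical helper. It reads lines of the form
// <original_value>:<new_value> and returns them as a map.
// Assumption: the line is split on the FIRST colon only.
func parseCustomMappings(content string) (map[string]string, error) {
	mappings := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // skip blank lines between mappings
		}
		original, replacement, found := strings.Cut(line, ":")
		if !found || original == "" {
			return nil, fmt.Errorf("line %d: expected <original_value>:<new_value>, got %q", lineNo, line)
		}
		mappings[original] = replacement
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	return mappings, nil
}

func main() {
	content := "test_custom_replacement:test_custom_replacement123\nreplace_this:with_that\ntest123:test1234\n"
	mappings, err := parseCustomMappings(content)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(mappings["replace_this"])
}
```

Rejecting lines without a separator, rather than skipping them silently, makes a malformed mapping file fail early.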
|
||
### Anonymization data | ||
|
||
You can also provide sets of fake data to use when anonymizing. | ||
|
||
Consider below log line: | ||
|
||
``` | ||
{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"89.239.31.49", "username":"[email protected]", "organization":"TESTuser.test.com", "mac": "71:e5:41:18:cb:3e"} | ||
{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"89.239.31.49", "username":"[email protected]", "organization":"TESTuser.test.com", "mac": "71:e5:41:18:cb:3e", "replacement_test":"replace_this"} | ||
``` | ||
|
||
First, LogVeil will load anonymization data from supplied directory (`-d example_anon_data/`). Each file in that folder should be named according to the values it will be masking. For example, lets assume we have following directory structure: | ||
If you want to anonymize values in `organization` and `username` keys, you need to have two files of the same name in anonymization data folder and enable them by using `-d <path_to_fake_data_folder>` flag. | ||
|
||
1. `username.txt` | ||
2. `organization.txt` | ||
|
||
Next, LogVeil will go over each log line in supplied input and extract `key:value` pairs from it. When applied to above log line it would look like this: | ||
Both files should contain appropriate fake data for the values they will be masking. | ||
|
||
### Regexp scanning and dynamic fake data generation | ||
|
||
1. `"@timestamp": "2024-06-05T14:59:27.000+00:00"` | ||
2. `"src_ip":"89.239.31.49"` | ||
3. `"username":"[email protected]"` | ||
4. `"organization":"TESTuser.test.com"` | ||
5. `"mac": "71:e5:41:18:cb:3e"` | ||
LogVeil implements regular expressions to look for common patterns: IP (v4, v6), Emails, MAC and URL. Once such pattern is found it is replaced with fake data generated on the fly. | ||
|
||
Then, LogVeil will try to match extracted pairs to anonymization data it loaded in previous step. Two paris should be matched: | ||
## Output | ||
|
||
1. `"src_ip":"89.239.31.49"` with `src_ip.txt` | ||
2. `"username":"[email protected]"` with `username.txt` | ||
3. `"organization":"TESTuser.test.com"` with `organization.txt` | ||
Anonymized data will be written to provided file path in txt format. Alternatively, if you don't provide output file path it will be written to the console (stdout). | ||
|
||
And one pair should be matched by regular expression scanning: | ||
Additionally LogVeil will write anonymization proof to `proof.json`, to show which values were anonymized. Proof has a following format: | ||
|
||
1. `"mac": "71:e5:41:18:cb:3e"` | ||
``` | ||
{"original":"<original_value>", "new":"<new_value>"} | ||
``` | ||
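One way to produce a proof line in this shape is to marshal a small struct with `encoding/json`. The type `proofEntry` and the function `proofLine` are hypothetical names used only for this sketch; how LogVeil's proof writer is actually implemented is not shown in the README.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// proofEntry mirrors the documented proof format: one original value
// and the value that replaced it.
type proofEntry struct {
	Original string `json:"original"`
	New      string `json:"new"`
}

// proofLine encodes a single replacement as one JSON object.
func proofLine(original, replacement string) (string, error) {
	encoded, err := json.Marshal(proofEntry{Original: original, New: replacement})
	if err != nil {
		return "", err
	}
	return string(encoded), nil
}

func main() {
	line, err := proofLine("replace_this", "with_that")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(line) // {"original":"replace_this","new":"with_that"}
}
```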
|
||
Now LogVeil will grab values (randomly) from files which filenames matched with keys, generate new value for `mac` key and create a replacement map in format `"original_value":"new_value"`: | ||
## How it works | ||
|
||
1. `"89.239.31.49":"10.20.0.53"` | ||
1. `"[email protected]":"ladislav.dosek"` | ||
2. `"TESTuser.test.com":"Apple"` | ||
3. `"71:e5:41:18:cb:3e": "0f:da:68:92:7f:2b"` | ||
**This is only a simplified example and does not match 1:1 with how anonymization is actually implemented** | ||
|
||
Now each element from the above list will be iterated over and compared against log line. Whenever `original_value` is found it will be replaced with `new_value`. Outcome should look like this: | ||
Consider below log line. It is formatted in a common `key:value` format. | ||
|
||
``` | ||
{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"10.20.0.53", "username":"ladislav.dosek", "organization":"Apple", "mac": "0f:da:68:92:7f:2b"} | ||
{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"89.239.31.49", "username":"[email protected]", "organization":"TESTuser.test.com", "mac": "71:e5:41:18:cb:3e", "replacement_test":"replace_this"} | ||
``` | ||
|
||
``` | ||
{"original": "27.221.126.209", "new": "10.20.0.53"}, | ||
"{"original":"[email protected]","new":"ladislav.dosek"}" | ||
"{"original":"TESTuser.test.com","new":"Apple"}" | ||
{"original": "71:e5:41:18:cb:3e", "new": "0f:da:68:92:7f:2b"}, | ||
``` | ||
First, LogVeil will load anonymization data from supplied directory (`-d example_anon_data/`). Each file in that folder should be named according to the values it will be masking. For example, lets assume we have following directory structure: | ||
|
||
### Anonymization data | ||
1. `username.txt` | ||
2. `organization.txt` | ||
|
||
Second, if available, LogVeil will load the custom anonymization mapping from user supplied path. For example, assume we have following file `custom_mapping.txt` with below content: | ||
|
||
1. `test_custom_replacement:test_custom_replacement123` | ||
2. `replace_this:with_that` | ||
3. `test123:test1234` | ||
|
||
Each `key:value` pair which you want to anonymize data must have its equivalent in anonymization data folder. | ||
Now anonymization process can start. LogVeil will grab log lines from supplied input, one by one, and apply anonymization to it three steps: | ||
|
||
If anonymization data does not exist for any given `key:value` pair then LogVeil will attempt to use regular expressions to match and replace common values such as: IPv4, IPv6, MAC, Emails and URLs. | ||
1. Replace values based on custom anonymization mapping | ||
2. Replace values based on loaded anonymization data | ||
3. Replace values based on regular expression matching and fake data generation | ||
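The three steps above can be sketched as a single function. This is a simplified illustration under several assumptions, not LogVeil's implementation: `anonymizeLine` is a hypothetical name, the first entry of each data set is used instead of a random pick so the example stays deterministic, and step 3 covers only MAC addresses with a fixed placeholder instead of a generated value.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Simplified MAC pattern, used here to stand in for all regex-based scanning.
var macPattern = regexp.MustCompile(`\b(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}\b`)

// anonymizeLine applies the three steps in order to a raw log line.
// fields holds the key:value pairs extracted from that line.
func anonymizeLine(raw string, fields, custom map[string]string, anonData map[string][]string) string {
	// Step 1: custom mappings are applied first, so they take precedence.
	// Map iteration order is random in Go; overlapping mappings would need
	// an explicit order, which this sketch does not define.
	for original, replacement := range custom {
		raw = strings.ReplaceAll(raw, original, replacement)
	}
	// Step 2: values whose key has a matching anonymization data set.
	for key, value := range fields {
		candidates, ok := anonData[key]
		if !ok || len(candidates) == 0 || value == "" {
			continue
		}
		raw = strings.ReplaceAll(raw, value, candidates[0])
	}
	// Step 3: regular expression matching for whatever is left.
	return macPattern.ReplaceAllString(raw, "0f:da:68:92:7f:2b")
}

func main() {
	raw := `{"username":"miloslav.illes", "mac": "71:e5:41:18:cb:3e", "replacement_test":"replace_this"}`
	fields := map[string]string{
		"username":         "miloslav.illes",
		"mac":              "71:e5:41:18:cb:3e",
		"replacement_test": "replace_this",
	}
	custom := map[string]string{"replace_this": "with_that"}
	anonData := map[string][]string{"username": {"ladislav.dosek"}}
	fmt.Println(anonymizeLine(raw, fields, custom, anonData))
}
```

Running the steps in this order means a value already rewritten by a custom mapping is no longer present when the later steps look for it.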
|
||
For example, if you want to anonymize values in `organization` and `username` keys, you need to have two files of the same name in anonymization folder containing some random data. | ||
Final output should look like this: | ||
|
||
### Output | ||
``` | ||
{"@timestamp": "2024-06-05T14:59:27.000+00:00", "src_ip":"10.20.0.53", "username":"ladislav.dosek", "organization":"Apple", "mac": "0f:da:68:92:7f:2b", "replacement_test":"with_that"} | ||
``` | ||
|
||
Anonymized data will be outputted to provided file path in txt format. | ||
And anonymization proof: | ||
|
||
Alternatively, if you don't provide file path, output will be written to the console. | ||
``` | ||
{"original":"replace_this", "new":"with_that"} | ||
{"original": "27.221.126.209", "new": "10.20.0.53"}, | ||
{"original":"[email protected]","new":"ladislav.dosek"}, | ||
{"original":"TESTuser.test.com","new":"Apple"}, | ||
{"original": "71:e5:41:18:cb:3e", "new": "0f:da:68:92:7f:2b"}, | ||
``` | ||
|
||
## Release | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,32 +12,35 @@ import ( | |
|
||
func TestAnonimizer_AnonymizeData(t *testing.T) { | ||
tests := []struct { | ||
name string | ||
anonymizingDataDir string | ||
input map[string]string | ||
expectedOutput string | ||
name string | ||
anonymizationDataDir string | ||
customAnonymizationMappingPath string | ||
input map[string]string | ||
expectedOutput string | ||
}{ | ||
{ | ||
name: "Test AnonymizeData", | ||
anonymizingDataDir: "../../tests/data/anonymization_data", | ||
name: "Test AnonymizeData", | ||
anonymizationDataDir: "../../tests/data/anonymization_data", | ||
customAnonymizationMappingPath: "../../tests/data/custom_mappings.txt", | ||
input: map[string]string{ | ||
"@timestamp": "2024-06-05T14:59:27.000+00:00", | ||
"src_ip": "10.10.10.1", | ||
"src_ipv6": "7f1d:64ed:536a:1fd7:fe8e:cc29:9df4:7911", | ||
"mac": "71:e5:41:18:cb:3e", | ||
"email": "test@test.com", | ||
"email": "atest@atest.com", | ||
"url": "https://www.testurl.com", | ||
"username": "miloslav.illes", | ||
"organization": "Microsoft", | ||
"raw": "2024-06-05T14:59:27.000+00:00, 10.10.10.1, 7f1d:64ed:536a:1fd7:fe8e:cc29:9df4:7911, miloslav.illes, Microsoft, 71:e5:41:18:cb:3e, [email protected], https://www.testurl.com", | ||
"custom:": "replacement_test", | ||
"raw": "2024-06-05T14:59:27.000+00:00, 10.10.10.1, 7f1d:64ed:536a:1fd7:fe8e:cc29:9df4:7911, miloslav.illes, Microsoft, 71:e5:41:18:cb:3e, [email protected], https://www.testurl.com, replace_this", | ||
}, | ||
expectedOutput: "2024-06-05T14:59:27.000+00:00, 10.20.0.53, 8186:39ac:48a4:c6af:a2f1:581a:8b95:25e2, ladislav.dosek, Apple, 0f:da:68:92:7f:2b, [email protected], http://soqovkq.com/NfkcUjG.php", | ||
expectedOutput: "2024-06-05T14:59:27.000+00:00, 10.20.0.53, 8186:39ac:48a4:c6af:a2f1:581a:8b95:25e2, ladislav.dosek, Apple, 0f:da:68:92:7f:2b, [email protected], http://soqovkq.com/NfkcUjG.php, with_that", | ||
}, | ||
} | ||
|
||
for _, tt := range tests { | ||
t.Run(tt.name, func(t *testing.T) { | ||
anonymizer, err := CreateAnonymizer(&config.Config{AnonymizationDataPath: tt.anonymizingDataDir}, &proof.ProofWriter{IsEnabled: false}) | ||
anonymizer, err := CreateAnonymizer(&config.Config{AnonymizationDataPath: tt.anonymizationDataDir, CustomAnonymizationMappingPath: tt.customAnonymizationMappingPath}, &proof.ProofWriter{IsEnabled: false}) | ||
if err != nil { | ||
t.Fatal(err) | ||
} | ||
|