
METRON-2261 Isolate Curator Dependencies #1515

Conversation

@nickwallen (Contributor) commented Sep 19, 2019

As part of the HDP 3.1 upgrade, we need to upgrade to Curator 4.x. There was a discuss thread covering the need for this.

Currently, Curator is pulled in as a transitive dependency of Hadoop, HBase, and Storm. At the current versions of those dependencies this does not cause a problem, but when Hadoop is upgraded, conflicting versions of Curator will be pulled in. We need to ensure that only a single version of Curator is resolved after upgrading Hadoop.

This change prepares us for follow-on PRs, including:

  1. Upgrade to Curator 4.x (METRON-2262: Upgrade to Curator 4.2.0, #1516)
  2. Upgrade to Hadoop 3.1.1.

Changes

  • Maintains the current versions in use, including Curator 2.7.1 and Zookeeper 3.4.6 (see note 1).

  • Defines Zookeeper as a direct dependency instead of relying on it being pulled in as a transitive of Curator (see note 2).

  • Defines Curator as a direct dependency instead of relying on it being pulled in as a transitive (see the sketch after this list).

  • Introduces a separate ${global_curator_test_version} to allow the Curator test dependencies to differ from the main Curator version ${global_curator_version}. This will be needed when upgrading to Curator 4.x (see note 3).
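
A rough sketch of what the direct declarations described above might look like in the pom (for illustration only; `global_curator_version` and `global_curator_test_version` are the properties named in this PR, while `global_zookeeper_version` is a hypothetical property name used here):

    <!-- version properties kept in the parent pom -->
    <global_curator_version>2.7.1</global_curator_version>
    <global_curator_test_version>2.7.1</global_curator_test_version>
    <!-- hypothetical property name, for illustration -->
    <global_zookeeper_version>3.4.6</global_zookeeper_version>

    <!-- declared directly so transitive versions from Hadoop/HBase/Storm no longer win -->
    <dependency>
      <groupId>org.apache.curator</groupId>
      <artifactId>curator-client</artifactId>
      <version>${global_curator_version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
      <version>${global_zookeeper_version}</version>
    </dependency>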

Notes

  1. In upgrading to Curator 4.x, there is a breaking change that causes the MaasIntegrationTest to fail. The fix for that plus the actual upgrade to Curator 4.x will be performed under a separate Jira to aid the review process.

  2. When upgrading to Curator 4.x, we will need to continue to run against Zookeeper 3.4.6. By default Curator 4.x will pull in Zookeeper 3.5.x.

  3. To run with Curator 4.x and Zookeeper 3.4.6, we need to use the 2.12.0 version of curator-test, based on this information provided by the Curator community (see the sketch after these notes).
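
A minimal sketch of how the test dependency can then be pinned independently (for illustration; per note 3, `${global_curator_test_version}` would be set to 2.12.0 when moving to Curator 4.x):

    <dependency>
      <groupId>org.apache.curator</groupId>
      <artifactId>curator-test</artifactId>
      <!-- pinned separately so it can stay at 2.12.0 with Curator 4.x and Zookeeper 3.4.6 -->
      <version>${global_curator_test_version}</version>
      <scope>test</scope>
    </dependency>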

Acceptance Testing

This PR should be tested using the centos7 development environment.

  1. Start up the centos7 dev environment.
    cd metron-deployment/development/centos7
    vagrant destroy -f
    vagrant up
    

Basics

Ensure that we can continue to parse, enrich, and index telemetry. Verify data is flowing through the system, from parsing to indexing.

  1. Open Ambari and navigate to the Metron service: http://node1:8080/#/main/services/METRON/summary

  2. Open the Alerts UI and click the search icon to verify alerts show up in the main UI (you may need to wait a moment for them to appear).

  3. Go to the Alerts UI and ensure that ever-increasing telemetry from Bro, Snort, and YAF is visible by watching the total alert count increase over time.

  4. Ensure that geoip enrichment is occurring. The telemetry should contain fields like enrichments:geo:ip_src_addr:location_point.

  5. Head back to Ambari and select the Kibana service: http://node1:8080/#/main/services/KIBANA/summary

  6. Open the Kibana dashboard via the "Metron UI" option in the quick links.

  7. Verify the dashboard is populating.

Streaming Enrichments

  1. Create a Streaming Enrichment by following these instructions.

  2. Launch the Stellar REPL.

    source /etc/default/metron
    cd $METRON_HOME
    $METRON_HOME/bin/stellar -z $ZOOKEEPER
    
  3. Define the streaming enrichment and save it as a new source of telemetry.

    [Stellar]>>> conf := SHELL_EDIT(conf)
    {
      "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
      "writerClassName": "org.apache.metron.writer.hbase.SimpleHbaseEnrichmentWriter",
      "sensorTopic": "user",
      "parserConfig": {
        "shew.table": "enrichment",
        "shew.cf": "t",
        "shew.keyColumns": "ip",
        "shew.enrichmentType": "user",
        "columns": {
          "user": 0,
          "ip": 1
        }
      }
    }
    [Stellar]>>>
    [Stellar]>>> CONFIG_PUT("PARSER", conf, "user")
    
  4. Go to the Management UI and start the new parser called 'user'.

  5. Create some test telemetry.

    [Stellar]>>> msgs := ["user1,192.168.1.1", "user2,192.168.1.2", "user3,192.168.1.3"]
    [user1,192.168.1.1, user2,192.168.1.2, user3,192.168.1.3]
    [Stellar]>>> KAFKA_PUT("user", msgs)
    3
    [Stellar]>>> KAFKA_PUT("user", msgs)
    3
    [Stellar]>>> KAFKA_PUT("user", msgs)
    3
    
  6. Ensure that the enrichments are persisted in HBase.

    [Stellar]>>> ENRICHMENT_GET('user', '192.168.1.1', 'enrichment', 't')
    {original_string=user1,192.168.1.1, guid=a6caf3c1-2506-4eb7-b33e-7c05b77cd72c, user=user1, timestamp=1551813589399, source.type=user}
    
    [Stellar]>>> ENRICHMENT_GET('user', '192.168.1.2', 'enrichment', 't')
    {original_string=user2,192.168.1.2, guid=49e4b8fa-c797-44f0-b041-cfb47983d54a, user=user2, timestamp=1551813589399, source.type=user}
    
    [Stellar]>>> ENRICHMENT_GET('user', '192.168.1.3', 'enrichment', 't')
    {original_string=user3,192.168.1.3, guid=324149fd-6c4c-42a3-b579-e218c032ea7f, user=user3, timestamp=1551813589402, source.type=user}
    

Enrichment Coprocessor

  1. Confirm that the 'user' enrichment added in the previous section was 'found' by the coprocessor.

    • Go to Swagger.
    • Click the sensor-enrichment-config-controller option.
    • Click the GET /api/v1/sensor/enrichment/config/list/available/enrichments option.
  2. Click the "Try it out!" button. You should see a array returned with the value of each enrichment type that you have loaded.
    [ "user" ]

Enrichment Stellar Functions in Storm

  1. Follow instructions similar to these to load the user data.

  2. Create a simple file called user.csv.

    jdoe,192.168.138.2

  3. Create a file called user-extractor.json.

    {
      "config": {
        "columns": {
          "user": 0,
          "ip": 1
        },
        "indicator_column": "ip",
        "separator": ",",
        "type": "user"
      },
      "extractor": "CSV"
    }
    
  4. Import the data.

    source /etc/default/metron
    $METRON_HOME/bin/flatfile_loader.sh -i ./user.csv -t enrichment -c t -e ./user-extractor.json
    
  5. Validate that the enrichment loaded successfully.

    [root@node1 0.7.2]# source /etc/default/metron
    [root@node1 0.7.2]# $METRON_HOME/bin/stellar -z $ZOOKEEPER
    
    [Stellar]>>> ip_dst_addr := "192.168.138.2"
    192.168.138.2
    
    [Stellar]>>> ENRICHMENT_GET('user', ip_dst_addr, 'enrichment', 't')
    {ip=192.168.138.2, user=jdoe}
    
  6. Use the User data to enrich the telemetry. Run the following commands in the REPL.

    [Stellar]>>> bro := SHELL_EDIT()
    {
     "enrichment" : {
       "fieldMap": {
         "stellar" : {
           "config" : {
             "users" : "ENRICHMENT_GET('user', ip_dst_addr, 'enrichment', 't')"
           }
         }
       }
     },
     "threatIntel": {
       "fieldMap": {},
       "fieldToTypeMap": {}
     }
    }
    [Stellar]>>> CONFIG_PUT("ENRICHMENT", bro, "bro")
    
  7. Wait for the new configuration to be picked up by the running topology.

  8. Review the Bro telemetry indexed into Elasticsearch. Look for records where the ip_dst_addr is 192.168.138.2. Ensure that some of the messages have the following fields created from the enrichment.

    • users:user
    • users:ip
    {
      "_index": "bro_index_2019.08.13.20",
      "_type": "bro_doc",
      "_id": "AWyMxSJFg1bv3MpSt284",
      ...
      "_source": {          
        "ip_dst_addr": "192.168.138.2",
        "ip_src_addr": "192.168.138.158",
        "timestamp": 1565729823979,
        "source:type": "bro",
        "guid": "6778beb4-569d-478f-b1c9-8faaf475ac2f"
        ...
        "users:user": "jdoe",
        "users:ip": "192.168.138.2",
        ...
      },
      ...
    }
    

Legacy HBase Adapter

We are going to perform the same enrichment, but this time using the legacy HBase Adapter.

  1. Use the User data to enrich the telemetry. Run the following commands in the REPL.

    [Stellar]>>> yaf := SHELL_EDIT()
    {
      "enrichment" : {
        "fieldMap" : {
          "hbaseEnrichment" : [ "ip_dst_addr" ]
        },
        "fieldToTypeMap" : {
           "ip_dst_addr" : [ "user" ]
        },
        "config" : {
          "typeToColumnFamily" : {
            "user" : "t"
          }
        }
      },
      "threatIntel" : { },
      "configuration" : { }
    }
    [Stellar]>>> CONFIG_PUT("ENRICHMENT", yaf, "yaf")
    
  2. Wait for the new configuration to be picked up by the running topology.

  3. Review the YAF telemetry indexed into Elasticsearch. Look for records where the ip_dst_addr is 192.168.138.2. Ensure that some of the messages have the following fields created from the enrichment.

    • enrichments:hbaseEnrichment:ip_dst_addr:user:ip
    • enrichments:hbaseEnrichment:ip_dst_addr:user:user
    {
      "_index": "yaf_index_2019.08.15.03",
      "_type": "yaf_doc",
      "_id": "AWyTZAwEIFY9jxc2THLF",
      "_version": 1,
      "_score": null,
      "_source": {
        "source:type": "yaf",
        "ip_dst_addr": "192.168.138.2",
        "ip_src_addr": "192.168.138.158",
        "guid": "6c73c09d-f099-4646-b653-762adce121fe",
        ...
        "enrichments:hbaseEnrichment:ip_dst_addr:user:ip": "192.168.138.2",
        "enrichments:hbaseEnrichment:ip_dst_addr:user:user": "jdoe",
      }
    }
    

Profiler

Profiler in the REPL

  1. Test a profile in the REPL according to these instructions.

    [Stellar]>>> values := PROFILER_FLUSH(profiler)
    [{period={duration=900000, period=1723089, start=1550780100000, end=1550781000000}, profile=hello-world, groups=[], value=4, entity=192.168.138.158}]
    

Streaming Profiler

  1. Deploy that profile to the Streaming Profiler in Storm.

    [Stellar]>>> CONFIG_PUT("PROFILER", conf)
    
  2. Wait for the Streaming Profiler in Storm to flush and retrieve the measurement from HBase.

    For the impatient, you can reset the period duration to 1 minute. Alternatively, you can allow the Profiler topology to work for a minute or two and then kill the profiler topology which will force it to flush a profile measurement to HBase.

    Retrieve the measurement from HBase. Prior to this PR, it was not possible to query HBase from the REPL.

    [Stellar]>>> PROFILE_GET("hello-world","192.168.138.158",PROFILE_FIXED(30,"DAYS"))
    [2979]
    

Pull Request Checklist

For all changes:

  • Is there a JIRA ticket associated with this PR? If not, one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and/or integration tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, run the following commands and verify the changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    
  • Have you ensured that any documentation diagrams have been updated, along with their source files, using draw.io? See Metron Development Guidelines for instructions.

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci be set up for your personal repository so that your branches are built there before submitting a pull request.

<dependency>
<groupId>org.apache.curator</groupId>
<artifactId>curator-client</artifactId>
<version>2.10.0</version>
</dependency>
@nickwallen (author) commented:

For some reason, we were pulling in curator-client 2.10.0 instead of 2.7.1. I don't see this causing a problem, but I want to call it out for reviewers.

@mmiklavc (Contributor) commented:

My initial reaction to keeping our existing versions was that we'd just have a couple of pom dep additions here and there, but it looks like quite a lot of excludes across a lot of modules. Is that all related to us having previously leveraged transitive deps for Curator and Zookeeper?

@nickwallen (author) commented Sep 19, 2019

Is that all related to us having previously leveraged transitive deps for curator and zookeeper?

Yes. For example, anywhere we pull in hadoop-common (which is a lot of places) we have to exclude all the curator-* dependencies that hadoop-common wants to pull in, so we can ensure we pull in the Curator version that we expect.
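
As a sketch, that exclusion pattern looks roughly like the following wherever hadoop-common is declared (the `${global_hadoop_version}` property name is assumed here; the exact list of excluded artifacts varies by module):

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${global_hadoop_version}</version> <!-- assumed property name -->
      <exclusions>
        <!-- exclude the transitive Curator artifacts so the direct versions win -->
        <exclusion>
          <groupId>org.apache.curator</groupId>
          <artifactId>curator-client</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.curator</groupId>
          <artifactId>curator-framework</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.curator</groupId>
          <artifactId>curator-recipes</artifactId>
        </exclusion>
      </exclusions>
    </dependency>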

@@ -59,6 +59,23 @@
<artifactId>hadoop-yarn-server-common</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
<exclusions>
@mmiklavc (Contributor) commented on this diff:

One thing we might consider with these bigger dep exclusions (and this is just an idea, entirely up to you) is using wildcards.

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.1</version>
                <exclusions>
                    <exclusion>
                        <groupId>org.apache.curator</groupId>
                        <artifactId>*</artifactId>
                    </exclusion>
                </exclusions>
        </dependency>

I found this results in all curator deps from hadoop-common being excluded, whereas previously I had the following:

grep curator /tmp/deptree.txt
[INFO]    |  \- org.apache.curator:curator-framework:jar:2.7.1:compile
[INFO]    +- org.apache.curator:curator-client:jar:2.7.1:compile
[INFO]    +- org.apache.curator:curator-recipes:jar:2.7.1:compile

I'm not sure if this is too big a net to cast or not. The dependency plugin doesn't appear to do globs, only full wildcard matching at the artifactId or groupId level; e.g., "curator*" will not work in the artifactId, unfortunately. Just an idea.

This is the Maven feature - https://issues.apache.org/jira/browse/MNG-2315

@nickwallen (author) commented Sep 19, 2019

That's a nice feature. I'd prefer to use it.

Unfortunately, I tried it like your example and it doesn't seem to work as I'd expect. When I use a wildcard, I still get old 2.12.0 dependencies showing up.

@mmiklavc (Contributor) commented:

I spun up full dev and telemetry is flowing through as expected. I'm seeing geo enrichments working as well.

@mmiklavc (Contributor) commented:

Looked over the recent run of commits. Based on my initial tests, this looks pretty good. When you're confident in the manual testing, I'm +1 on this.

@nickwallen (author) commented Sep 19, 2019

Thanks. I'm just going to double-check the integration tests and run up the dev environment after the latest commits (before merging).

@nickwallen (author) commented:

All smoke tests passed. I updated the PR description with the tests I went through.

@nickwallen (author) commented:

This has been merged into the feature branch.
