METRON-2261 Isolate Curator Dependencies #1515
Conversation
<dependency>
  <groupId>org.apache.curator</groupId>
  <artifactId>curator-client</artifactId>
  <version>2.10.0</version>
For some reason, we were pulling in curator-client 2.10.0 instead of 2.7.1. I don't see this causing a problem, but I want to call it out for reviewers.
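A common way to force the expected version to win regardless of which transitive path pulls Curator in is a dependencyManagement entry in the parent pom. This is only an illustrative sketch, not what this PR necessarily does; the ${global_curator_version} property name is taken from this PR's description, and its placement here is an assumption:

```xml
<!-- Sketch only: pinning curator-client in the parent pom's
     dependencyManagement so every module that references it
     (directly or transitively) resolves the same version.
     The use of ${global_curator_version} here is illustrative. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.curator</groupId>
      <artifactId>curator-client</artifactId>
      <version>${global_curator_version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Note that dependencyManagement only pins the version when the artifact is resolved; it does not stop a transitive from appearing, which is why explicit excludes can still be needed.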
My initial reaction to keeping our existing versions would be that we'd just have a couple pom dep additions here and there, but it looks like quite a lot of excludes across a lot of modules. Is that all related to us having previously leveraged transitive deps for curator and zookeeper?
Yes. For example, anywhere we pull in hadoop-common (which is a lot of places) we have to exclude all the curator-* dependencies that hadoop-common wants to pull in, so we can ensure we get the Curator version that we expect.
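As a sketch of what that looks like per module (the exact artifact list is an assumption; the standard Curator modules Hadoop drags in are curator-client, curator-framework, and curator-recipes, and the excludes in each Metron pom may differ):

```xml
<!-- Illustrative sketch: excluding each Curator artifact that
     hadoop-common pulls in transitively, so the Curator version
     declared directly by the module wins. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.curator</groupId>
      <artifactId>curator-client</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.curator</groupId>
      <artifactId>curator-framework</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.curator</groupId>
      <artifactId>curator-recipes</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```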
@@ -59,6 +59,23 @@
<artifactId>hadoop-yarn-server-common</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
<exclusions>
One thing we might consider with these bigger dep exclusions (and this is just an idea, entirely up to you) is using wildcards.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.1</version>
<exclusions>
<exclusion>
<groupId>org.apache.curator</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
I found this results in all curator deps from hadoop-common being excluded whereas previously I had the following:
grep curator /tmp/deptree.txt
[INFO] | \- org.apache.curator:curator-framework:jar:2.7.1:compile
[INFO] +- org.apache.curator:curator-client:jar:2.7.1:compile
[INFO] +- org.apache.curator:curator-recipes:jar:2.7.1:compile
I'm not sure if this is too big a net to cast or not. The dependency plugin doesn't appear to do globs, only full wildcard matching at the artifactId or groupId level. e.g. "curator*" will not work in the artifactId, unfortunately. Just an idea.
This is the Maven feature - https://issues.apache.org/jira/browse/MNG-2315
That's a nice feature. I'd prefer to use it.
Unfortunately, I tried it like your example and it doesn't seem to work as I'd expect. When I use a wildcard, I still get the old 2.12.0 dependencies showing up.
Looked over the recent run of commits. Based on my initial tests, this looks pretty good. When you're confident in the manual testing, I'm +1 on this.
Thanks. I'm just going to double-check the integration tests and run up the dev environment after the latest commits (before merging).
All smoke tests passed. I updated the PR description with the tests I went through. |
This has been merged into the feature branch. |
As part of the HDP 3.1 upgrade, we need to upgrade to Curator 4.x. There was a discuss thread covering the need for this.
Currently, Curator is being pulled in as a transitive of Hadoop, HBase and Storm. At the current version levels of these dependencies this does not cause a problem. But when upgrading Hadoop, different conflicting versions of Curator will get pulled in. We need to ensure that only a single version of Curator is pulled in after upgrading Hadoop.
This change is preparing us for follow-on PRs including...
Changes
Maintains the current versions in use, including Curator 2.7.1 and Zookeeper 3.4.6 (see note 1).
Defines Zookeeper as a direct dependency instead of relying on it being pulled in as a transitive of Curator (see note 2).
Defines Curator as a direct dependency instead of relying on it being pulled in as a transitive.
Introduces a separate ${global_curator_test_version} property to allow the Curator test dependencies to differ from the main Curator version ${global_curator_version}. This will be needed when upgrading to Curator 4.x (see note 3).
Notes
In upgrading to Curator 4.x, there is a breaking change that causes the MaasIntegrationTest to fail. The fix for that plus the actual upgrade to Curator 4.x will be performed under a separate Jira to aid the review process.
When upgrading to Curator 4.x, we will need to continue to run against Zookeeper 3.4.6. By default Curator 4.x will pull in Zookeeper 3.5.x.
To run with Curator 4.x and Zookeeper 3.4.6, we need to use the 2.12.0 version of curator-test based on this information provided by the Curator community.
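Notes 2 and 3 can be sketched as pom fragments. The property names ${global_curator_version} and ${global_curator_test_version} come from this PR; the surrounding structure and the 4.x values shown in comments are assumptions about the follow-on upgrade, not part of this change:

```xml
<!-- Illustrative sketch of the two version properties described above.
     In this PR both stay at the 2.x line; after the follow-on upgrade,
     the main Curator version would move to 4.x while curator-test stays
     at 2.12.0 for Zookeeper 3.4.6 compatibility. -->
<properties>
  <global_curator_version>2.7.1</global_curator_version>
  <global_curator_test_version>2.7.1</global_curator_test_version>
</properties>

<!-- Main Curator dependency, resolved at ${global_curator_version} -->
<dependency>
  <groupId>org.apache.curator</groupId>
  <artifactId>curator-recipes</artifactId>
  <version>${global_curator_version}</version>
</dependency>

<!-- Test-only Curator dependency, allowed to diverge -->
<dependency>
  <groupId>org.apache.curator</groupId>
  <artifactId>curator-test</artifactId>
  <version>${global_curator_test_version}</version>
  <scope>test</scope>
</dependency>
```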
Acceptance Testing
This PR should be tested using the centos7 development environment.
Basics
Ensure that we can continue to parse, enrich, and index telemetry. Verify data is flowing through the system, from parsing to indexing.
Open Ambari and navigate to the Metron service http://node1:8080/#/main/services/METRON/summary
Open the Alerts UI. Verify alerts show up in the main UI - click the search icon (you may need to wait a moment for them to appear)
Go to the Alerts UI and ensure that an ever increasing number of telemetry from Bro, Snort, and YAF are visible by watching the total alert count increase over time.
Ensure that geoip enrichment is occurring. The telemetry should contain fields like enrichments:geo:ip_src_addr:location_point.
Head back to Ambari and select the Kibana service http://node1:8080/#/main/services/KIBANA/summary
Open the Kibana dashboard via the "Metron UI" option in the quick links
Verify the dashboard is populating
Streaming Enrichments
Create a Streaming Enrichment by following these instructions.
Launch the Stellar REPL.
Define the streaming enrichment and save it as a new source of telemetry.
Go to the Management UI and start the new parser called 'user'.
Create some test telemetry.
Ensure that the enrichments are persisted in HBase.
Enrichment Coprocessor
Confirm that the 'user' enrichment added in the previous section was 'found' by the coprocessor.
Click the sensor-enrichment-config-controller option.
Click the GET /api/v1/sensor/enrichment/config/list/available/enrichments option.
Click the "Try it out!" button. You should see an array returned with the value of each enrichment type that you have loaded.
[ "user" ]
Enrichment Stellar Functions in Storm
Follow instructions similar to these to load
the user data.
Create a simple file called user.csv.
jdoe,192.168.138.2
Create a file called user-extractor.json.
Import the data.
Validate that the enrichment loaded successfully.
Use the User data to enrich the telemetry. Run the following commands in the REPL.
Wait for the new configuration to be picked up by the running topology.
Review the Bro telemetry indexed into Elasticsearch. Look for records where the ip_dst_addr is 192.168.138.2. Ensure that some of the messages have the following fields created from the enrichment.
users:user
users:ip
Legacy HBase Adapter
We are going to perform the same enrichment, but instead using the legacy HBase Adapter.
Use the User data to enrich the telemetry. Run the following commands in the REPL.
Wait for the new configuration to be picked up by the running topology.
Review the YAF telemetry indexed into Elasticsearch. Look for records where the ip_dst_addr is 192.168.138.2. Ensure that some of the messages have the following fields created from the enrichment.
enrichments:hbaseEnrichment:ip_dst_addr:user:ip
enrichments:hbaseEnrichment:ip_dst_addr:user:user
Profiler
Profiler in the REPL
Test a profile in the REPL according to these instructions.
Streaming Profiler
Deploy that profile to the Streaming Profiler in Storm.
Wait for the Streaming Profiler in Storm to flush and retrieve the measurement from HBase.
For the impatient, you can reset the period duration to 1 minute. Alternatively, you can allow the Profiler topology to work for a minute or two and then kill the profiler topology, which will force it to flush a profile measurement to HBase.
Retrieve the measurement from HBase. Prior to this PR, it was not possible to query HBase from the REPL.
Pull Request Checklist
For all changes:
For code changes:
Have you included steps to reproduce the behavior or problem that is being changed or addressed?
Have you included steps or a guide to how the change may be verified and tested manually?
Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
Have you written or updated unit tests and or integration tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
For documentation related changes:
Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, then run the following commands and verify the changes via site-book/target/site/index.html:
Have you ensured that any documentation diagrams have been updated, along with their source files, using draw.io? See Metron Development Guidelines for instructions.
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.