layout | title | permalink |
---|---|---|
doc | Cloudera Integration | /docs/cloudera-integration.html |
Since Eagle 0.4.0
Configuring Apache Eagle on Cloudera is very similar to configuring it on Hortonworks, but there are some differences. This tutorial addresses those differences before you continue with the tutorials originally prepared for Hortonworks.
To get Apache Eagle working on Cloudera, we need:
- Zookeeper (installed through Cloudera Manager)
- Kafka (installed through Cloudera Manager)
- Storm (0.9.x or 0.10.x, installed manually)
- Logstash (2.X, installed manually on NameNode)
Two configuration changes need to be mentioned:
- Open Cloudera Manager and open the "kafka" configuration, then set “zookeeper Root” to “/”.
- If Kafka cannot be started successfully, check Kafka’s log. If the stack trace shows “java.lang.OutOfMemoryError: Java heap space”, increase the heap size by setting "KAFKA_HEAP_OPTS" in /bin/kafka-server-start.sh.
Example:
export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"
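Before raising the heap size, it can help to confirm that the log really contains the out-of-memory error. The following is only a sketch: the helper name `kafka_log_has_heap_oom` is made up for this example, and the log path in the usage comment is an assumption, since the location depends on how Cloudera Manager set up Kafka on your host.

```shell
# Hypothetical helper: succeeds (exit code 0) if the given Kafka log file
# contains the heap-space OutOfMemoryError mentioned above.
kafka_log_has_heap_oom() {
  log_file="$1"
  # -F treats the pattern as a fixed string, -q suppresses output.
  grep -qF "java.lang.OutOfMemoryError: Java heap space" "$log_file"
}

# Example usage (the log path is an assumption, adjust it to your installation):
# if kafka_log_has_heap_oom /var/log/kafka/server.log; then
#   echo "Heap space error found, consider raising KAFKA_HEAP_OPTS"
# fi
```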
- Step1: create a Kafka topic. Here I created a topic called “test”, which will be used in the logstash configuration file to receive hdfsAudit log messages from Cloudera.
bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 1 --topic test
- Step2: check that the topic has been created successfully.
bin/kafka-topics.sh --list --zookeeper 127.0.0.1:2181
This command lists all created topics.
- Step3: open two terminals, start “producer” and “consumer” separately.
/usr/bin/kafka-console-producer --broker-list hostname:9092 --topic test
/usr/bin/kafka-console-consumer --zookeeper hostname:2181 --topic test
- Step4: type some messages in the producer. If the consumer receives the messages sent from the producer, Kafka is working fine. Otherwise, check the configuration and logs to identify the root cause.
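If you prefer not to keep two terminals open, the same check can be sketched as a single function. This is a rough sketch under a few assumptions: the function name `kafka_round_trip` and the `PRODUCER`, `CONSUMER`, `BROKER`, `ZOOKEEPER` and `TOPIC` variables are made up for this example, and the comparison only makes sense on a fresh topic, because `--from-beginning --max-messages 1` returns the oldest message in the topic.

```shell
# Assumed defaults taken from the commands above; override them as needed.
PRODUCER="${PRODUCER:-/usr/bin/kafka-console-producer}"
CONSUMER="${CONSUMER:-/usr/bin/kafka-console-consumer}"
BROKER="${BROKER:-hostname:9092}"
ZOOKEEPER="${ZOOKEEPER:-hostname:2181}"
TOPIC="${TOPIC:-test}"

# Send one message, read one message back from the start of the topic,
# and succeed only if the message read back is the one that was sent.
kafka_round_trip() {
  message="$1"
  printf '%s\n' "$message" \
    | "$PRODUCER" --broker-list "$BROKER" --topic "$TOPIC" || return 1
  received=$("$CONSUMER" --zookeeper "$ZOOKEEPER" --topic "$TOPIC" \
    --from-beginning --max-messages 1 2>/dev/null)
  # Match the whole line exactly, as a fixed string.
  printf '%s\n' "$received" | grep -qxF -- "$message"
}

# Example usage:
# kafka_round_trip "hello eagle" && echo "Kafka is working"
```

If the function fails on a topic that already holds older messages, fall back to the two-terminal check described in the steps.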
You can follow the logstash online documentation to download and install logstash on your machine.
Or, if you are using CentOS, you can install it through yum:
- Download and install the public signing key:
rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
- Add the following lines to a file with a .repo suffix in the /etc/yum.repos.d/ directory, for example logstash.repo.
[logstash-2.3]
name=Logstash repository for 2.3.x packages
baseurl=https://packages.elastic.co/logstash/2.3/centos
gpgcheck=1
gpgkey=https://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
- Then install it using yum:
yum install logstash
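The three steps above can be sketched as a small script. The function name `write_logstash_repo` and its directory argument are made up for this example; the repository content is the same as listed above.

```shell
# Write the repository definition shown above into <dir>/logstash.repo.
# The directory is passed as an argument so the function can be tried out
# on a scratch directory before touching /etc/yum.repos.d.
write_logstash_repo() {
  repo_dir="$1"
  cat > "$repo_dir/logstash.repo" <<'EOF'
[logstash-2.3]
name=Logstash repository for 2.3.x packages
baseurl=https://packages.elastic.co/logstash/2.3/centos
gpgcheck=1
gpgkey=https://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
EOF
}

# Example usage (run as root):
# rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# write_logstash_repo /etc/yum.repos.d
# yum install -y logstash
```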
Follow the Apache Eagle online documentation to create the logstash configuration file for Apache Eagle, then start logstash with it:
bin/logstash -f conf/first-pipeline.conf
Open a terminal and start a Kafka consumer to see if it receives the messages sent by logstash. If there are no messages, double-check the configuration parameters in the conf file. Otherwise, logstash is all set.
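When no messages arrive, a syntax check of the conf file is a cheap first step. The sketch below assumes Logstash 2.x, where the syntax-check flag is `--configtest` as far as I know (later major versions use a different flag name, so confirm with `bin/logstash --help`). The function name `check_logstash_config` and the `LOGSTASH` variable are made up for this example.

```shell
# Path to the logstash launcher; the default assumes you run this from
# the logstash installation directory.
LOGSTASH="${LOGSTASH:-bin/logstash}"

# Ask logstash to parse the given config file and exit without starting
# the pipeline. The exit code tells whether the syntax is valid.
check_logstash_config() {
  conf_file="$1"
  "$LOGSTASH" -f "$conf_file" --configtest
}

# Example usage:
# check_logstash_config conf/first-pipeline.conf && echo "config syntax OK"
```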
As Apache Storm is not in Cloudera’s stack, we need to install Storm manually.
Download Apache Storm; the version you choose should be a 0.10.x or 0.9.x release.
Then follow the Apache Storm online documentation to install Apache Storm on your cluster.
In /etc/profile, add this:
export PATH=$PATH:/opt/apache-storm-0.10.1/bin/
Save the profile and then type:
source /etc/profile
to make the change take effect.
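Sourcing the profile several times appends the same directory to PATH again and again. A guarded variant could look like the sketch below; the function name `add_to_path` is made up for this example, and the Storm directory in the usage comment is the one used above.

```shell
# Append a directory to PATH only if it is not already listed there.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already on PATH, nothing to do
    *) PATH="$PATH:$1" ;;     # otherwise append it
  esac
  export PATH
}

# Example usage:
# add_to_path /opt/apache-storm-0.10.1/bin
# command -v storm    # prints the path of the storm launcher if it is found
```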
In storm/conf/storm.yaml, change the hostname to your own host.
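For orientation, the settings in storm.yaml that usually carry a hostname in Storm 0.9.x and 0.10.x are `storm.zookeeper.servers` and `nimbus.host`. The sketch below only prints a minimal single-host example to the terminal so you can compare it with your own file; the function name `print_storm_yaml` is made up, and using the same host for ZooKeeper and Nimbus is an assumption that may not match your cluster.

```shell
# Print a minimal storm.yaml fragment for a single-host setup.
# Nothing is written to disk; compare the output with storm/conf/storm.yaml.
print_storm_yaml() {
  host="$1"
  printf 'storm.zookeeper.servers:\n'
  printf '  - "%s"\n' "$host"
  printf 'nimbus.host: "%s"\n' "$host"
}

# Example usage:
# print_storm_yaml "$(hostname -f)"
```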
In a terminal, type:
$: storm nimbus
$: storm supervisor
$: storm ui
Open the Storm UI in your browser. The default URL is http://hostname:8080/index.html.
To download and install Apache Eagle, please refer to Get Started with Sandbox.
One thing needs to be mentioned: in “/bin/eagle-topology.sh”, line 102:
storm_ui=http://localhost:8080
If you are not using the default port number, change this to your own Storm UI URL.
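The change can also be scripted instead of edited by hand. The sketch below rewrites every line that assigns `storm_ui=` in the given file. The function name `set_storm_ui` is made up for this example, and it assumes the new URL contains neither `|` nor `&`, which would confuse the sed expression.

```shell
# Replace the value of the storm_ui= assignment in a script.
# Leading whitespace on the line is preserved, and the file is rewritten
# in place through a temporary copy so its permissions stay unchanged.
set_storm_ui() {
  script="$1"
  url="$2"
  tmp=$(mktemp) || return 1
  sed "s|^\([[:space:]]*\)storm_ui=.*|\1storm_ui=$url|" "$script" > "$tmp" \
    && cat "$tmp" > "$script"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example usage (the script path depends on where Eagle is installed,
# and the host and port are placeholders):
# set_storm_ui bin/eagle-topology.sh "http://hostname:8081"
```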
I know it takes time to finish this configuration, but now it is time to have fun!
Just try HDFS Data Activity Monitoring with the Demo listed in HDFS Data Activity Monitoring.