This script helps you create a Cloudera parcel that includes the GCS connector. The parcel can be deployed on a Cloudera-managed cluster.
- Download the create_parcel.sh script file.
- [Optional] If you want the script to deploy the parcel file directly under Cloudera Manager's parcel repo, run the script on the Cloudera Manager server.
- [Optional] For on-premises clusters or clusters on other cloud providers, you need a service account JSON key from your GCP project. You can follow the steps in this document.
Once you have the create_parcel.sh file, place it in your home directory and run the script as shown below.
Here is an example of how to run the script:
$ chmod u+x create_parcel.sh
$ ./create_parcel.sh -f parcel_name -v version -o operating_system -d true
Where:
-f: name of the parcel, as a single string without spaces or special characters.
-v: version of the parcel in the format x.x.x (for example, 1.0.0).
-o: name of the operating system distribution, chosen from this list (rhel5, rhel6, rhel7, suse11,
ubuntu10, ubuntu12, ubuntu14, debian6, debian7).
-d: deploy the parcel to the Cloudera Manager parcel repo folder. This flag is optional;
if it is not provided, the parcel file is created in the same
directory where the script is run.
For example, we can name the parcel “gcsconnector”, set the version to “1.0.0”, choose the rhel6 OS type, and set deployment to Cloudera Manager to "true" by running the command below.
$ ./create_parcel.sh -f gcsconnector -v 1.0.0 -o rhel6 -d true
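If the -d flag was used, you can confirm that the parcel landed in Cloudera Manager's local parcel repository. The path below assumes the default repository location; adjust it if your installation uses a custom path.
$ ls /opt/cloudera/parcel-repo/ | grep -i gcsconnector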
After the script completes successfully, make sure that Cloudera Manager can find the new parcel, especially if you host the parcel file yourself.
To check the parcel, go to the Cloudera Manager home page and click Hosts > Parcels > Check for New Parcels. Once the new parcel appears in the list of parcels, click Distribute and then Activate. This distributes and activates the parcel on all Cloudera-managed hosts.
Once the parcel is activated successfully, restart any stale services.
For script logs, check /home/$USER/parcel-logs/. For Cloudera parcel logs, check /var/log/cloudera-scm-server/.
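If the script fails, these log locations are the first place to look. A rough sketch (the exact script log file names may differ on your system):
$ ls -lt /home/$USER/parcel-logs/
$ tail -n 100 /var/log/cloudera-scm-server/cloudera-scm-server.log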
From the Cloudera Manager console, go to HDFS service > Configuration > core-site.xml.
Add the following properties to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml:
google.cloud.auth.service.account.enable : true
[Optional] google.cloud.auth.service.account.json.keyfile : Full path to JSON key file downloaded for service account
Example :
/opt/cloudera/parcels/gcsconnector/lib/hadoop/lib/key.json
fs.gs.project.id : GCP_project_ID
fs.gs.impl : com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
Save the configuration and restart the required services.
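To verify that the properties were applied, you can query the deployed client configuration from a cluster host. This assumes the client configuration has been redeployed after saving; the returned project ID should match the value you set above.
$ hdfs getconf -confKey fs.gs.project.id
$ hdfs getconf -confKey google.cloud.auth.service.account.enable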
Export JAVA_HOME and add the GCS connector jar to the Hadoop classpath:
$ export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64/
$ export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/GCSCONNECTOR/lib/hadoop/lib/gcs-connector-hadoop2-latest.jar
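If the jar cannot be found, confirm that the connector exists at the expected location. The path below comes from the export above; adjust the parcel directory name (including its case) to match the parcel you created.
$ ls -l /opt/cloudera/parcels/GCSCONNECTOR/lib/hadoop/lib/ | grep gcs-connector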
Run the 'hdfs dfs -ls' command to access the GCS bucket:
$ hdfs dfs -ls gs://bucket_name
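Beyond listing, you can also try a round-trip write and read, assuming the service account has write access to the bucket; the object path used here is only an example.
$ hdfs dfs -put /etc/hosts gs://bucket_name/gcs-connector-test/
$ hdfs dfs -cat gs://bucket_name/gcs-connector-test/hosts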
From the Cloudera Manager console, go to Spark > Configuration > Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh.
Add the configuration below, adjusting it to match the GCS connector jar path.
SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH:/opt/cloudera/parcels/GCSCONNECTOR/lib/hadoop/lib/gcs-connector-hadoop2-latest.jar
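After saving the change and redeploying the Spark client configuration, you can check that the variable is present in the generated spark-env.sh; the path below assumes the default CDH client configuration directory.
$ grep SPARK_DIST_CLASSPATH /etc/spark/conf/spark-env.sh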
Open the Spark shell:
$ spark-shell
Read a file stored on Cloud Storage by providing the gs:// path of a JSON file stored in a GCS bucket.
val src=sqlContext.read.json("gs://bucket-name/some_sample.json")
From the Cloudera Manager console, go to Hive Service > Configuration > Hive Auxiliary JARs Directory. Set the value to /opt/cloudera/parcels/gcsconnector/lib/hadoop/lib/ (the absolute path of the directory containing the GCS connector jar).
Validate that the JAR is picked up by opening Beeline and connecting to HiveServer2: