Skip to content

Latest commit

 

History

History
435 lines (325 loc) · 18.6 KB

README.md

File metadata and controls

435 lines (325 loc) · 18.6 KB

Developer Quickstart on HDP Sandbox using Ambari VNC service

An Ambari service package for VNC Server with the ability to install developer tools like Eclipse/IntelliJ/Maven as well to 'remote desktop' to the sandbox and quickly start developing on HDP Hadoop. Also includes the option to install the Spark 1.2.0 Tech Preview

Author: Ali Bajwa

Contents

Setup VNC service
  • Download HDP 2.3 sandbox VM image (Sandbox_HDP_2.3_VMWare.ova) from Hortonworks website
  • Import Sandbox_HDP_2.3_VMWare.ova into VMWare and set the VM memory size to 8GB
  • Now start the VM
  • After it boots up, find the IP address of the VM and add an entry into your machines hosts file e.g.
192.168.191.241 sandbox.hortonworks.com sandbox    
  • Connect to the VM via SSH (password hadoop)
  • To deploy the VNC service, run below
VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/hortonworks-gallery/ambari-vnc-service.git   /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/VNCSERVER   
  • Restart Ambari
#on sandbox
sudo service ambari restart

#on non-sandbox
sudo service ambari-server restart
  • Then you can click on 'Add Service' from the 'Actions' dropdown menu in the bottom left of the Ambari dashboard:

On bottom left -> Actions -> Add service -> check VNC Server -> Next -> Next -> Enter password -> Next -> Deploy Image

  • Note that currently you cant change these configurations after installing the service (this is WIP)

  • To change the geometry you can edit this file /etc/sysconfig/vncservers

  • You can also remove the service using the steps below and re-install with correct settings

  • On successful deployment you will see the VNC service as part of Ambari service and will be able to start/stop the service from here: Image

  • When you've completed the install process, VNC server will be available at your VM's IP on display 1 with the password you setup.

  • One benefit to wrapping the component in Ambari service is that you can now monitor/manage this service remotely via REST API

export SERVICE=VNC
export PASSWORD=admin
export AMBARI_HOST=sandbox.hortonworks.com
export CLUSTER=Sandbox

#get service status
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE

#start service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE

#stop service
curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop $SERVICE via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/services/$SERVICE

Connect to VNC server
Connect via VNC client
  • Option 1: install Chicken of the VNC client on your Mac and use it to connect. On windows you can also install Tight VNC or UltraVNC clients to do the same. Image

  • Note that:

    • For VirtualBox users, you will need to forward port 5901 to avoid connection refused errors.
    • You may need to stop your firewall as well:
    service iptables save
    service iptables stop
    chkconfig iptables off
    
    • On logging in you will see the CentOS desktop running on the sandbox Image
Connect via browser
  • Option 2: You can also configure using your browser as a VNC client via Java applet
    sudo vi "/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/lib/security/java.policy"
    #add permission line below the grant
    grant {
            permission java.security.AllPermission;
    
Connect via Ambari view

Getting started with Eclipse/IntelliJ

  • To start Eclipse, click the eclipse shortcut Image

  • To start IntelliJ, click the intellij shortcut Image

  • To remove the VNC service:

    • Stop the service via Ambari

    • Delete the service

      curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/VNC
      
    • Remove artifacts

      /var/lib/ambari-server/resources/stacks/HDP/2.2/services/vnc-stack/remove.sh
      

Getting started with Storm and Maven in Eclipse environment

mkdir /opt/TruckEvents  
cd /opt/TruckEvents  
wget https://www.dropbox.com/s/7gk1u3khrfaz3tu/Tutorials-master.zip  
unzip Tutorials-master.zip
  • Option 2: Download code for the Twitter IoT workshop topology if not done already
cd
git clone https://github.com/hortonworks-gallery/hdp22-twitter-demo.git 
/root/hdp22-twitter-demo/setup-scripts/restart_solr_banana.sh
  • Option 3: Download code for starter Twitter storm topology
cd 
git clone https://github.com/abajwa-hw/hdp22-hive-streaming.git 
cd /root/hdp22-hive-streaming
  • For option 3, you will need to complete the pre-requisites mentioned (i.e. install mvn, create Hive table etc) here.
#update your twitter keys in this file
vi src/test/HiveTopology.java

#install maven (if not already installed)
curl -o /etc/yum.repos.d/epel-apache-maven.repo https://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo
yum -y install apache-maven

#Create persons table in Mysql
mysql -u root -p
#empty password

create database people;
use people;
create table persons (people_id INT PRIMARY KEY, sex text, bdate DATE, firstname text, lastname text, addresslineone text, addresslinetwo text, city text, postalcode text, ssn text, id2 text, email text, id3 text);
LOAD DATA LOCAL INFILE '~/hdp22-hive-streaming/data/PII_data_small.csv' REPLACE INTO TABLE persons FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
exit;

#import persons table into Hive using Sqoop
sqoop import --verbose --connect 'jdbc:mysql://localhost/people' --table persons --username root --hcatalog-table persons --hcatalog-storage-stanza "stored as orc" -m 1 --create-hcatalog-table 

#create user_tweets table in Hive
hive -e 'create table if not exists user_tweets (twitterid string, userid int, displayname string, created string, language string, tweet string) clustered by (userid) into 7 buckets stored as orc tblproperties("orc.compress"="NONE","transactional"="true")'
sudo -u hdfs hadoop fs -chmod +w /apps/hive/warehouse/user_tweets
  • For all 3 options (option 1 , option 2, option 3): follow steps below for next steps...

  • Once you already have your storm code on the VM, just import the dir containing the pom.xml into Eclipse:

    • File
    • Import
    • Maven
    • Existing Maven Projects
    • Browse
    • navigate to your dir containing pom.xml:
      • For option 1: /opt/TruckEvents/Tutorials-master
      • For option 2: /root/hdp22-twitter-demo/stormtwitter-mvn
      • For option 3: /root/hdp22-hive-streaming
    • OK

This will start building the project and importing the maven jars which may run for a few minutes. You will see errors in the project because the correct java version was not picked up.

  • Check the java compiler is using 1.7.
    • select the project
    • File
    • Properties
    • Java Compiler
    • uncheck "use compliance from..."
    • set "Compiler compliance level" to 1.7
    • Yes
    • OK

Image

  • The eclipse project should build on its own and not show errors (if not, you may need to add jars to the project properties)

  • To run maven compile:

    • In Eclipse, click:

      • Run
      • Run Configurations
      • Maven Build
    • The first time you do this, it will ask you for the configuration:

      • Name: specify anything (e.g. streaming compile)
      • Base dir: base dir of source code:
        • option 1: /opt/TruckEvents/Tutorials-master
        • option 2: /root/hdp22-twitter-demo/stormtwitter-mvn
        • option 3: /root/hdp22-hive-streaming
      • Under ‘Goals’: clean install
      • Under Maven Runtime: (scroll down to see this option) add your existing mvn install on the sandbox (its faster than using the embedded one) Image
      • Configure > Add > click ‘Directory’ and navigate to the dir where it installed mvn (i.e. /usr/share/apache-maven)
      • So now your maven run configuration should look as below Image
      • Click Run to start compile
  • Eclipse should now be able to run a mvn compile and create the uber jar. In the future you can just select below to compile:

    • In Eclipse, click:
      • Run
      • Run History
      • streaming compile
  • Now to setup Eclipse to run the compiled topology lets create an external tools config:

    • In Eclipse, click

      • Run
      • External Tools
      • External Tools Configurations
      • Program
      • New
    • Then configure the external config based on which option you are using:

    • Option 1: For trucking demo tutorial

      • Name: Run storm locally
      • Location: /usr/bin/storm
      • Working Directory: /opt/TruckEvents/Tutorials-master
      • Arguments: target/Tutorial-1.0-SNAPSHOT.jar com.hortonworks.tutorials.tutorial3.TruckEventProcessingTopology
      • click Run
    • Option 2: For Twitter IoT workshop

      • Name: Run storm locally
      • Location: /usr/bin/storm
      • Working Directory: ${workspace_loc:/storm-streaming}
      • Arguments: jar target/storm-streaming-1.0-SNAPSHOT.jar hellostorm.GNstorm runLocally localhost
        • Note the above runs the topology locally. To run on the cluster instead: replace runLocally with runOnCluster
      • click Run Image
    • Option 3: starter Twitter topology

      • Name: Run starter Twitter topology
      • Location: /usr/bin/storm
      • Working Directory: /root/hdp22-hive-streaming
      • Arguments: storm jar ./target/storm-integration-test-1.0-SNAPSHOT.jar test.HiveTopology thrift://sandbox.hortonworks.com:9083 default user_tweets twitter_topology
      • click Run
  • This should run your topology. In the future you can just select below to run the topology:

    • In Eclipse, click:
      • Run
      • External Tools
      • Run storm topology locally
  • You can also run your topology from command line, for example:

    • For option 1: for trucking demo tutorial:
cd /opt/TruckEvents/Tutorials-master/
storm jar target/Tutorial-1.0-SNAPSHOT.jar com.hortonworks.tutorials.tutorial2.TruckEventProcessingTopology
storm jar target/Tutorial-1.0-SNAPSHOT.jar com.hortonworks.tutorials.tutorial3.TruckEventProcessingTopology
  • For option 2: For Twitter IoT workshop
cd /root/hdp22-twitter-demo/stormtwitter-mvn

#to run locally
storm jar ./target/storm-streaming-1.0-SNAPSHOT.jar hellostorm.GNstorm runLocally localhost

#to run on cluster instead
storm jar ./target/storm-streaming-1.0-SNAPSHOT.jar hellostorm.GNstorm runOnCluster localhost
  • For option 3: starter Twitter topology
cd /root/hdp22-hive-streaming

#sumbit topology
storm jar ./target/storm-integration-test-1.0-SNAPSHOT.jar test.HiveTopology thrift://sandbox.hortonworks.com:9083 default user_tweets twitter_topology

#check user_tweets hive table
hive -e 'select * from user_tweets'

#stop topology
storm kill twitter_topology
  • You have successfully imported a Storm maven project into Eclipse and setup the ability to compile/run from Eclipse

Getting started with Spark on HDP

Spark now comes installed on HDP sandbox. You can get started using the tutorials provided:


Getting started with Nifi on HDP

Image

Image


Getting started with Zeppelin on HDP

Image


Getting started with iPython Notebook on HDP

  • Install iPython notebook service using instructions here.

Image

  • Setup the airline demo in iPython using steps below:

  • Make few changes to sandbox VM before setting up airline demo. Instructions to do these tasks are available on the same airline demo page above.

    • Make sure the sandbox VM is started with large amount of memory (15 GB) and disk to 65GB in order to run.
    • Also change Ambari setting to run using Tez.
  • Download airline delay and weather data and copy into HDFS

export HOME_DIR=/home/ipython
export PROJECT_DIR=/tmp/HDP_DS_setup

sudo -u hdfs hadoop fs -mkdir /user/ipython
sudo -u hdfs hadoop fs -chown ipython:ipython /user/ipython
hadoop fs -mkdir /user/ipython/airline
hadoop fs -mkdir /user/ipython/airline/delay
hadoop fs -mkdir /user/ipython/airline/weather


mkdir $PROJECT_DIR
cd $PROJECT_DIR

wget http://stat-computing.org/dataexpo/2009/2007.csv.bz2
bzip2 -d 2007.csv.bz2
wget http://stat-computing.org/dataexpo/2009/2008.csv.bz2
bzip2 -d 2008.csv.bz2
hadoop fs -put *.csv /user/ipython/airline/delay
#delete copy of data from local FS to save space
rm $PROJECT_DIR/*.csv


wget ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/2007.csv.gz
gunzip -d 2007.csv.gz
wget ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/2008.csv.gz
gunzip -d 2008.csv.gz
hadoop fs -put *.csv /user/demo/airline/weather
#delete copy of data from local FS to save space
rm $PROJECT_DIR/*.csv

  • download the the python version of the airline demo notebook
cd /home/ipython/notebooks
wget https://github.com/abajwa-hw/hdp-datascience-demo/blob/master/demo-HDP2.2/airline_python.ipynb