Create a box which includes
- OpenJDK 1.7
- Apache Hadoop 1.2.1
- Apache Hive 0.11.0
- Mysql 5.5
- Apache Sqoop 1.4.4
Required:
- Installed latest Vagrant
- Installed latest Virtual box
Clone this project to local machine and run
vagrant up
This would up the ubuntu server box and install some packages. Once running is done, you should see the "Good luck!" message, so everything shoud be ready. NOTE: If you running on Windows, you might meet the error "/bin/sh^M: bad interpreter: No such file or directory". for fixing:
sudo perl -i -pe 'y|\r||d' install.sh
and run "vagrang up" again
-
SSH to VM
vagrant ssh
-
Create hdfs folder
mkdir /home/vagrant/hdfs chmod -R 777 /home/vagrant/hdfs/
-
Format Hadoop HDFS
hadoop namenode -format -force
-
For starting hadoop without asking the password
ssh-keygen -t dsa -P '' -f /home/vagrant/.ssh/id_dsa cat /home/vagrant/.ssh/id_dsa.pub >> /home/vagrant/.ssh/authorized_keys
-
Starting Hadoop
start-all.sh
-
Starting Hive
hive
-
For testing, create a sample table
hive>CREATE TABLE pokes (foo INT, bar STRING);
-
Use Sqoop
sqoop help
-
Use MySql
mysql -uroot -proot
Access DFS on browser http://dev-hadoop:50070/dfshealth.jsp
Access job tracker on browser http://dev-hadoop:50030/jobtracker.jsp
-
Move to /vagrant
cd /vagrant
-
Create mysql database
mysql -uroot -proot create database test_db
-
Import sample database to mysql
mysql test_db -uroot -proot < hedgefund-data.sql
-
Create a Hive database called "test_db"
hive> create database test_db;
-
Import mysql table to Hive
sqoop import --verbose --fields-terminated-by ',' --connect jdbc:mysql://localhost/test_db --table filings --username root --password root --hive-import --warehouse-dir /user/hive/warehouse/test_db.db --fields-terminated-by ',' --split-by id --hive-database test_db --hive-table filings
- filings a sample table.
- dev-hadoop is forwarding to 192.168.33.15 automatic, this is private ip of Virtual machine, you can change in Vagrantfile
- When vagrant up again, you must start hadoop manualy too.
Feel free to give me some comment over email [email protected]. Thanks~