This project sets up a simple pseudo-distributed Hadoop, Hive, and Spark computing environment.
- Tarballs (put under the tar folder)
  - hadoop-2.7.1.tar
  - apache-hive-1.2.1-bin.tar.gz
  - spark-1.5.1-bin-hadoop2.6.tar
- RPM (put under the rpm folder)
  - jdk-7u79-linux-x64.rpm
- Install the OS on VirtualBox using the CentOS 6.7 minimal ISO.
- Make sure the VM can connect to the internet.
- Download all required tarballs and the RPM before starting the installation.
- Put the project folder under /opt.
- $ 1_config_install.sh start
- $ 2_hadoop_install.sh start
- $ 3_hive_install.sh start
- $ 4_spark_install.sh start
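
The prerequisites and install order above can be sketched as a small wrapper. This is only an illustration and not part of the project: the function names `check_prereqs` and `run_installers` are hypothetical, and the sketch assumes the four install scripts sit in the project folder next to the tar and rpm folders and can be run with `sh`.

```shell
#!/bin/sh
# Hypothetical wrapper, not shipped with the project.
# Assumption: the install scripts live in the project folder alongside
# the tar/ and rpm/ folders and are sh-compatible.

# Files the README lists as required downloads.
REQUIRED_FILES="
tar/hadoop-2.7.1.tar
tar/apache-hive-1.2.1-bin.tar.gz
tar/spark-1.5.1-bin-hadoop2.6.tar
rpm/jdk-7u79-linux-x64.rpm
"

# Print each required file missing under the given project directory.
# Returns 1 if anything is missing, 0 otherwise.
check_prereqs() {
    dir=$1
    missing=0
    for f in $REQUIRED_FILES; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Run the install scripts in the documented order from inside the
# project directory, stopping at the first one that fails.
run_installers() {
    dir=$1
    for script in 1_config_install.sh 2_hadoop_install.sh \
                  3_hive_install.sh 4_spark_install.sh; do
        echo "running: $script start"
        ( cd "$dir" && sh "./$script" start ) || {
            echo "failed: $script"
            return 1
        }
    done
}

# Usage (uncomment to run from the project folder, e.g. under /opt):
# check_prereqs . && run_installers .
```

Checking the downloads first avoids a half-finished install when a tarball is missing, and stopping at the first failing script keeps later steps from running against an incomplete setup.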
After completing these steps, you will have a pseudo-distributed Hadoop environment with Hive (remote mode) and Spark.