
Build prerequisites

Alex Leblang edited this page Apr 13, 2015 · 19 revisions

These instructions cover installing the prerequisite packages and configuration needed to build Impala. Currently there are guides for building on Ubuntu 14.04 and CentOS 6.5.

Java

Install the Oracle Java 7 JDK.

On Ubuntu 14.04 this can be done with the following commands:

sudo add-apt-repository ppa:webupd8team/java -y
sudo apt-get update -y

# You will have to accept the Oracle license agreement during installation.
sudo apt-get install oracle-java7-installer -y

On CentOS 6.5, this page has a good guide.
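To confirm the JDK ended up where the build expects it, a small check can look for the java and javac binaries under the JDK directory. The helper below, check_jdk, is a hypothetical sketch: it defaults to $JAVA_HOME (set in the Environment variables section), and on Ubuntu 14.04 you can also pass /usr/lib/jvm/java-7-oracle directly.

```shell
# check_jdk (hypothetical helper): verify that a directory looks like a JDK
# by checking for executable java and javac binaries under bin/.
# Usage: check_jdk [jdk_dir]   (defaults to $JAVA_HOME)
check_jdk() {
    jdk_dir=${1:-$JAVA_HOME}
    if [ -z "$jdk_dir" ]; then
        echo "JAVA_HOME is not set and no directory was given" >&2
        return 1
    fi
    for tool in java javac; do
        if [ ! -x "$jdk_dir/bin/$tool" ]; then
            echo "missing $jdk_dir/bin/$tool" >&2
            return 1
        fi
    done
    echo "JDK found at $jdk_dir"
}
```

Checking for javac as well as java catches the case where only a JRE is installed, which is enough to run Java programs but not to build Impala's Java components.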


Required packages

On Ubuntu 14.04

sudo apt-get install git build-essential cmake bison flex pkg-config libsasl2-dev autoconf automake libtool maven subversion doxygen libbz2-dev zlib1g-dev  python-setuptools python-dev libssl-dev libboost-all-dev postgresql -y

On CentOS 6.5

sudo yum groupinstall "Development Tools"
sudo yum -y install git ant boost-test boost-program-options libevent-devel automake libtool flex bison gcc-c++ openssl-devel make cmake doxygen.x86_64 glib-devel boost-devel python-devel bzip2-devel subversion krb5-workstation openldap-devel db4-devel python-setuptools python-pip 'cyrus-sasl*' postgresql postgresql-server ant-nodeps lzo-devel lzop
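After installing the packages, it can be worth confirming that the main command-line tools actually landed on PATH. The hypothetical helper check_tools below is a minimal sketch that uses the POSIX command -v builtin to report any command it cannot find.

```shell
# check_tools (hypothetical helper): report which of the given commands
# are not found on PATH. Prints "all tools found" and returns 0 if every
# command is present; otherwise prints the missing names and returns 1.
check_tools() {
    missing=""
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all tools found"
}
```

For example, check_tools git cmake bison flex autoconf automake svn doxygen psql checks commands provided by the packages listed above on either distribution.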

Configuring PostgreSQL

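If you are setting up Impala on a fresh machine, you first need to initialize PostgreSQL (on Ubuntu 14.04 the postgresql package does this automatically). On CentOS 6.5, initialize the database cluster and start the server by running:

sudo service postgresql initdb
sudo service postgresql start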

Next, change the client authentication settings so that HBase and the Hive metastore can function correctly. Edit the following file as root:

On Ubuntu 14.04

/etc/postgresql/*/main/pg_hba.conf

On CentOS 6.5

/var/lib/pgsql/data/pg_hba.conf

In the following lines at the end of the file, change the METHOD value from peer or ident to trust:

# Database administrative login by UNIX sockets
local   all         all                          ident

# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD

# "local" is for Unix domain socket connections only
local   all         all                               ident
# IPv4 local connections:
host    all         all         127.0.0.1/32          md5
# IPv6 local connections:
host    all         all         ::1/128               md5
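The edit described above can also be scripted with sed. The hypothetical filter pg_hba_trust below reads pg_hba.conf on standard input, passes comment lines through unchanged, and rewrites peer or ident to trust only when it is the last field of a rule line. It assumes no authentication options follow the METHOD column, which holds for the default files shown above.

```shell
# pg_hba_trust (hypothetical helper): stdin -> stdout filter for pg_hba.conf.
# Comment lines (optionally indented) are skipped via the "b" command;
# on other lines a trailing "peer" or "ident" METHOD becomes "trust".
# Rules using md5 or any other method are left untouched.
pg_hba_trust() {
    sed -e '/^[[:space:]]*#/b' \
        -e 's/[[:space:]]peer[[:space:]]*$/ trust/' \
        -e 's/[[:space:]]ident[[:space:]]*$/ trust/'
}
```

Because pg_hba.conf may be readable only by root or postgres, read it through sudo, for example sudo cat /var/lib/pgsql/data/pg_hba.conf | pg_hba_trust > /tmp/pg_hba.conf, then review /tmp/pg_hba.conf before copying it into place with sudo cp.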
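
After saving the file, restart PostgreSQL so that it loads the new rules:

sudo service postgresql restart

Creating the Hive metastore user

Start psql as the postgres superuser:

sudo -u postgres psql postgres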

Then, at the postgres command prompt:

CREATE ROLE hiveuser LOGIN PASSWORD 'password';
ALTER ROLE hiveuser WITH CREATEDB;
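The two statements above can also be run without an interactive prompt, because psql reads SQL from standard input. The hypothetical helper hive_role_sql below prints both statements for a given role name and password, doubling any single quote in the password so the SQL string literal stays valid. It assumes the role name is a plain lowercase identifier that needs no quoting.

```shell
# hive_role_sql (hypothetical helper): print the CREATE ROLE / ALTER ROLE
# statements for a metastore user.
#   $1 = role name (assumed to be a simple lowercase identifier, not quoted)
#   $2 = password (single quotes are doubled, as SQL string literals require)
hive_role_sql() {
    role=$1
    pw=$(printf '%s' "$2" | sed "s/'/''/g")
    printf 'CREATE ROLE %s LOGIN PASSWORD '\''%s'\'';\n' "$role" "$pw"
    printf 'ALTER ROLE %s WITH CREATEDB;\n' "$role"
}
```

For example: hive_role_sql hiveuser password | sudo -u postgres psql postgres. Substitute a real password for the placeholder.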

LLVM

wget http://llvm.org/releases/3.3/llvm-3.3.src.tar.gz
tar xvf llvm-3.3.src.tar.gz
cd llvm-3.3.src/tools/
svn co http://llvm.org/svn/llvm-project/cfe/tags/RELEASE_33/final/ clang
cd ../projects/
svn co http://llvm.org/svn/llvm-project/compiler-rt/tags/RELEASE_33/final compiler-rt
cd ..
./configure --with-pic
make -j4 REQUIRES_RTTI=1
sudo make install
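A quick sanity check after sudo make install is to ask llvm-config which version is first on PATH. Since configure was run without --prefix, the install goes under the autoconf default of /usr/local, so llvm-config should be in /usr/local/bin. The hypothetical helper llvm_version_ok below checks that a version string belongs to the 3.3 series.

```shell
# llvm_version_ok (hypothetical helper): succeed if $1 is an LLVM 3.3
# version string such as "3.3" or "3.3.1"; otherwise print a message
# to stderr and return 1.
llvm_version_ok() {
    case $1 in
        3.3|3.3.*) return 0 ;;
        *)
            echo "expected LLVM 3.3, found '$1'" >&2
            return 1
            ;;
    esac
}
```

For example: llvm_version_ok "$(llvm-config --version)" && echo "LLVM 3.3 is on PATH". A failure here usually means another LLVM installation appears earlier in PATH.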

Maven 3

wget http://www.interior-dsgn.com/apache/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
tar xvf apache-maven-3.0.5-bin.tar.gz && sudo mv apache-maven-3.0.5 /usr/local
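Once the Maven bin directory is on PATH (see Environment variables below), mvn -version prints a first line of the form "Apache Maven 3.0.5 ...". The hypothetical helper maven_major extracts the major version from that line, so a script can confirm that Maven 3 is the one being picked up rather than some older installation.

```shell
# maven_major (hypothetical helper): read `mvn -version` output on stdin
# and print the major version number from the "Apache Maven X.Y.Z" line.
# Prints nothing if no such line is found.
maven_major() {
    sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}
```

For example: [ "$(mvn -version | maven_major)" = 3 ] && echo "Maven 3 found".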

Environment variables

Add these to your ~/.bashrc (or another file your shell reads at startup):

On Ubuntu 14.04

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export IMPALA_HOME=<path to Impala>
export BOOST_LIBRARYDIR=/usr/lib/x86_64-linux-gnu
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu
export LC_ALL="en_US.UTF-8"
export M2_HOME=/usr/local/apache-maven-3.0.5
export M2=$M2_HOME/bin  
export PATH=$M2:$PATH

On CentOS 6.5

export BOOST_ROOT=/usr/local
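After reloading your shell (for example with source ~/.bashrc), a hypothetical helper such as check_env_dirs can confirm that each variable that should name a directory is set and that the directory exists. It uses eval for the indirect variable lookup, which is only safe because the arguments are variable names you type yourself.

```shell
# check_env_dirs (hypothetical helper): for each variable NAME given,
# check that it is set, non-empty, and names an existing directory.
# Reports every problem on stderr and returns 1 if any were found.
check_env_dirs() {
    status=0
    for var in "$@"; do
        # Indirect lookup: value becomes the contents of the variable named $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$var=$value is not a directory" >&2
            status=1
        fi
    done
    return $status
}
```

On Ubuntu 14.04 this might be called as check_env_dirs JAVA_HOME IMPALA_HOME BOOST_LIBRARYDIR M2_HOME, and on CentOS 6.5 as check_env_dirs BOOST_ROOT.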

Create a directory for HDFS domain sockets

sudo mkdir -p /var/lib/hadoop-hdfs/
# Give ownership to the account that will run the local HDFS cluster.
sudo chown "$USER" /var/lib/hadoop-hdfs/
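To confirm the socket directory is usable by the account that will run HDFS, a hypothetical check_socket_dir helper can verify that the directory exists, is owned by the current user, and is writable. It relies on the test -O operator (file owned by the effective user), which bash and dash provide although POSIX does not define it.

```shell
# check_socket_dir (hypothetical helper): verify that a directory
# (default /var/lib/hadoop-hdfs) exists, is owned by the current
# effective user, and is writable by that user.
check_socket_dir() {
    dir=${1:-/var/lib/hadoop-hdfs}
    if [ ! -d "$dir" ]; then
        echo "$dir does not exist" >&2
        return 1
    fi
    if [ ! -O "$dir" ] || [ ! -w "$dir" ]; then
        echo "$dir is not owned by and writable for $(id -un)" >&2
        return 1
    fi
    echo "$dir OK"
}
```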

Enable password-less SSH for HBase

ssh-keygen -t dsa
# Leave the passphrase empty: just press Enter at each prompt.
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
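Running the cat command above more than once appends the same key repeatedly, and sshd (with its default StrictModes setting) refuses to use an authorized_keys file that is writable by group or others. The hypothetical helper authorize_key below is a sketch that appends the key only if it is not already present and then tightens permissions. Note also that OpenSSH 7.0 and later disable DSA keys by default, so on systems newer than the ones this page targets an RSA key (ssh-keygen -t rsa, giving ~/.ssh/id_rsa.pub) may be needed instead.

```shell
# authorize_key (hypothetical helper): append a public key file to an
# authorized_keys file unless that exact line is already there, then set
# permissions sshd accepts (700 on the directory, 600 on the file).
#   $1 = public key file, $2 = authorized_keys path (default ~/.ssh/authorized_keys)
authorize_key() {
    pubkey=$1
    auth=${2:-$HOME/.ssh/authorized_keys}
    if [ ! -r "$pubkey" ]; then
        echo "cannot read $pubkey" >&2
        return 1
    fi
    mkdir -p "$(dirname "$auth")"
    chmod 700 "$(dirname "$auth")"
    touch "$auth"
    # -x: whole-line match, -F: fixed strings, -f: patterns read from the key file
    if ! grep -qxF -f "$pubkey" "$auth"; then
        cat "$pubkey" >> "$auth"
    fi
    chmod 600 "$auth"
}
```

For example, authorize_key ~/.ssh/id_dsa.pub, after which ssh localhost true should log in without asking for a password (the first connection still asks you to confirm the host key).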