hbase-ec2 is a Ruby library to help manage a set of Amazon EC2 instances as a single HBase cluster.
hbase-ec2 is currently supplied as a set of Ruby files:
- lib/hcluster.rb : the Hadoop::HCluster and Hadoop::HImage class definitions
- lib/TestDFSIO.rb : a subclass of HCluster, intended as an example for testing Hadoop filesystem functionality.
Requirements:

- Ruby with Rubygems. You may need to add the following to your environment:

    export RUBYOPT="rubygems"

- AWS::EC2. You can install this with:

    gem install amazon-ec2

- AWS::S3. You can install this with:

    gem install aws-s3

- Net::SSH. You can install this with:

    gem install net-ssh

- Net::SCP. You can install this with:

    gem install net-scp

- OpenSSL support for Ruby. This might already be installed with your Ruby, but on Ubuntu, I had to do:

    apt-get install libruby-extras

- An Amazon EC2 account. You must add the following to your environment prior to starting irb:

    export AMAZON_ACCESS_KEY_ID=...
    export AMAZON_SECRET_ACCESS_KEY=...
    export AWS_ACCOUNT_ID=...

- An EC2 key pair called "root". This should be stored in your home directory as:

    ~/.ec2/root.pem
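Before starting irb, it can help to sanity-check that the required AWS variables are actually set. The helper below is a hypothetical convenience for that, not part of hbase-ec2 itself:

```ruby
# Names of the environment variables hbase-ec2 expects (from the list above).
REQUIRED_AWS_VARS = %w[AMAZON_ACCESS_KEY_ID AMAZON_SECRET_ACCESS_KEY AWS_ACCOUNT_ID]

# Return the subset of required variables that are unset or empty.
# (Hypothetical helper -- not part of hbase-ec2.)
def missing_aws_vars(env = ENV)
  REQUIRED_AWS_VARS.reject { |v| env[v] && !env[v].empty? }
end

missing = missing_aws_vars
if missing.empty?
  puts "AWS environment looks complete."
else
  puts "Missing environment variables: #{missing.join(', ')}"
end
```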
Download the source and load the library in irb:

    git clone git://github.com/ekoontz/hbase-ec2.git

    $ irb
    >> $:.unshift("~/hbase-ec2/lib")
    => ["~/hbase-ec2/lib", ...]
    >> load 'hcluster.rb'
    => true
    >> include Hadoop
    => Object
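To avoid retyping the load-path setup every session, the equivalent can go in a small snippet (e.g. in ~/.irbrc). The clone location (~/hbase-ec2) is assumed from the step above:

```ruby
# Add the hbase-ec2 lib directory to the Ruby load path, once.
# The clone location (~/hbase-ec2) is an assumption from the clone step above.
libdir = File.expand_path('~/hbase-ec2/lib')
$:.unshift(libdir) unless $:.include?(libdir)
# load 'hcluster.rb'   # uncomment once hbase-ec2 has been cloned
# include Hadoop
```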
1. Run "ant clean tar" (0.20-era):

    ekoontz@localhost:~/hbase$ git branch -a
    * tags/0.20.5
    ..
    ekoontz@localhost:~/hbase$ ant clean tar
    ..
    tar:
      [tar] Building tar: /home/ekoontz/hbase/build/hbase-0.20.5.tar.gz
    BUILD SUCCESSFUL
    Total time: 48 seconds
    ekoontz@localhost:~/hbase$

   or (post 0.20):

    mvn -DskipTests assembly:assembly
2. Copy the tar.gz files for both hadoop-core and hbase to ~/s3 (assuming ~/s3 is synchronized, manually or automatically, to your S3 account).
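If you would rather push the tarballs from Ruby than rely on an external sync tool, a sketch using the aws-s3 gem (already required above) might look like this. The bucket name 'my-hbase-ec2-bucket' and the `upload_tarballs` helper are assumptions, not part of hbase-ec2:

```ruby
require 'rubygems'
# require 'aws/s3'   # uncomment to actually upload (needs AWS credentials)

# Collect the hadoop-core and hbase tarballs sitting under a directory.
def tarballs_in(dir)
  Dir.glob(File.join(dir, '*.tar.gz')).sort
end

# Hypothetical upload step using the aws-s3 gem; the bucket name is an
# assumption -- substitute your own S3 bucket.
def upload_tarballs(dir, bucket = 'my-hbase-ec2-bucket')
  tarballs_in(dir).each do |path|
    puts "would upload #{path} to s3://#{bucket}/#{File.basename(path)}"
    # AWS::S3::S3Object.store(File.basename(path), File.open(path), bucket)
  end
end
```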
Then create and register the image:

    >> newimage = HImage.new :label => 'hbase-0.20.5-x86_64'
    Creating and registering image: hbase-0.20.5-x86_64
    ...
    >> cluster = HCluster.new :label => 'hbase-0.20.5-x86_64'
    => #<Hadoop::HCluster:0x1010e2098 @rs_key_name="root", ...
    >> cluster.launch
    [launch:zk.........................]
    [setup:zk.]
    [launch:master.......................]
    [setup:master...................................................]
    [launch:rs....................]
    [setup:rs:ec2-184-73-7-119.compute-1.amazonaws.com....................................]
    [setup:rs:ec2-184-73-12-72.compute-1.amazonaws.com....................................]
    [setup:rs:ec2-184-73-110-61.compute-1.amazonaws.com....................................]
    [setup:rs:ec2-75-101-180-6.compute-1.amazonaws.com...................................]
    [setup:rs:ec2-174-129-187-163.compute-1.amazonaws.com....................................]
    => "running"
    >> cluster.run_test("TestDFSIO -write -nrFiles 10 -fileSize 1000")
    TestFDSIO.0.0.4
    (stderr): 10/06/22 19:43:24 INFO mapred.FileInputFormat: nrFiles = 10
    (stderr): 10/06/22 19:43:24 INFO mapred.FileInputFormat: fileSize (MB) = 1000
    ...
    10/06/22 19:44:32 INFO mapred.FileInputFormat: IO rate std deviation: 1.0992092756403666
    10/06/22 19:44:32 INFO mapred.FileInputFormat: Test exec time sec: 67.721
    10/06/22 19:44:32 INFO mapred.FileInputFormat:
    => nil
    >> cluster.terminate
    terminating zookeeper: i-5144a73b
    terminating master: i-9344a7f9
    terminating regionserver: i-4d4aa927
    terminating regionserver: i-434aa929
    terminating regionserver: i-414aa92b
    terminating regionserver: i-474aa92d
    terminating regionserver: i-454aa92f
    => {"name"=>"hdfs", "num_zookeepers"=>1, "master"=>"i-9344a7f9", "launchTime"=>"2010-06-22T23:22:13.000Z", "num_regionservers"=>5, "dnsName"=>"ec2-184-73-16-65.compute-1.amazonaws.com", "state"=>"terminated"}
    >>
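Since EC2 instances keep billing until terminate is called, it can be worth wrapping a test run so the cluster is torn down even if the test raises. The wrapper below is a hypothetical convenience built on the HCluster calls shown above (launch, run_test, terminate), not part of hbase-ec2:

```ruby
# Launch a cluster, run one test, and always terminate it, even on error.
# (Hypothetical helper around the HCluster API shown above.)
def with_cluster(cluster, test)
  cluster.launch
  cluster.run_test(test)
ensure
  cluster.terminate
end
```

Usage would then be a single call, e.g. `with_cluster(HCluster.new(:label => 'hbase-0.20.5-x86_64'), "TestDFSIO -write -nrFiles 10 -fileSize 1000")`.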