hbase-ec2 is a Ruby library to help manage a set of Amazon EC2 instances as a single HBase cluster.
hbase-ec2 is currently supplied as a set of Ruby files:
- lib/hcluster.rb : the Hadoop::HCluster and Hadoop::HImage class definitions
- lib/TestDFSIO.rb : a subclass of HCluster, intended as an example for testing Hadoop filesystem functionality.
Set RUBYOPT so that Ruby loads RubyGems at startup:
export RUBYOPT="rubygems"
- AWS::EC2. You can install this with:
gem install amazon-ec2
- AWS::S3. You can install this with:
gem install aws-s3
- Net::SSH. You can install this with:
gem install net-ssh
- Net::SCP. You can install this with:
gem install net-scp
- OpenSSL support for Ruby. This might also be installed with your Ruby, but on Ubuntu, I had to do:
apt-get install libruby-extras
- An Amazon EC2 account. You must add the following to your environment prior to starting irb:
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_ACCOUNT_ID=...
- An EC2 key pair called “root”. This should be stored in your home directory in:
~/.ec2/root.pem
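The account and key-pair requirements above can be checked before starting irb. The following is a minimal sketch, not part of hbase-ec2: the helper name `missing_prerequisites` and the constant `REQUIRED_ENV` are made up for illustration, and the sketch only checks the three environment variables and the `~/.ec2/root.pem` path named in the list.

```ruby
# Hypothetical helper (not part of hbase-ec2): report which of the
# prerequisites listed above are missing from the current environment.
REQUIRED_ENV = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_ACCOUNT_ID].freeze

def missing_prerequisites(env = ENV, home = Dir.home)
  # Environment variables that are unset or empty.
  missing = REQUIRED_ENV.select { |name| env[name].to_s.empty? }
  # The "root" key pair is expected at ~/.ec2/root.pem.
  key_path = File.join(home, ".ec2", "root.pem")
  missing << key_path unless File.file?(key_path)
  missing
end

problems = missing_prerequisites
if problems.empty?
  puts "All prerequisites found."
else
  puts "Missing: #{problems.join(', ')}"
end
```

Running this as a script prints either a confirmation or the names of the unset variables and the missing key file.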
You can set your preferred EC2 region with the EC2_URL environment variable; for example:
export EC2_URL="http://ec2.us-west-1.amazonaws.com"
By default, https://ec2.amazonaws.com will be used. You can see a complete list of available regions with the ec2-describe-regions command (see Amazon’s Region and Availability Zone FAQ).
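To see which endpoint a given environment selects, the override-or-default rule described above can be sketched in a few lines of Ruby. This is only an illustration: `ec2_endpoint` and `DEFAULT_EC2_URL` are hypothetical names, and how hbase-ec2 itself reads EC2_URL is not shown in this README.

```ruby
require "uri"

# Default endpoint as stated in this README.
DEFAULT_EC2_URL = "https://ec2.amazonaws.com"

# Hypothetical helper (not part of hbase-ec2): use EC2_URL when it is set
# and non-empty, otherwise fall back to the default endpoint.
def ec2_endpoint(env = ENV)
  url = env["EC2_URL"].to_s
  url = DEFAULT_EC2_URL if url.empty?
  URI.parse(url)
end

endpoint = ec2_endpoint
puts "EC2 endpoint: #{endpoint.scheme}://#{endpoint.host}"
```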
git clone git://github.com/ekoontz/hbase-ec2.git
$ irb
>> $:.unshift("~/hbase-ec2/lib")
=> ["~/hbase-ec2/lib", ...]
>> load 'hcluster.rb'
=> true
>> include Hadoop
=> Object
See: Himage Usage
>> cluster = HCluster.new :label => 'hbase-0.20.5-x86_64'
=> #<Hadoop::HCluster:0x1010e2098 @rs_key_name="root", ...
>> cluster.launch
[launch:zk.........................]
[setup:zk:ec2-184-73-5-47.compute-1.amazonaws.com...........]
[launch:master.......................]
[setup:master:ec2-184-73-53-56.compute-1.amazonaws.com...................................................]
[launch:rs....................]
[setup:rs:ec2-184-73-7-119.compute-1.amazonaws.com....................................]
[setup:rs:ec2-184-73-12-72.compute-1.amazonaws.com....................................]
[setup:rs:ec2-184-73-110-61.compute-1.amazonaws.com....................................]
[setup:rs:ec2-75-101-180-6.compute-1.amazonaws.com...................................]
[setup:rs:ec2-174-129-187-163.compute-1.amazonaws.com....................................]
=> "running"
>> cluster.run_test("TestDFSIO -write -nrFiles 10 -fileSize 1000")
TestFDSIO.0.0.4
(stderr): 10/06/22 19:43:24 INFO mapred.FileInputFormat: nrFiles = 10
(stderr): 10/06/22 19:43:24 INFO mapred.FileInputFormat: fileSize (MB) = 1000
...
10/06/22 19:44:32 INFO mapred.FileInputFormat: IO rate std deviation: 1.0992092756403666
10/06/22 19:44:32 INFO mapred.FileInputFormat: Test exec time sec: 67.721
10/06/22 19:44:32 INFO mapred.FileInputFormat:
=> nil
>> cluster.terminate
terminating zookeeper: i-5144a73b
terminating master: i-9344a7f9
terminating regionserver: i-4d4aa927
terminating regionserver: i-434aa929
terminating regionserver: i-414aa92b
terminating regionserver: i-474aa92d
terminating regionserver: i-454aa92f
=> {"name"=>"hdfs", "num_zookeepers"=>1, "master"=>"i-9344a7f9", "launchTime"=>"2010-06-22T23:22:13.000Z", "num_regionservers"=>5, "dnsName"=>"ec2-184-73-16-65.compute-1.amazonaws.com", "state"=>"terminated"}
>>
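Because a running cluster keeps accruing EC2 charges, it can help to make sure `terminate` runs even when a test raises. Below is a minimal sketch of one way to do that; `with_cluster` is a hypothetical wrapper, not part of hbase-ec2, and it relies only on the `launch` and `terminate` methods shown in the session above. Whether `terminate` copes with a cluster whose `launch` failed partway is an assumption this sketch does not verify.

```ruby
# Hypothetical wrapper (not part of hbase-ec2): launch the cluster, hand it
# to the block, and always call terminate afterwards, even if the block raises.
def with_cluster(cluster)
  cluster.launch
  yield cluster
ensure
  cluster.terminate
end

# Example use from irb, after `load 'hcluster.rb'` and `include Hadoop`
# (needs AWS credentials, so it is left commented out here):
#
#   with_cluster(HCluster.new(:label => 'hbase-0.20.5-x86_64')) do |c|
#     c.run_test("TestDFSIO -write -nrFiles 10 -fileSize 1000")
#   end
```

The wrapper returns whatever the block returns; an exception raised in the block still propagates after the cluster has been terminated.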