ekoontz edited this page Sep 14, 2010 · 12 revisions

Introduction

hbase-ec2 is a Ruby library to help manage a set of Amazon EC2 instances as a single HBase cluster.

Contents

hbase-ec2 is currently supplied as a set of Ruby files:

  • lib/hcluster.rb : the Hadoop::HCluster and Hadoop::HImage class definitions
  • lib/TestDFSIO.rb : a subclass of HCluster, intended as an example for testing Hadoop filesystem functionality.

Prerequisites

  • RubyGems loaded automatically by Ruby: export RUBYOPT="rubygems"
  • AWS::EC2. You can install this with gem install amazon-ec2.
  • AWS::S3. You can install this with gem install aws-s3.
  • Net::SSH. You can install this with gem install net-ssh.
  • Net::SCP. You can install this with gem install net-scp.
  • OpenSSL support for Ruby. This may already be included with your Ruby installation; on Ubuntu, I had to run apt-get install libruby-extras.
  • An Amazon EC2 account. You must add the following to your environment prior to starting irb:
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_ACCOUNT_ID=...
  • An EC2 key pair called “root”. Its key file should be stored in your home directory as ~/.ec2/root.pem.
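
The environment variables and key file listed above can be checked before starting irb. The following is a minimal sketch, not part of hbase-ec2: the helper name missing_prerequisites is hypothetical, and only the variable names and the ~/.ec2/root.pem path come from the list above.

```ruby
# Hypothetical helper (not part of hbase-ec2): reports which of the
# prerequisites listed above are missing from the current setup.
REQUIRED_VARS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_ACCOUNT_ID].freeze

# env is any Hash-like object (ENV by default); key_path is the location
# of the "root" key pair file. Returns an Array of problem descriptions,
# empty when everything is in place.
def missing_prerequisites(env = ENV, key_path = File.expand_path("~/.ec2/root.pem"))
  problems = REQUIRED_VARS
             .select { |name| env[name].to_s.empty? }
             .map { |name| "environment variable #{name} is not set" }
  problems << "key pair file #{key_path} not found" unless File.exist?(key_path)
  problems
end

# Example use from a shell script or irb:
#   missing_prerequisites.each { |problem| warn problem }
```

An empty result means the three variables are set and the key file exists; it does not verify that the credentials are valid.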

Optional configuration

You can set your preferred EC2 region with the EC2_URL environment variable; for example:
export EC2_URL="http://ec2.us-west-1.amazonaws.com"

By default, the value “https://ec2.amazonaws.com” will be used. You can see a complete list of available regions by running the ec2-describe-regions command (see Amazon’s Region and Availability Zone FAQ).
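
The fallback described above can be sketched in Ruby as follows. This is an illustration only: the helper name ec2_endpoint is hypothetical, and how hbase-ec2 itself reads EC2_URL internally is an assumption. Only the variable name and the default value come from the text.

```ruby
require "uri"

# Default endpoint, as stated in the text above.
DEFAULT_EC2_URL = "https://ec2.amazonaws.com"

# Hypothetical helper (not part of hbase-ec2): returns the endpoint as a
# URI, using EC2_URL when it is set and non-empty, else the default.
def ec2_endpoint(env = ENV)
  url = env["EC2_URL"].to_s
  url = DEFAULT_EC2_URL if url.empty?
  URI.parse(url)
end

# Example use:
#   puts ec2_endpoint.host
```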

Downloading hbase-ec2

git clone git://github.com/ekoontz/hbase-ec2.git

Usage

Preliminaries

$ irb
>> $:.unshift("~/hbase-ec2/lib")
=> ["~/hbase-ec2/lib", ...]
>> load 'hcluster.rb'
=> true
>> include Hadoop
=> Object

Creating an image from hadoop-core and hbase source trees

1. Run ant tar (0.20-era):

ekoontz@localhost:~/hbase$ git branch -a
* tags/0.20.5
..
ekoontz@localhost:~/hbase$ ant clean tar
..
tar:
      [tar] Building tar: /home/ekoontz/hbase/build/hbase-0.20.5.tar.gz
BUILD SUCCESSFUL
Total time: 48 seconds
ekoontz@localhost:~/hbase$ 
Alternatively, for versions after 0.20, run mvn -DskipTests assembly:assembly.

2. Copy the tar.gz files for both hadoop-core and hbase to ~/s3 (assuming ~/s3 is synchronized, manually or automatically, to your S3 account).

3. From irb, create and register the image:

>> newimage = HImage.new :label => 'hbase-0.20.5-x86_64'
Creating and registering image: hbase-0.20.5-x86_64
...
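
Before creating the image, it can help to confirm that both tarballs from step 2 are actually present in ~/s3. The sketch below is not part of hbase-ec2; the helper name staged_tarballs is hypothetical, and it only lists files without checking their contents or whether they have been synchronized to S3.

```ruby
# Hypothetical helper (not part of hbase-ec2): lists the .tar.gz files
# found in the staging directory used in step 2 (~/s3 by default).
def staged_tarballs(dir = File.expand_path("~/s3"))
  Dir.glob(File.join(dir, "*.tar.gz")).map { |path| File.basename(path) }.sort
end

# Example use from irb, before creating the image:
#   puts staged_tarballs
```

The list should include one hadoop-core tarball and one hbase tarball, such as the hbase-0.20.5.tar.gz built in step 1.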

Starting a new Amazon HBase cluster

>> cluster = HCluster.new :label => 'hbase-0.20.5-x86_64'
=> #<Hadoop::HCluster:0x1010e2098 @rs_key_name="root",
...
>> cluster.launch
[launch:zk.........................]
[setup:zk.]
[launch:master.......................]
[setup:master...................................................]
[launch:rs....................]
[setup:rs:ec2-184-73-7-119.compute-1.amazonaws.com....................................]
[setup:rs:ec2-184-73-12-72.compute-1.amazonaws.com....................................]
[setup:rs:ec2-184-73-110-61.compute-1.amazonaws.com....................................]
[setup:rs:ec2-75-101-180-6.compute-1.amazonaws.com...................................]
[setup:rs:ec2-174-129-187-163.compute-1.amazonaws.com....................................]
=> "running"
>> cluster.run_test("TestDFSIO -write -nrFiles 10 -fileSize 1000")
TestFDSIO.0.0.4
(stderr): 10/06/22 19:43:24 INFO mapred.FileInputFormat: nrFiles = 10
(stderr): 10/06/22 19:43:24 INFO mapred.FileInputFormat: fileSize (MB) = 1000
...
10/06/22 19:44:32 INFO mapred.FileInputFormat:  IO rate std deviation: 1.0992092756403666
10/06/22 19:44:32 INFO mapred.FileInputFormat:     Test exec time sec: 67.721
10/06/22 19:44:32 INFO mapred.FileInputFormat: 
=> nil
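
Because run_test prints its results and returns nil, comparing runs means reading the log text. If that text has been captured as a string (how to capture it is left open here), the summary lines can be turned into numbers with a sketch like the one below. The helper name parse_dfsio_metrics is hypothetical, and the pattern is based only on the line format shown in the transcript above.

```ruby
# Matches summary lines such as
#   "... INFO mapred.FileInputFormat:     Test exec time sec: 67.721"
# and captures the metric name and its numeric value.
METRIC_LINE = /INFO mapred\.FileInputFormat:\s+(.+?):\s+(\d+(?:\.\d+)?)\s*$/

# Hypothetical helper (not part of hbase-ec2): returns a Hash mapping
# metric names to Float values; lines that do not match are ignored.
def parse_dfsio_metrics(output)
  output.each_line.with_object({}) do |line, metrics|
    match = METRIC_LINE.match(line)
    metrics[match[1]] = Float(match[2]) if match
  end
end

# Example use, with log_text holding the captured output:
#   parse_dfsio_metrics(log_text)["Test exec time sec"]
```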

Terminating a Cluster

>> cluster.terminate
terminating zookeeper: i-5144a73b
terminating master: i-9344a7f9
terminating regionserver: i-4d4aa927
terminating regionserver: i-434aa929
terminating regionserver: i-414aa92b
terminating regionserver: i-474aa92d
terminating regionserver: i-454aa92f
=> {"name"=>"hdfs", "num_zookeepers"=>1, "master"=>"i-9344a7f9", "launchTime"=>"2010-06-22T23:22:13.000Z", "num_regionservers"=>5, "dnsName"=>"ec2-184-73-16-65.compute-1.amazonaws.com", "state"=>"terminated"}
>> 
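
Running EC2 instances keep costing money until they are terminated, so it can be useful to make sure terminate is called even when a test raises an error. The following is a minimal sketch, not part of hbase-ec2: the wrapper name with_cluster is hypothetical, and it relies only on the launch and terminate methods shown above. Calling terminate after a launch that failed partway is assumed here to be safe.

```ruby
# Hypothetical wrapper (not part of hbase-ec2): launches the cluster,
# yields it to the block, and always calls terminate afterwards, even if
# launch or the block raises.
def with_cluster(cluster)
  cluster.launch
  yield cluster
ensure
  cluster.terminate
end

# Example use from irb, after loading hcluster.rb:
#   with_cluster(HCluster.new(:label => 'hbase-0.20.5-x86_64')) do |cluster|
#     cluster.run_test("TestDFSIO -write -nrFiles 10 -fileSize 1000")
#   end
```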