ekoontz edited this page Sep 14, 2010 · 12 revisions

Introduction

hbase-ec2 is a Ruby library to help manage a set of Amazon EC2 instances as a single HBase cluster.

Contents

hbase-ec2 is currently supplied as a set of Ruby files:

  • lib/hcluster.rb : the Hadoop::HCluster and Hadoop::HImage class definitions
  • lib/TestDFSIO.rb : a subclass of HCluster, intended as an example for testing Hadoop filesystem functionality.

Prerequisites

  • RubyGems loaded automatically by Ruby: export RUBYOPT="rubygems"
  • AWS::EC2. You can install this with gem install amazon-ec2.
  • AWS::S3. You can install this with gem install aws-s3.
  • Net::SSH. You can install this with gem install net-ssh.
  • Net::SCP. You can install this with gem install net-scp.
  • OpenSSL support for Ruby. This may already be installed with your Ruby, but on Ubuntu I had to run apt-get install libruby-extras.
    As the Puppet installation docs put it:

You can test for it by running ruby -ropenssl -e "puts :yep". If that errors out, you're missing the library.

  • An Amazon EC2 account. You must add the following to your environment before starting irb:
    export AWS_ACCESS_KEY_ID=...
    export AWS_SECRET_ACCESS_KEY=...
    export AWS_ACCOUNT_ID=...
  • An EC2 key pair called “root”. Its private key should be stored in your home directory at ~/.ec2/root.pem.
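
The environment and key-pair prerequisites above can be checked before starting irb. The following is a minimal sketch, not part of hbase-ec2: the helper name missing_prerequisites and the REQUIRED_ENV constant are made up for illustration, and it only checks that the variables are non-empty and that the key file exists, not that the credentials are valid.

```ruby
# Hypothetical helper (not part of hbase-ec2): list the prerequisites that are missing.
REQUIRED_ENV = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_ACCOUNT_ID].freeze

# Assumed default key location, taken from the prerequisites list: ~/.ec2/root.pem.
# Falls back to the current directory if HOME is not set.
def default_key_path
  File.join(ENV.fetch("HOME", Dir.pwd), ".ec2", "root.pem")
end

def missing_prerequisites(env = ENV, key_path = default_key_path)
  # Collect every required variable that is unset or blank.
  problems = REQUIRED_ENV.select { |name| env[name].to_s.strip.empty? }
  problems.map! { |name| "environment variable #{name} is not set" }
  # The "root" key pair's private key must exist as a regular file.
  problems << "key file #{key_path} not found" unless File.file?(key_path)
  problems
end

problems = missing_prerequisites
if problems.empty?
  puts "All checked prerequisites found."
else
  problems.each { |problem| puts "Missing: #{problem}" }
end
```

Passing a plain Hash and an explicit path instead of the defaults makes the check easy to try without touching your real environment.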

Optional configuration

You can set your preferred EC2 region with the EC2_URL environment variable; for example:

export EC2_URL="http://ec2.us-west-1.amazonaws.com"

By default, https://ec2.amazonaws.com will be used. You can see a complete list of available regions by running the ec2-describe-regions command from the Amazon EC2 API tools (see Amazon’s Region and Availability Zone FAQ).
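
The fallback described above can be sketched in a few lines of Ruby. This is only an illustration of the rule "use EC2_URL if set, otherwise https://ec2.amazonaws.com"; how hbase-ec2 itself reads the variable is not shown on this page, and the names ec2_endpoint and DEFAULT_EC2_URL are assumptions.

```ruby
require "uri"

# Default endpoint as stated in the text above.
DEFAULT_EC2_URL = "https://ec2.amazonaws.com"

# Hypothetical helper: pick the endpoint from the environment, falling back to the default.
def ec2_endpoint(env = ENV)
  url = env["EC2_URL"].to_s.strip
  url = DEFAULT_EC2_URL if url.empty?
  URI.parse(url)
end

endpoint = ec2_endpoint
puts "EC2 endpoint: #{endpoint.scheme}://#{endpoint.host}"
```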

Downloading hbase-ec2

git clone git://github.com/ekoontz/hbase-ec2.git

Usage

Preliminaries

$ irb
>> $:.unshift("~/hbase-ec2/lib")
=> ["~/hbase-ec2/lib", ...]
>> load 'hcluster.rb'
=> true
>> include Hadoop
=> Object
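
If you run these steps from a script rather than irb, or your checkout lives somewhere other than ~/hbase-ec2, it can help to expand the directory to an absolute path before adding it to the load path. A minimal sketch, assuming a helper name (add_hbase_ec2_lib) that is not part of hbase-ec2:

```ruby
# Hypothetical helper: put <checkout_dir>/lib at the front of Ruby's load path.
def add_hbase_ec2_lib(checkout_dir)
  # File.expand_path resolves relative paths (and a leading "~") to an absolute path.
  lib_dir = File.expand_path("lib", checkout_dir)
  $LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)
  lib_dir
end

# Intended use, mirroring the irb session above (commented out because it
# needs a real checkout and the gems from the prerequisites):
#   add_hbase_ec2_lib("~/hbase-ec2")
#   load 'hcluster.rb'
#   include Hadoop
```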

Creating an image from hadoop-core and hbase source trees

See: Himage Usage

Starting a new Amazon HBase cluster

>> cluster = HCluster.new :label => 'hbase-0.20.5-x86_64'
=> #<Hadoop::HCluster:0x1010e2098 @rs_key_name="root",
...
>> cluster.launch
[launch:zk.........................]
[setup:zk:ec2-184-73-5-47.compute-1.amazonaws.com...........]
[launch:master.......................]
[setup:master:ec2-184-73-53-56.compute-1.amazonaws.com...................................................]
[launch:rs....................]
[setup:rs:ec2-184-73-7-119.compute-1.amazonaws.com....................................]
[setup:rs:ec2-184-73-12-72.compute-1.amazonaws.com....................................]
[setup:rs:ec2-184-73-110-61.compute-1.amazonaws.com....................................]
[setup:rs:ec2-75-101-180-6.compute-1.amazonaws.com...................................]
[setup:rs:ec2-174-129-187-163.compute-1.amazonaws.com....................................]
=> "running"
>> cluster.run_test("TestDFSIO -write -nrFiles 10 -fileSize 1000")
TestFDSIO.0.0.4
(stderr): 10/06/22 19:43:24 INFO mapred.FileInputFormat: nrFiles = 10
(stderr): 10/06/22 19:43:24 INFO mapred.FileInputFormat: fileSize (MB) = 1000
...
10/06/22 19:44:32 INFO mapred.FileInputFormat:  IO rate std deviation: 1.0992092756403666
10/06/22 19:44:32 INFO mapred.FileInputFormat:     Test exec time sec: 67.721
10/06/22 19:44:32 INFO mapred.FileInputFormat: 
=> nil
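
The summary lines at the end of the TestDFSIO output follow a "name: number" pattern, so they are easy to pull out of a saved copy of the log. The sketch below assumes you have the output as a string (the session above only shows it printed to the console, with run_test returning nil); parse_dfsio_metrics and METRIC_LINE are illustrative names, not part of hbase-ec2.

```ruby
# Matches lines such as:
#   10/06/22 19:44:32 INFO mapred.FileInputFormat:     Test exec time sec: 67.721
METRIC_LINE = /INFO mapred\.FileInputFormat:\s+(.+?):\s+(-?\d+(?:\.\d+)?)\s*\z/

# Hypothetical helper: return a Hash of metric name => Float for every summary line.
def parse_dfsio_metrics(text)
  text.each_line.each_with_object({}) do |line, metrics|
    match = METRIC_LINE.match(line.chomp)
    metrics[match[1]] = Float(match[2]) if match
  end
end

# Sample lines copied from the session above.
sample = <<~LOG
  (stderr): 10/06/22 19:43:24 INFO mapred.FileInputFormat: nrFiles = 10
  10/06/22 19:44:32 INFO mapred.FileInputFormat:  IO rate std deviation: 1.0992092756403666
  10/06/22 19:44:32 INFO mapred.FileInputFormat:     Test exec time sec: 67.721
LOG

parse_dfsio_metrics(sample).each { |name, value| puts "#{name} = #{value}" }
```

Lines that do not end in a number, such as the nrFiles and fileSize lines, are skipped.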

Terminating a Cluster

>> cluster.terminate
terminating zookeeper: i-5144a73b
terminating master: i-9344a7f9
terminating regionserver: i-4d4aa927
terminating regionserver: i-434aa929
terminating regionserver: i-414aa92b
terminating regionserver: i-474aa92d
terminating regionserver: i-454aa92f
=> {"name"=>"hdfs", "num_zookeepers"=>1, "master"=>"i-9344a7f9", "launchTime"=>"2010-06-22T23:22:13.000Z", "num_regionservers"=>5, "dnsName"=>"ec2-184-73-16-65.compute-1.amazonaws.com", "state"=>"terminated"}
>> 
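
Because running instances are billed until they are terminated, it can be useful to pair launch and terminate so that terminate runs even if a test raises. The following is a sketch, not part of hbase-ec2: with_cluster is a hypothetical wrapper, StubCluster is a stand-in that only mimics the method names used on this page (launch, run_test, terminate), and whether terminate behaves well after a partially failed launch is an assumption you should verify.

```ruby
# Hypothetical wrapper: launch, run the block, and always terminate afterwards.
def with_cluster(cluster)
  cluster.launch
  yield cluster
ensure
  # Runs whether the block finished normally or raised.
  cluster.terminate
end

# Stand-in with the same method names, so the sketch runs without AWS.
class StubCluster
  attr_reader :calls

  def initialize
    @calls = []
  end

  def launch
    @calls << :launch
    "running"
  end

  def run_test(command)
    @calls << [:run_test, command]
    nil
  end

  def terminate
    @calls << :terminate
  end
end

stub = StubCluster.new
with_cluster(stub) { |c| c.run_test("TestDFSIO -write -nrFiles 10 -fileSize 1000") }
p stub.calls
```

With the real library loaded, the same wrapper would be called with the object returned by HCluster.new :label => 'hbase-0.20.5-x86_64' in place of the stub.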