Friday, November 20, 2009

HBase on Cloudera Training Virtual Machine (0.3.2)

Note: This is a follow up to my earlier post. Since then Cloudera released a new VM that includes the current 0.20 branch of Hadoop. Below I have the same post adjusted to work with that new release. Please note that there are subtle changes, for example the NameNode port has changed. So if you in any way still have the older post please make sure you forget about it and follow this one here instead for the new VM version.

You might want to run HBase on Cloudera's Virtual Machine to get a quick start to a prototyping setup. In theory you download the VM, start it and you are ready to go. The main issue though is that the current Hadoop Training VM does not include HBase at all (yet?). Apart from that the install of a local HBase instance is a straight forward process.

Here are the steps to get HBase running on Cloudera's VM:
  1. Download VM

    Get it from Cloudera's website.

  2. Start VM

    As the above page states: "To launch the VMWare image, you will either need VMware Player for windows and linux, or VMware Fusion for Mac."

    Note: I have Parallels for Mac and wanted to use that. I used Parallels Transporter to convert the "cloudera-training-0.3.2.vmx" to a new "cloudera-training-0.2-cl4-000001.hdd", create a new VM in Parallels selecting Ubuntu Linux as the OS and the newly created .hdd as the disk image. Boot up the VM and you are up and running. I gave it a bit more memory for the graphics to be able to switch the VM to 1440x900 which is the native screen resolution on my MacBook Pro I am using.

    Finally follow the steps explained on the page above, i.e. open a Terminal and issue:
    $ cd ~/git
    $ ./update-exercises --workspace

  3. Pull HBase branch

    We are using the brand new HBase 0.20.2 release. Open a new Terminal (or issue a $ cd .. in the open one), then:
    $ sudo -u hadoop git clone /home/hadoop/hbase
    $ sudo -u hadoop sh -c "cd /home/hadoop/hbase ; git checkout origin/tags/0.20.2"
    Note: moving to "origin/tags/0.20.2" which isn't a local branch
    If you want to create a new branch from this checkout, you may do so
    (now or later) by using -b with the checkout command again. Example:
      git checkout -b <new_branch_name>
    HEAD is now at 777fb63... HBase release 0.20.2

    First we clone the repository, then switch to the actual branch. You will notice that I am using sudo -u hadoop because Hadoop itself is started under that account and so I wanted it to match. Also, the default "training" account does not have SSH set up as explained in Hadoop's quick-start guide. When sudo is asking for a password use the default, which is set to "training".

    You can ignore the messages git prints out while performing the checkout.

  4. Build Branch

    Continue in Terminal:
    $ sudo -u hadoop sh -c "cd /home/hadoop/hbase/ ; export PATH=$PATH:/usr/share/apache-ant-1.7.1/bin ; ant package"

  5. Configure HBase

    There are a few edits to be made to get HBase running.
    $ sudo -u hadoop vim /home/hadoop/hbase/build/conf/hbase-site.xml
    $ sudo -u hadoop vim /home/hadoop/hbase/build/conf/ 
    # The java implementation to use.  Java 1.6 required.
    # export JAVA_HOME=/usr/java/jdk1.6.0/
    export JAVA_HOME=/usr/lib/jvm/java-6-sun

  6. Rev up the Engine!

    The final thing is to start HBase:
    $ sudo -u hadoop /home/hadoop/hbase/build/bin/
    $ sudo -u hadoop /home/hadoop/hbase/build/bin/hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Version: 0.20.2, r777fb63ff0c73369abc4d799388a45b8bda9e5fd, Thu Nov 19 15:32:17 PST 2009


    Let's create a table and check if it was created OK.
    hbase(main):001:0> list
    0 row(s) in 0.0910 seconds
    hbase(main):002:0> create 't1', 'f1', 'f2', 'f3'
    0 row(s) in 6.1260 seconds
    hbase(main):003:0> list                         
    1 row(s) in 0.0470 seconds
    hbase(main):004:0> describe 't1'                
    DESCRIPTION                                                             ENABLED                               
     {NAME => 't1', FAMILIES => [{NAME => 'f1', COMPRESSION => 'NONE', VERS true                                  
     IONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => '                                       
     false', BLOCKCACHE => 'true'}, {NAME => 'f2', COMPRESSION => 'NONE', V                                       
     ERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =                                       
     > 'false', BLOCKCACHE => 'true'}, {NAME => 'f3', COMPRESSION => 'NONE'                                       
     , VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR                                       
     Y => 'false', BLOCKCACHE => 'true'}]}                                                                        
    1 row(s) in 0.0750 seconds
This sums it up. I hope you give HBase on the Cloudera Training VM a whirl as it also has Eclipse installed and therefore provides a quick start into Hadoop and HBase.

Just keep in mind that this is for prototyping only! With such a setup you will only be able to insert a handful of rows. If you overdo it you will bring it to its knees very quickly. But you can safely use it to play around with the shell to create tables or use the API to get used to it and test changes in your code etc.

Finally a screenshot of the running HBase UI:


  1. I just wanted to observe that the VM appliance shipped by Cloudera also opens as expected in VM Fusion, and works as expected, and very well, there, too.

  2. Super helpful walkthrough Lars! Thanks a lot...