Here are the steps to get HBase running on Cloudera's VM:
- Download VM
Get it from Cloudera's website.
- Start VM
As the above page states: "To launch the VMWare image, you will either need VMware Player for windows and linux, or VMware Fusion for Mac."
Note: I have Parallels for Mac and wanted to use that. I used Parallels Transporter to convert the "cloudera-training-0.3.1.vmx" to a new "cloudera-training-0.2-cl3-000002.hdd", then created a new VM in Parallels, selecting Ubuntu Linux as the OS and the newly created .hdd as the disk image. Boot up the VM and you are up and running. I gave it a bit more graphics memory so that I could switch the VM to 1440x900, the native resolution of the MacBook Pro I am using.
Finally follow the steps explained on the page above, i.e. open a Terminal and issue:
$ cd ~/git
$ ./update-exercises --workspace
- Pull HBase branch
Open a new Terminal (or issue a "cd .." in the open one), then:
$ sudo -u hadoop git clone http://git.apache.org/hbase.git /home/hadoop/hbase
$ sudo -u hadoop sh -c "cd /home/hadoop/hbase ; git checkout origin/0.20_on_hadoop-0.18.3"
...
HEAD is now at c050f68... pull up to release
First we clone the repository, then switch to the actual branch. You will notice that I am using "sudo -u hadoop" because Hadoop itself is started under that account and I wanted it to match. Also, the default "training" account does not have SSH set up as explained in Hadoop's quick-start guide. When sudo asks for a password, use the default, which is "training".
- Build Branch
Continue in Terminal:
$ sudo -u hadoop sh -c "cd /home/hadoop/hbase/ ; export PATH=$PATH:/usr/share/apache-ant-1.7.1/bin ; ant package"
...
BUILD SUCCESSFUL
- Configure HBase
There are a few edits to be made to get HBase running.
$ sudo -u hadoop vim /home/hadoop/hbase/build/conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
</configuration>

$ sudo -u hadoop vim /home/hadoop/hbase/build/conf/hbase-env.sh
# The java implementation to use. Java 1.6 required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
export JAVA_HOME=/usr/lib/jvm/java-6-sun
...
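If you would rather not edit the file by hand, the same hbase-site.xml can be written from the command line. This is only a minimal sketch: the helper name write_hbase_site is made up for illustration, and it overwrites any hbase-site.xml already present in the target directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): write a minimal hbase-site.xml.
#   $1 = conf directory to write into
#   $2 = value for the hbase.rootdir property
write_hbase_site() {
    conf_dir=$1
    rootdir=$2
    # Overwrites an existing hbase-site.xml in $conf_dir.
    cat > "$conf_dir/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>$rootdir</value>
  </property>
</configuration>
EOF
}
```

From a shell running as the hadoop user you would then call it with the values used above, i.e. write_hbase_site /home/hadoop/hbase/build/conf hdfs://localhost:8020/hbase.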
Note: There is a small glitch in revision 826669 of that Cloudera-specific HBase branch. The master UI (on port 60010 on localhost) will not start because a path is different, and the Jetty packages are missing as a result. You can fix it by editing the start-up script and changing the path that is scanned:
$ sudo -u hadoop vim /home/hadoop/hbase/build/bin/hbase
Replace the line
for f in $HBASE_HOME/lib/jsp-2.1/*.jar; do
with
for f in $HBASE_HOME/lib/jetty-ext/*.jar; do
This workaround is only needed until the developers have fixed this in the branch (compare the revision I used, r813052, with what you get). If you do not need the UI, you can ignore both this fix and the error in the logs. HBase will still run, just without its web-based interface.
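The same edit can be scripted instead of done in vim. The sketch below assumes the fix is to replace the jsp-2.1 path with the jetty-ext path (the two lines shown above, in that order); the helper name fix_jetty_path is hypothetical, and you should check your copy of the script first, since the branch may have changed since.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): make the classpath loop in the
# start-up script scan lib/jetty-ext instead of lib/jsp-2.1.
#   $1 = path to the start-up script, e.g. /home/hadoop/hbase/build/bin/hbase
fix_jetty_path() {
    script=$1
    tmp="$script.tmp"
    # Write the edited copy to a temp file, then copy the contents back so the
    # original file keeps its permissions (it must stay executable).
    sed 's|\$HBASE_HOME/lib/jsp-2\.1/\*\.jar|$HBASE_HOME/lib/jetty-ext/*.jar|' "$script" > "$tmp" \
        && cat "$tmp" > "$script" \
        && rm -f "$tmp"
}
```

Run it as the hadoop user against /home/hadoop/hbase/build/bin/hbase. If the script does not contain the jsp-2.1 line, the file is left unchanged.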
- Rev up the Engine!
The final thing is to start HBase:
$ sudo -u hadoop /home/hadoop/hbase/build/bin/start-hbase.sh
$ sudo -u hadoop /home/hadoop/hbase/build/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.0-0.18.3, r813052, Mon Oct 19 06:51:57 PDT 2009
hbase(main):001:0> list
0 row(s) in 0.2320 seconds
hbase(main):002:0>
This sums it up. I hope you give HBase on the Cloudera Training VM a whirl as it also has Eclipse installed and therefore provides a quick start into Hadoop and HBase.
Just keep in mind that this is for prototyping only! With such a setup you will only be able to insert a handful of rows. If you overdo it, you will bring it to its knees very quickly. But you can safely use it to play around with the shell, create tables, or get used to the API and test changes in your code.
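To give an idea of what playing around with the shell can look like, here is a small sketch that prints a handful of HBase shell commands: create a table, insert one cell, scan it, and drop the table again. The table name 'test', the column family 'cf' and the helper name hbase_smoke_commands are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a few HBase shell commands, one per line.
# The table 'test' and the column family 'cf' are illustrative names only.
hbase_smoke_commands() {
    cat <<'EOF'
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'
disable 'test'
drop 'test'
EOF
}
```

You can type these commands at the hbase(main) prompt one by one. Piping them in, as in hbase_smoke_commands | sudo -u hadoop /home/hadoop/hbase/build/bin/hbase shell, should also work, but I am assuming here that this build of the shell reads commands from standard input.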
Update: Updated title to include version number, fixed XML