Friday, February 5, 2010

IvyDE and HBase 0.21 (trunk)

If you are staying on top of HBase development and frequently update to the HBase trunk (0.21.0-dev at the time of this post) you may have noticed that we now have support for Apache Ivy (see HBASE-1433). This is good because it allows to better control dependencies of the required jar files. It does have a few drawbacks though. One issue that you must be online to get your initial set of jars. You can also set up a local mirror or reuse the one you need for Hadoop anyways to share some of them.

Another issue is that it pulls in many more libs as part of the dependency resolving process. This reminds me bit of aptitude and when you try to install Java, for example on Debian. It often wants to pull in a litany of "required" packages but upon closer look many are only recommended and need not to be installed.

Finally you need to get the jar files somehow into your development environment. I am using Eclipse 3.5 on my Windows 7 PC as well as on my MacOS machines. If you have not run ant from the command line yet you have no jars downloaded and opening the project in Eclipse yields an endless amount of errors. You have two choices, you can run ant and get all jars and then add them to the project in Eclipse. But that is rather static and does not work well with future changes. It also is not the "ivy" way to resolve the libraries automatically.

The other option you have is adding a plugin to Eclipse that can handle Ivy for you, right within the IDE. Luckily for Eclipse there is IvyDE. You install it according to its documentation and then add a "Classpath Container" as described here. That part works quite well and after a restart IvyDE is ready to go.

A few more steps have to be done to get HBase working now - as in compiling without errors. The crucial one is editing the Ivy library and setting the HBase specific Ivy files. In particular the "Ivy settings path" and the properties file. The latter especially is specifying all the various version numbers that the ivy.xml is using. Without it the Ivy resolve process will fail with many errors all over the place. Please note that in the screen shot I added you see how it looks like on my Windows PC. The paths will be slightly different for your setup and probably even using another format if you are on a Mac or Linux machine. As long as you specify both you should be fine.

The other important issue is that you have to repeat that same step adding the Classpath Container two more times: each of the two larger contrib packages "contrib/stargate" and "contrib/transactional" have their own ivy.xml! For both you have to go into the respective directory and right click on the ivy.xml and follow the steps described in the Ivy documentation. Enter the same information as mentioned above to make the resolve work, leave everything else the way it is. You may notice that the contrib packages have a few more targets unticked. That is OK and can be used as-is.

As a temporary step you have to add two more static libraries that are in the $HBASE_HOME/lib directory: libthrift-0.2.0.jar and zookeeper-3.2.2.jar. Those will eventually be published on the Ivy repositories and then this step is obsolete (see INFRA-2461).

Eventually you end up with three containers as shown in the second and third screen shot. The Eclipse toolbar now also has an Ivy "Resolve All Dependencies" button which you can use to trigger the download process. Personally I had to do this a few times as the mirrors with the jars seem to be flaky at times. I ended up with for example "hadoop-mapred.jar" missing. Another resolve run fixed the problem.

The last screen shot shows the three Ivy related containers once more in the tree view of the Package Explorer in the Java perspective. What you also see is the Ivy console, which also is installed with the plugin. You have to open it as usual using the "Window - Show View - Console" menu (if you do not have the Console View open already) and then use the drop down menu next to the "Open Console" button in that view to open the Ivy console. It gives you access to all the details when resolving the dependencies and can hint when you have done something wrong. Please note though that it also lists a lot of connection errors, one for every mirror or repository that does not respond or yet has the required package available. One of them should respond though or as mentioned above you will have to try later again.

Eclipse automatically compiles the project and if everything worked out it does so now without a hitch. Good luck!

Update: Added info about the yet still static thrift and zookeeper jars. See Kay Kay's comment below.

4 comments:

  1. Nice post. Once INFRA-2461 is set up , depending on the consensus, it is possible to publish thrift as an artifact internally within hbase and then get away from giving special status to zk/ thrift altogether .

    ReplyDelete
  2. This is crazy. If you are going down the dependency path switch to Maven which integrates well with all the IDEs rather than this half-way solution to the problem.

    ReplyDelete
  3. Sam, Maven is next as far as I understand (see the JIRA issues related to it).

    ReplyDelete
  4. Cool. We should probably call that out here then to make it clear that this is the first phase of a transition to Maven rather than the end goal. I'm all for it. Nothing makes it easier on the customer than making the dependency management and IDE integration easier.

    ReplyDelete