Friday, May 14, 2010

Minimal Katta Lucene Client

A quick post explaining how to set up a minimal Katta Lucene client. I found this was missing from the Katta site and documentation, and since I ran into an issue along the way, I thought I would post my notes here for others who may attempt the same.

The first question was which of the libs need to be supplied for a client to use a remote Katta cluster. Please note that I am referring here to a "canonical" setup with a distributed Lucene index (which I created on Hadoop from data in HBase using a MapReduce job). I found that the following libs need to be added on the client; the rest are only needed on the server:

katta-core-0.6.rc1.jar
lucene-core-3.0.0.jar
zookeeper-3.2.2.jar
zkclient-0.1-dev.jar
hadoop-core-0.20.1.jar
log4j-1.2.15.jar
commons-logging-1.0.4.jar
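
To double-check that all of these jars actually ended up on the client classpath, a small sanity check along the following lines may help. This is only a sketch: the class names are my assumption of one representative class per jar (the Katta, Lucene and Hadoop ones also appear in the client imports), and ClasspathCheck is a made-up helper, not part of Katta.

```java
import java.util.ArrayList;
import java.util.List;

final class ClasspathCheck {

  // One representative class per jar from the list above (assumed names).
  static final String[] REQUIRED = {
    "net.sf.katta.lib.lucene.LuceneClient", // katta-core
    "org.apache.lucene.search.Query",       // lucene-core
    "org.apache.zookeeper.ZooKeeper",       // zookeeper
    "org.I0Itec.zkclient.ZkClient",         // zkclient
    "org.apache.hadoop.io.MapWritable",     // hadoop-core
    "org.apache.log4j.Logger",              // log4j
    "org.apache.commons.logging.Log"        // commons-logging
  };

  // Returns the names of all classes that could not be loaded.
  static List<String> missing(String... classNames) {
    List<String> result = new ArrayList<String>();
    for (String name : classNames) {
      try {
        // "false" = load the class without running its static initializers
        Class.forName(name, false, ClasspathCheck.class.getClassLoader());
      } catch (ClassNotFoundException e) {
        result.add(name);
      } catch (LinkageError e) {
        // the class was found, but one of its own dependencies is missing
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> missing = missing(REQUIRED);
    System.out.println(missing.isEmpty() ? "All client libs found" : "Missing: " + missing);
  }
}
```

Run it with the same classpath you intend to use for the client; anything it reports as missing points at a jar that still needs to be added.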

Here is the code for the client. Please note that this is a simple test app that expects the name of the index, the default Lucene search field and the query on the command line, in that order. I did not add usage info as this is just a proof of concept.

package com.worldlingo.test;

import net.sf.katta.lib.lucene.Hit;
import net.sf.katta.lib.lucene.Hits;
import net.sf.katta.lib.lucene.LuceneClient;
import net.sf.katta.util.ZkConfiguration;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Writable;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

import java.util.Arrays;
import java.util.Map;

public class KattaLuceneClient {

  public static void main(String[] args) {
    try {
      // args[1] is the default search field, args[2] the query string
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
      Query query = new QueryParser(Version.LUCENE_CURRENT, args[1], analyzer).parse(args[2]);

      // assumes "/katta.zk.properties" available on classpath!
      ZkConfiguration conf = new ZkConfiguration();
      LuceneClient luceneClient = new LuceneClient(conf);
      // search only the index named in args[0] and return at most 99 hits
      Hits hits = luceneClient.search(query, Arrays.asList(args[0]).toArray(new String[1]), 99);

      int num = 0;
      for (Hit hit : hits.getHits()) {
        MapWritable mw = luceneClient.getDetails(hit);
        for (Map.Entry<Writable, Writable> entry : mw.entrySet()) {
          System.out.println("[" + (num++) + "] key -> " + entry.getKey() + ", value -> " + entry.getValue());
        }
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

}

The first part is standard Lucene code where we parse the query string with an analyzer. The second part is Katta related: it creates a configuration object, which assumes we have a ZooKeeper configuration file on the classpath. That config only needs to have these lines set:

zookeeper.embedded=false
zookeeper.servers=server-1:2181,server-2:2181

The first line is really only used on the server, so it can be left out on the client. I simply copied the server's katta.zk.properties to match my setup. The important line is the second one, which tells the client where the ZooKeeper servers responsible for managing the Katta cluster are running. With this info the client is able to distribute the search calls to the correct Katta nodes.
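
If the client cannot connect, it can be worth checking that the file is really found on the classpath and that the server list parses the way you expect. The following is a rough sketch using only java.util.Properties; it does not go through Katta's ZkConfiguration at all, and ZkServersConfig is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class ZkServersConfig {

  // Splits the comma-separated "zookeeper.servers" value into host:port entries.
  static List<String> parseServers(Properties props) {
    String value = props.getProperty("zookeeper.servers", "");
    List<String> servers = new ArrayList<String>();
    for (String entry : value.split(",")) {
      String trimmed = entry.trim();
      if (!trimmed.isEmpty()) {
        servers.add(trimmed);
      }
    }
    return servers;
  }

  // Loads a properties file from the classpath, e.g. "/katta.zk.properties".
  static Properties loadFromClasspath(String resource) throws IOException {
    InputStream in = ZkServersConfig.class.getResourceAsStream(resource);
    if (in == null) {
      throw new IOException(resource + " not found on classpath");
    }
    try {
      Properties props = new Properties();
      props.load(in);
      return props;
    } finally {
      in.close();
    }
  }

  public static void main(String[] args) throws IOException {
    List<String> servers = parseServers(loadFromClasspath("/katta.zk.properties"));
    if (servers.isEmpty()) {
      System.err.println("zookeeper.servers is not set");
    } else {
      System.out.println("ZooKeeper servers: " + servers);
    }
  }
}
```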

Further along we create a LuceneClient instance and start the actual search. Here I simply used no sorting and set the maximum number of hits returned to 99. Both could be made optional command line parameters, but that is trivial and not required here - this is a minimal test client after all ;)
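
For completeness, here is one way the command line handling could look if the maximum number of hits were made optional. This is just a sketch and not part of the test project: ClientArgs and the optional fourth "max-hits" argument are my own invention, and sorting is left out.

```java
final class ClientArgs {

  static final int DEFAULT_MAX_HITS = 99;

  final String index;
  final String field;
  final String query;
  final int maxHits;

  private ClientArgs(String index, String field, String query, int maxHits) {
    this.index = index;
    this.field = field;
    this.query = query;
    this.maxHits = maxHits;
  }

  // Expects: <index> <default-field> <query> [max-hits]
  static ClientArgs parse(String[] args) {
    if (args.length < 3 || args.length > 4) {
      throw new IllegalArgumentException(
          "Usage: KattaLuceneClient <index> <default-field> <query> [max-hits]");
    }
    int maxHits = DEFAULT_MAX_HITS;
    if (args.length == 4) {
      // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on bad input
      maxHits = Integer.parseInt(args[3]);
      if (maxHits < 1) {
        throw new IllegalArgumentException("max-hits must be positive");
      }
    }
    return new ClientArgs(args[0], args[1], args[2], maxHits);
  }
}
```

In the client, the parsed values would then replace the direct use of args[0], args[1], args[2] and the hard-coded 99.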

The last part of the app simply prints the fields and values of each document found. Please note that Katta uses Hadoop's low-level Writable class as part of its response, which is not "too" intuitive for the uninitiated. The values are actually Text instances, so they can safely be converted to text using ".toString()".
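
If you would rather work with plain strings than with Writable instances, the conversion can be wrapped in a small helper. The sketch below is written against java.util.Map so it has no Hadoop dependency; it relies on MapWritable implementing Map<Writable, Writable> (as the entrySet() loop in the client shows), and HitFields is a hypothetical helper name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HitFields {

  // Copies any map into a String-to-String map by calling toString() on keys and values.
  static Map<String, String> toStringMap(Map<?, ?> raw) {
    Map<String, String> fields = new LinkedHashMap<String, String>();
    for (Map.Entry<?, ?> entry : raw.entrySet()) {
      fields.put(String.valueOf(entry.getKey()), String.valueOf(entry.getValue()));
    }
    return fields;
  }
}
```

In the client this would be used as `Map<String, String> fields = HitFields.toStringMap(luceneClient.getDetails(hit));`.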

Finally, I also checked the test project into my GitHub account for your perusal. Have fun!

1 comment:

  1. Nice Tutorial

    But I am getting an error with ZooKeeper.

    The following is the log:


    11/05/18 11:10:34 INFO zookeeper.ClientCnxn:937 - Server connection successful
    11/05/18 11:10:34 INFO zkclient.ZkClient:434 - zookeeper state changed (SyncConnected)
    11/05/18 11:10:34 WARN zookeeper.ClientCnxn:967 - Exception closing session 0x12ffce416be008e to sun.nio.ch.SelectionKeyImpl@bfea1d
    java.io.IOException: Xid out of order. Got 198 expected -8
    at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:663)
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:719)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
    11/05/18 11: INFO zkclient.ZkClient:434 - zookeeper state changed (Disconnected)

    Please suggest something...
