midcom.services.indexer installation

  1. Recommended Reading
  2. General Structure
  3. Setting up the Lucene Daemon
    1. Building and Installing the Daemon
    2. Running the Daemon
    3. Changing Configuration
  4. Configuring MidCOM to use the Indexer
    1. Activate the indexing feature in the MidCOM configuration
    2. Index your entire site
    3. Create the Indexer Frontend
  5. To make the indexer run as an init.d script (on Debian)

These are instructions for installing and setting up the MidCOM full-text search system.

Recommended Reading

General Structure

The Indexer is not directly integrated into PHP. On the one hand, a persistently running indexer daemon performs better on average than a fully integrated solution. On the other hand, there is no really usable PHP-level on-demand indexer out there anyway (not that I would trust PHP far enough in this respect).

The structure of the index is described further in mRFC 9; see there for how Documents and Fields interact.

This document will focus on setting up and using the Indexer.

Setting up the Lucene Daemon

All required files from the current CVS state are available for download on the required indexer files page. If you want an up-to-date build of the system instead, follow these instructions:

Building and Installing the Daemon

  1. Go to the Lucene website and download a Lucene binary tarball (use version 1.4.3, found at http://apache.fi/lucene/java/archive/). In it you will find a file named lucene-$version.jar. Rename it to plain lucene.jar.
  2. Go to the external-tools/indexer-backends/lucene directory of the current MidCOM CVS. Copy the lucene.jar file into this directory and run make there. This builds a file named indexer.jar.
  3. Create a directory and copy the files lucene.jar, indexer.jar, xml-communication-request.dtd and xml-communication-response.dtd into it.
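
Step 3 can be sketched as a small shell helper. This is only an illustration: the function name stage_indexer_files and both directory arguments are assumptions, not part of MidCOM. It refuses to copy anything unless all four files are present in the source directory.

```shell
#!/bin/sh
# Sketch: copy the four files the daemon needs into a runtime directory.
# stage_indexer_files is a hypothetical helper name, not a MidCOM tool.
# Usage: stage_indexer_files SOURCE_DIR TARGET_DIR
stage_indexer_files() {
    src=$1
    dest=$2
    files="lucene.jar indexer.jar xml-communication-request.dtd xml-communication-response.dtd"

    # Verify that every file exists before touching the target directory.
    for f in $files; do
        if [ ! -f "$src/$f" ]; then
            echo "missing: $src/$f" >&2
            return 1
        fi
    done

    mkdir -p "$dest" || return 1
    for f in $files; do
        cp "$src/$f" "$dest/" || return 1
    done
}
```

For example, after running make in the CVS directory, calling stage_indexer_files . /some/runtime/dir would populate the runtime directory (the target path here is a placeholder).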

Running the Daemon

Change to the newly created directory using a user account that has write permission to it, and run the following (adjust the path if your java binary lives elsewhere):

/usr/bin/java -jar indexer.jar

From that point on you should be fine: the daemon listens on 127.0.0.1:2222, which is also the default setting on the MidCOM side.

The daemon runs in the foreground by default unless you launch it with a nohup wrapper or similar. For an init script (for Debian), see the init.d script section at the bottom of this page.
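
A nohup wrapper of the kind mentioned above might look roughly like this. It is a minimal sketch: the helper name run_detached and the log file argument are assumptions for illustration. It starts a command detached from the terminal, sends its output to a log file and prints the process ID.

```shell
#!/bin/sh
# Sketch: start a command in the background so it survives logout.
# run_detached is a hypothetical helper name.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
    logfile=$1
    shift
    # nohup makes the command ignore SIGHUP; stdout and stderr go to the log.
    nohup "$@" >"$logfile" 2>&1 </dev/null &
    # Print the PID of the background process so the caller can stop it later.
    echo $!
}
```

From inside the daemon directory this could be used as run_detached indexer.log /usr/bin/java -jar indexer.jar, and the printed PID can later be passed to kill.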

Changing Configuration

The daemon takes the name of a configuration file as its first command-line argument. A full configuration file looks like this:

logfile = 
loglevel = WARNING
bind = 127.0.0.1
port = 2222

The values shown here are the defaults: log messages of level WARNING and above to stderr (no log file) and bind to 127.0.0.1:2222. See java.util.logging.Level for the valid logging levels.

You should be fine using the defaults though.
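
If you do need non-default settings, the configuration file can be generated with a few lines of shell. The sketch below assumes the four keys shown above are all the daemon reads. The helper name write_indexer_config and the file name indexer.conf are made up for illustration.

```shell
#!/bin/sh
# Sketch: write a daemon configuration file, falling back to the defaults.
# write_indexer_config is a hypothetical helper name.
# Usage: write_indexer_config FILE [BIND] [PORT] [LOGLEVEL]
write_indexer_config() {
    file=$1
    bind=${2:-127.0.0.1}
    port=${3:-2222}
    loglevel=${4:-WARNING}
    # An empty logfile value keeps logging on stderr, as in the defaults.
    cat >"$file" <<EOF
logfile =
loglevel = $loglevel
bind = $bind
port = $port
EOF
}
```

For example, write_indexer_config indexer.conf 127.0.0.1 2223 followed by /usr/bin/java -jar indexer.jar indexer.conf would start the daemon on port 2223. The MidCOM side then has to be pointed at the same port.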

Configuring MidCOM to use the Indexer

This is relatively easy and consists of three tasks:

Activate the indexing feature in the MidCOM configuration

As usual, you will find all detailed information in the MidCOM API docs, section midcom_config.php. As long as you stick to the default configuration, it is enough to activate the XMLTCP indexing backend during MidCOM startup:

$GLOBALS['midcom_local_config']['indexer_backend'] = 'xmltcp';

Two more configuration options, indexer_xmltcp_host and indexer_xmltcp_port, allow you to explicitly specify the host/port combination where the indexer runs.

Index your entire site

Unless you are building a site from scratch, you have to reindex your entire website. Do this by accessing the following URL with full admin privileges:

http://your.site.com/midcom-exec-midcom/reindex.php

Two important notes about this. First, it will take quite some time, as the MidCOM side of the interface does not yet support batch indexing. Second, and far more important, reindexing takes a huge amount of RAM: for example, a site with around 250 documents requires about 60 MB of RAM in total.

Create the Indexer Frontend

Create a new topic using the component midcom.helper.search.

To make the indexer run as an init.d script (on Debian)

Follow these steps to run the indexer as an init.d script, so that you can use the normal start and stop commands. They are for Debian systems and were tested on Sarge. You must perform them as root.

Create /usr/share/midgard/indexer/indexer, make it executable, and put the following code into it:

#! /bin/sh
#
# indexer

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DESC="MidCOM indexer"
NAME=java-indexer
DAEMON=/usr/share/midgard/indexer/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/indexer
ARGS="-jar indexer.jar"
APPDIR=/usr/share/midgard/indexer

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0

# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
#   . /etc/default/$NAME
#fi

#
#   Function that starts the daemon/service.
#
d_start() {
    start-stop-daemon --start --make-pidfile --pidfile $PIDFILE \
        --chdir $APPDIR --background --quiet \
        --exec $DAEMON -- $ARGS
}

#
#   Function that stops the daemon/service.
#
d_stop() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE
}

#
#   Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE --signal 1
}

case "$1" in
  start)
    echo -n "Starting $DESC: $NAME"
    d_start
    echo "."
    ;;
  stop)
    echo -n "Stopping $DESC: $NAME"
    d_stop
    echo "."
    ;;
  #reload)
    #
    #   If the daemon can reload its configuration without
    #   restarting (for example, when it is sent a SIGHUP),
    #   then implement that here.
    #
    #   If the daemon responds to changes in its config file
    #   directly anyway, make this an "exit 0".
    #
    # echo -n "Reloading $DESC configuration..."
    # d_reload
    # echo "done."
  #;;
  restart|force-reload)
    #
    #   If the "reload" option is implemented, move the "force-reload"
    #   option to the "reload" entry above. If not, "force-reload" is
    #   just the same as "restart".
    #
    echo -n "Restarting $DESC: $NAME"
    d_stop
    sleep 1
    d_start
    echo "."
    ;;
  *)
    # echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
    echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
    exit 1
    ;;
esac

exit 0

If you do not already have the PEAR XML_Parser package, you also need to install it:

pear install xml_parser

Copy indexer.jar, xml-communication-request.dtd, xml-communication-response.dtd and lucene.jar into /usr/share/midgard/indexer/.

Make sure you have a Java runtime installed. Create a symlink so that /usr/share/midgard/indexer/java-indexer points to the java binary (adjust the source path to match your installation):

ln -s /opt/j2re/bin/java /usr/share/midgard/indexer/java-indexer

The process will then be called java-indexer, so it won't be confused with other Java processes.

Create a symlink to the script in /etc/init.d:

ln -s /usr/share/midgard/indexer/indexer /etc/init.d/indexer

To make it start and stop automatically at startup and shutdown:

update-rc.d indexer defaults
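
After starting the service you may want to confirm that the daemon is actually alive. The following is a minimal sketch that relies on the PID file written by the init script above (/var/run/java-indexer.pid). The helper name indexer_is_running is an assumption, and the check should be run as root, because kill -0 fails for processes owned by another user.

```shell
#!/bin/sh
# Sketch: report whether the process recorded in the PID file is alive.
# indexer_is_running is a hypothetical helper name.
# Usage: indexer_is_running [PIDFILE]
indexer_is_running() {
    pidfile=${1:-/var/run/java-indexer.pid}
    # No readable PID file means the daemon was never started or was stopped.
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # Signal 0 sends nothing; it only tests that the process exists.
    kill -0 "$pid" 2>/dev/null
}
```

A typical use would be /etc/init.d/indexer start followed by indexer_is_running && echo "indexer is up".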

Have fun indexing!
