How to set up site search using Solr
About Solr
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.
To use Solr you therefore need a Java servlet container. This page covers two options: Tomcat and Jetty.
Using solr-tomcat5.5
On Ubuntu and related distributions the solr-tomcat5.5 package is available. Install it, then do the following:
- Make sure the Catalina process uses UTF-8 for URI parsing
- Download the MidCOM Solr schema and copy it over /etc/solr/conf/schema.xml
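The UTF-8 step can be checked with a small script. This is a minimal sketch, assuming the setting is made through the URIEncoding attribute on the Connector element in Tomcat's server.xml. The path /etc/tomcat5.5/server.xml and the function name are assumptions, so adjust them to your installation.

```shell
# Return 0 if the given Tomcat server.xml mentions URIEncoding="UTF-8".
has_utf8_uri_encoding() {
    grep -q 'URIEncoding="UTF-8"' "$1"
}

# Assumed location of server.xml; override by setting SERVER_XML.
SERVER_XML="${SERVER_XML:-/etc/tomcat5.5/server.xml}"

if [ -f "$SERVER_XML" ]; then
    if has_utf8_uri_encoding "$SERVER_XML"; then
        echo "UTF-8 URI encoding is configured"
    else
        echo "URIEncoding=\"UTF-8\" not found in $SERVER_XML"
    fi
fi
```

If the attribute is missing, add it to the Connector element that serves Solr and restart Tomcat.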
Using Jetty
We suggest using Jetty, as it is small, lightweight and simple to use.
Dependencies
On Debian you need the following packages:
apt-get install libmx4j-java libregexp-java libsablevm-classlib1-java libservlet2.4-java libtomcat5.5-java libxerces2-java sun-java5-jdk
Installing Jetty
Solr comes with Jetty in the example directory.
Installing Solr
Download Solr from the Solr website http://www.apache.org/dyn/closer.cgi/lucene/solr/
Unpack Solr and copy the Jetty example install to the Jetty home:
unzip apache-solr-1.1*.zip
cp -R apache-solr-1.1*/example /usr/share/jetty
Download the setup files for Solr (currently just for Debian):
svn co https://svn.midgard-project.org/midgard/trunk/external-tools/indexer-backends/solr
cd solr
bash ./install-solr.sh
This installs the setup files in the correct places and sets the correct permissions.
Start Solr
/etc/init.d/jetty start
Solr is now running and listening for requests on port 8983.
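To confirm that Solr really is answering on that port, something like the following may help. It is a sketch only: the function names are made up for illustration, and the /solr/admin/ path is an assumption based on the bundled example setup.

```shell
# Build the base URL of a Solr instance from host and port.
solr_url() {
    printf 'http://%s:%s/solr/' "$1" "$2"
}

# Return 0 if the Solr admin page answers with a successful HTTP status.
# The admin/ path is an assumption based on the bundled example setup.
solr_is_up() {
    curl -s -f -o /dev/null --max-time 5 "$(solr_url "$1" "$2")admin/"
}

if solr_is_up localhost 8983; then
    echo "Solr is answering on port 8983"
else
    echo "No answer from Solr on port 8983"
fi
```

If there is no answer, check that the Jetty init script actually started the process before moving on to the MidCOM configuration.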
Create a topic with the Search component.
In the menu, choose website -> website configuration.
Set the following values:
- Indexer: Solr
- Hostname of indexer xmltcp service: localhost (or the host solr is running on)
- Port of indexer xmltcp service: 8983
Reindex your site by visiting /midcom-exec-midcom/reindex.php. This will take some time.
You should now be able to run searches on your site.
Security
In the addListener definition in jetty.xml, add the following:
<Set name="Host">localhost</Set>
This stops Jetty from listening to requests from the outside. If you still want to access the admin interface from other hosts, use firewall scripts instead to hide the port from most users.
See http://wiki.apache.org/solr/SolrSecurity for more information.
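After restarting Jetty, the effect of the Host setting can be checked by looking at which address port 8983 is bound to. The sketch below assumes a Linux system with netstat from net-tools. The function name local_only is hypothetical, and the parsing relies on the local address being the fourth column of the netstat -ltn output.

```shell
# Read "netstat -ltn" output on stdin and return 0 only if the given port
# is listed and every listed socket for it is bound to a loopback address.
local_only() {
    awk -v port="$1" '
        { n = length($4) - length(port) }
        n > 0 && substr($4, n) == ":" port {
            found = 1
            addr = substr($4, 1, n - 1)
            if (addr != "127.0.0.1" && addr != "::1" && addr != "::ffff:127.0.0.1")
                exposed = 1
        }
        END { if (found && !exposed) exit 0; exit 1 }
    '
}

if command -v netstat >/dev/null 2>&1; then
    if netstat -ltn | local_only 8983; then
        echo "Port 8983 is bound to localhost only"
    else
        echo "Port 8983 is not listening, or is reachable from other hosts"
    fi
fi
```

A wildcard address such as 0.0.0.0 or :: in the fourth column means the listener is still open to the outside.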
Troubleshooting
"Authorization required"
If you get "Authorization required" errors when running midcom-exec-midcom/reindex.php, modify the indexer_reindex_allowed_ips setting. Set it either in /etc/midgard/midcom.conf or in the host settings.
For the midcom.conf file, add a line like the following, replacing the example addresses with the IPs that should be allowed to reindex:
$GLOBALS['midcom_config_site']['indexer_reindex_allowed_ips'] = array('127.0.0.1','192.168.126.128','127.0.1.1');
"Indexer failed"
If you get an "indexer failed" error when reindexing the site, ensure that Solr's data directory exists and is writable:
mkdir /usr/share/jetty/solr/data
chown jetty /usr/share/jetty/solr/data
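To see which of the two conditions is failing before changing anything, a check along these lines can be used. It is a minimal sketch: the function name check_data_dir is hypothetical, and the path is the one used in the commands above.

```shell
# Print "missing", "not writable" or "ok" for the given directory.
# Note: -w tests the user running the script, so run it as the Jetty user.
check_data_dir() {
    if [ ! -d "$1" ]; then
        echo "missing"
    elif [ ! -w "$1" ]; then
        echo "not writable"
    else
        echo "ok"
    fi
}

check_data_dir /usr/share/jetty/solr/data
```

Running the check as root will report "ok" for any existing directory, so it only says something useful when run as the user that Jetty runs under (jetty in the chown command above).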
