"Young businessman with the spyglass in the wheat field searching for the new opportunities" by rangizzz; http://www.shutterstock.com/pic-230182939.html

Solr search for Dovecot and WordPress

Perhaps the most significant effect of leaving Gmail behind was the loss of its search capabilities. While I miss labels, I’ve found that filing an email into a single folder has forced me to be more deliberate, more organized. Search, however, was a feature I had to replicate.

When evaluating search solutions, I required, at a minimum, support for Dovecot 2.2[1]. Ideally, whatever I chose would also index WordPress.

With Dovecot 2.2 as the constraining factor, I had two choices for search: Solr or Lucene. Of the two, only Solr has an actively maintained WordPress plugin, making it the only real choice[2]. That simplified matters considerably. 😁

I won’t describe setting up Solr 4[3] in this post; plenty of tutorials already cover that. Probably the only unusual detail of my setup is its nginx proxy.

Proxying search traffic via nginx

My Solr 4 instance is powered by Tomcat 7, for no particular reason other than that I could add SSL without converting certificates to PKCS and inserting them into a keystore. For several reasons, notably performance and security, I’ve restricted Tomcat to listen only for local connections. nginx forwards all public requests to it via ngx_http_proxy_module.
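A quick way to confirm that Tomcat really is local-only is to attempt a TCP connection to its connector from both sides. Here’s a minimal Python sketch; port 8080 is an assumption (a common Tomcat default), not something taken from my configuration. Run on the server, `port_is_open("127.0.0.1", 8080)` should return True; run from another machine against the server’s public address, it should return False.

```python
import socket

# Assumption: Tomcat's HTTP connector listens on 8080 (a common default).
TOMCAT_PORT = 8080


def port_is_open(host: str, port: int = TOMCAT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: all count as "not open".
        return False
```

A closed result from the public address only shows that nothing answered from where you tested; firewall rules and the connector’s bind address both contribute.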

Because nginx proxies all Solr traffic[4], it continues to handle SSL communication, eliminating the need to replicate HSTS and other security features in Tomcat. Additionally, by making nginx the only public-facing entry point to Solr and Tomcat, I’ve limited the opportunities to exploit vulnerabilities in either of these Java applications.

Another benefit of proxying all Solr traffic is the ability to rewrite requests. By rewriting the request path, I can expose only specific endpoints while defeating common exploits. I can also cache requests more effectively by using a URL structure suited to microcaching.
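The rewrite and microcache ideas can be modeled in a few lines of Python. This is not my nginx configuration; it only illustrates the logic. The public prefix `/search/`, the public-to-internal core names, the allowlisted `select` handler, the internal base URL, and the one-second TTL are all hypothetical values.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical values for illustration; not my actual configuration.
INTERNAL_BASE = "http://127.0.0.1:8080/solr"      # Tomcat, loopback only
PUBLIC_PREFIX = "/search/"
CORES = {"mail": "dovecot", "blog": "wordpress"}  # public name -> internal core
ALLOWED_HANDLERS = {"select"}                     # queries only; /update, /admin stay hidden


def rewrite(public_path: str) -> Optional[str]:
    """Map /search/<name>/<handler> to an internal Solr URL, or None to reject.

    Expects the path without its query string.
    """
    if not public_path.startswith(PUBLIC_PREFIX):
        return None
    parts = public_path[len(PUBLIC_PREFIX):].split("/")
    if len(parts) != 2:
        return None
    name, handler = parts
    core = CORES.get(name)
    if core is None or handler not in ALLOWED_HANDLERS:
        return None  # anything not explicitly exposed is refused
    return f"{INTERNAL_BASE}/{core}/{handler}"


class MicroCache:
    """Keep each response for a very short TTL so bursts of identical
    queries reach Solr only once per TTL window."""

    def __init__(self, ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], bytes]) -> bytes:
        """key should be the full rewritten URL, including its query string."""
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # fresh hit: Solr is not contacted
        body = fetch()        # miss or stale entry: forward to Solr
        self._entries[key] = (now, body)
        return body
```

This toy cache never evicts entries; in nginx itself, microcaching is typically done with `proxy_cache` and a very short `proxy_cache_valid` period, which also handles expiry.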

Preparing Solr Configurations

Of everything involved with using Solr for Dovecot and WordPress, creating the proper Solr configurations was by far the most difficult part, harder than anything I faced with Tomcat. During my first attempt, configuring Solr indices proved so difficult that I conceded defeat, coming back several months later in determined frustration.

What I failed to notice the first time was that, in addition to the schema.xml each Solr integration provides, a working index requires many other files. I also overlooked that Solr provides all of them, and that I just needed to merge those defaults with the schema I was trying to configure.

Once you’ve installed Solr, its default data directory will include a sample configuration. Within that default configuration’s directory (located at /var/lib/tomcat7/solr/collection1 in my installation) is a conf directory that contains the defaults I just mentioned.

For each Solr index you need, you’ll:

  • create a new directory where Solr expects it[5];
  • replicate the default conf folder into the new index directory; and
  • replace schema.xml in your new index’s conf directory, along with anything else your integration provides.
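The three steps above can be scripted. Here’s a minimal Python sketch, assuming the layout of my installation (indices under `/var/lib/tomcat7/solr`, with `collection1` holding the default configuration) and a hypothetical directory containing whatever files your integration ships, such as its `schema.xml`. It only prepares files; telling Solr about the new index is a separate step not shown here.

```python
import shutil
from pathlib import Path


def create_index(solr_home: Path, name: str, integration_conf: Path,
                 template: str = "collection1") -> Path:
    """Create solr_home/<name>/conf from the default conf, then overlay
    the integration's files (schema.xml and anything else it provides)."""
    index_dir = solr_home / name
    conf_dir = index_dir / "conf"
    if index_dir.exists():
        raise FileExistsError(f"{index_dir} already exists")
    # Steps 1 and 2: create the index directory with a copy of the default conf.
    shutil.copytree(solr_home / template / "conf", conf_dir)
    # Step 3: overwrite defaults with the integration's files, keeping subpaths.
    for src in integration_conf.rglob("*"):
        if src.is_file():
            dest = conf_dir / src.relative_to(integration_conf)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
    return index_dir
```

For example, `create_index(Path("/var/lib/tomcat7/solr"), "dovecot", Path("/tmp/dovecot-solr"))`, where `dovecot` and `/tmp/dovecot-solr` are placeholder names. Depending on which user runs it, you may need to adjust ownership so Tomcat can read the new files.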

Merging the default conf directory with your integration’s schema.xml and other configurations is important because Solr will otherwise fail to index your content properly. Awkwardly, if these defaults are missing, Solr won’t always fail outright, nor will it necessarily log the problem; poor results may be the only evidence of the failure.
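Because a missing default may only surface as poor results, it can be worth comparing a new index’s conf directory against the defaults before troubleshooting search quality. A minimal Python sketch; it checks only that each default file exists, not its contents.

```python
from pathlib import Path
from typing import List


def missing_defaults(default_conf: Path, index_conf: Path) -> List[str]:
    """Return relative paths of files in default_conf that are absent from index_conf."""
    missing = []
    for src in default_conf.rglob("*"):
        if not src.is_file():
            continue  # directories are implied by the files inside them
        rel = src.relative_to(default_conf)
        if not (index_conf / rel).is_file():
            missing.append(rel.as_posix())
    return sorted(missing)
```

For example, `missing_defaults(Path("/var/lib/tomcat7/solr/collection1/conf"), Path("/var/lib/tomcat7/solr/dovecot/conf"))`, where `dovecot` is a placeholder index name; an empty list means every default file is present.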

After creating an index, use the Solr admin GUI to explore the indices Solr serves, confirming that those you’ve added are present and contain some data. It is possible, even likely, that you’ll need to build, or prime, the first index manually.
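The same check can be scripted against Solr’s CoreAdmin API, whose `STATUS` action lists each core along with index statistics such as `numDocs`. A minimal Python sketch, assuming Solr answers locally at `http://127.0.0.1:8080/solr` (a hypothetical address; adjust it to your Tomcat connector) and that JSON output is requested with `wt=json`. A core reporting zero documents is a likely candidate for priming.

```python
import json
from typing import Dict, List
from urllib.request import urlopen

# Hypothetical local address; adjust to match your Tomcat connector.
SOLR_BASE = "http://127.0.0.1:8080/solr"


def fetch_status(base: str = SOLR_BASE, timeout: float = 5.0) -> dict:
    """Query the CoreAdmin STATUS action and decode its JSON response."""
    url = f"{base}/admin/cores?action=STATUS&wt=json"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def core_doc_counts(status: dict) -> Dict[str, int]:
    """Map each core name in a STATUS response to its numDocs."""
    return {
        name: info.get("index", {}).get("numDocs", 0)
        for name, info in status.get("status", {}).items()
    }


def empty_cores(status: dict) -> List[str]:
    """Cores that are loaded but hold no documents yet."""
    return sorted(n for n, count in core_doc_counts(status).items() if count == 0)
```

Usage: `empty_cores(fetch_status())` returns the names of loaded cores that still need data; a core missing from the result entirely did not load at all.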

At this point, you can configure your applications’ Solr integrations to use the indices you’ve created.

My Solr Configurations

For reference, my Solr 4 configurations are available for download:

Vendor Configurations


  1. Full-text search options changed in version 2.2, hence my emphasis on that particular point release. Dovecot Pro, which I don’t pay for, includes a newer full-text search tool that supersedes the option Dovecot previously provided.
  2. The Lucene plugin hasn’t been updated since October 2011.
  3. I chose Solr 4 because it runs inside Tomcat rather than as a standalone server, and enabling SSL seemed easier in Tomcat. As I noted in “Proxying search traffic via nginx,” this reasoning proved moot, since nginx handles SSL anyway.
  4. Tomcat serves nothing but Solr.
  5. /var/lib/tomcat7/solr/NEW_INDEX in my case, where NEW_INDEX is a descriptive name for internal purposes. The NEW_INDEX value won’t be exposed to end-users by the Solr server; instead, when configuring the index in Solr, you’ll specify the public-facing URL where the index accepts queries.