A Brief and Incomplete Catalog of Static Site Search Options

I was actually going to title this, “A Brief and Incomplete Catalog of Static Site Search Options, None of Which You Will Love”, but I thought that would be too long.

This is a list of options I’ve found for adding search functionality to this web site, which is to say a small, statically-generated site that is a hobby and doesn’t produce any revenue. I mention the lack of revenue because if this were more than just a personal site, and particularly if it were a site that made money for me, I’d have no problem plunking down $40/month or so for some kind of hosted search, or even hosting something like Apache Solr myself.

To reiterate, I’m looking for “full text search of my little web site”. Many of these providers, such as Algolia, offer far more functionality than the “type a query, get some links back” that I’m looking for, so you are hereby warned about my bias in evaluating the offerings below.

Hosted search providers run a search service for you. You find a way to shoot your content up to them, and then you shoot queries to them via JavaScript. Of the ones I’ve looked at, I think Google Cloud Search API, Para, and Algolia are probably the standout offerings, in that order. However, I’ve only really used Algolia and Google Custom Search Engine.

Let me note that it seems like several hosted search offerings have gone away: FacetFlow (closed), Tapir (gone?), Firebase (docs tell you to use Algolia?). This might be something worth keeping in mind if you intend to invest time in learning and implementing one of these hosted solutions.

Host Server-Side Searching Yourself

I know this isn’t really in the spirit of static web sites, but if you need search, maybe you need to run some search software on your server.

I did not do a ton of research into this area. The names that immediately come to mind are Elasticsearch, Apache Solr, and Lucene. I know next to nothing about any of those except that I’m pretty sure Elasticsearch and Solr are both built on Lucene, so you’d probably want to start there. Of course, there’s always good old PostgreSQL full-text search with some (hopefully trivial) front end on it. I also stumbled onto bleve, which I think has some kind of search server you can run, but I’m not actually certain about that.

I’m sure there are many others.

Client-Side Searching

I’m not terribly fond of the idea of sending my whole site index down to the client every time they load my page. It seems more than a bit rude, especially for people on metered Internet connections. Jakub Chodounský did, however, point out that his entire blog’s index is, “about 260kb [sic] before gzipping and 90kb after which is less than one nice image.” He also points out that the search is very fast, though I have to wonder if users on underpowered devices might disagree about that speed. Actually, go do a search on the SQLAlchemy documentation, which I believe is done on the client. It takes a surprisingly long time, and that’s on my MacBook Pro with Chrome, far from “underpowered” in any common use of the term.
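If you want to know what your own index would cost visitors, you can measure it before committing to a client-side approach. The following is a minimal Python sketch, assuming the index is a JSON array of records; the `index_sizes` function and the sample records are made up for illustration, and the gzipped number only approximates what a real web server would send.

```python
import gzip
import json


def index_sizes(records):
    """Return (raw_bytes, gzipped_bytes) for a JSON-serialized search index."""
    # Compact separators mimic a minified index file.
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    # Level 9 is gzip's maximum; a web server may well use a lower level.
    packed = gzip.compress(raw, compresslevel=9)
    return len(raw), len(packed)


if __name__ == "__main__":
    # Hypothetical records; a real index would come from your site's content.
    sample = [
        {"url": f"/posts/{n}/", "title": f"Post {n}", "body": "static site search " * 50}
        for n in range(100)
    ]
    raw_size, gz_size = index_sizes(sample)
    print(f"raw: {raw_size} bytes, gzipped: {gz_size} bytes")
```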

But if you’re OK with sticking clients with your whole search index and heating up their laps/hands while you do the search, then you have options. Lunr is a JavaScript project that describes itself as “A bit like Solr, but much smaller and not as bright.” Hugo, my static site generator of choice as I write this, has a few links on using Lunr (and others) to add search to your static site. In fact, that page will give you a few other options as well, such as Fuse.js.
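To make the idea of "searching an index on the client" a little more concrete, here is a toy inverted index. It is only a conceptual sketch, written in Python rather than JavaScript, and it is not Lunr's or Fuse.js's API; the function names and sample pages are invented for illustration. Real libraries layer things like stemming and relevance ranking on top of this basic structure.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    # Use .get so a lookup never adds empty entries to the defaultdict.
    matches = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*matches))


if __name__ == "__main__":
    # Hypothetical pages standing in for a site's content.
    pages = {
        "/lunr/": "Client-side search with Lunr",
        "/solr/": "Server-side search with Apache Solr",
    }
    index = build_index(pages)
    print(search(index, "search lunr"))  # prints ['/lunr/']
```

The whole `index` structure is what would have to travel to the browser, which is why its size matters so much.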

Don’t Forget Indexing

You’ll need some way to actually index your site, and how you do that depends on how your search works. If it’s Google CSE, for example, I think you index your site by waiting patiently. As I mentioned above, with Algolia I used a Hugo template to make a JSON file that I sent up to Algolia with atomic-algolia. Some software, like maybe Lunr or Solr, may come with its own tools to run through your content files.
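If your tooling doesn’t do this for you, a small script can produce the JSON. The sketch below is one possible approach, not the Hugo template mentioned above. It assumes Markdown files with front matter delimited by `---` lines, and a URL scheme in which `posts/foo.md` becomes `/posts/foo/`. The function names, the record fields, and that URL mapping are all assumptions to adjust for your own site.

```python
import json
from pathlib import Path


def strip_front_matter(text):
    """Drop a leading front matter block delimited by '---' lines, if present."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                return "\n".join(lines[i + 1:])
    # No front matter, or no closing delimiter: leave the text alone.
    return text


def build_records(content_dir):
    """Return one {"url", "body"} record per Markdown file under content_dir."""
    root = Path(content_dir)
    records = []
    for path in sorted(root.rglob("*.md")):
        body = strip_front_matter(path.read_text(encoding="utf-8"))
        # Assumed URL scheme: posts/foo.md -> /posts/foo/
        url = "/" + path.relative_to(root).with_suffix("").as_posix() + "/"
        records.append({"url": url, "body": body.strip()})
    return records


def write_index(content_dir, out_path):
    """Write the records for content_dir to out_path as a JSON array."""
    records = build_records(content_dir)
    Path(out_path).write_text(json.dumps(records), encoding="utf-8")
    return len(records)
```

Calling something like `write_index("content", "index.json")` (both paths hypothetical) would give you a file to upload to a hosted service or serve to a client-side library.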

There’s no single answer here, but I didn’t want you to forget, dear reader, the need to actually get the content into whatever format your search solution needs.