I was actually going to title this, “A Brief and Incomplete Catalog of Static Site Search Options, None of Which You Will Love”, but I thought that would be too long.
This is a list of options I’ve found for adding search functionality to this web site, which is to say a small, statically-generated site that is a hobby and doesn’t produce any revenue. I mention that bit about no revenue because if this were more than just a personal site, and particularly if it were a site that made money for me, I’d have no problem plunking down $40/month or so to get some kind of hosted search, or even just hosting something like Apache Solr myself.
To reiterate, I’m looking for “full text search of my little web site”. A lot of these providers, such as Algolia, offer a lot more functionality than the “type a query, get some links back” that I’m looking for, so you are hereby warned about my bias in evaluating the offerings below.
Algolia: Easy to use with a good API. Very fast. Unfortunately I hit the 10 KB limit on records almost immediately: any blog article of reasonable length can hit this easily. I would probably be willing to pay $2/month to get that limit lifted, but unfortunately their paid plans start at $35/month which is way too much for a hobby site. If you can live with this limit, though, Algolia makes search real easy. Algolia users can take a look at atomic-algolia for a tool to update your index.
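If you want to stay under that record limit, one common workaround is to split each article into several smaller records. Here is a minimal sketch of that idea. The 10,000-byte figure, the field names (url, title, part, content), and the split-on-blank-lines approach are all my assumptions for illustration, not anything Algolia prescribes, so check the exact limit for your plan.

```python
import json
from typing import Dict, List

# Assumed limit: the post says "10 KB"; the exact byte count may differ.
MAX_RECORD_BYTES = 10_000


def record_size(record: Dict) -> int:
    """Size of the record as UTF-8 encoded JSON, in bytes."""
    return len(json.dumps(record, ensure_ascii=False).encode("utf-8"))


def split_article(url: str, title: str, body: str,
                  limit: int = MAX_RECORD_BYTES) -> List[Dict]:
    """Split an article into paragraph-based records that fit under `limit`.

    Paragraphs are separated by blank lines. A single paragraph that is
    larger than `limit` on its own is NOT split further in this sketch.
    """
    records: List[Dict] = []
    chunk: List[str] = []

    def make_record(paragraphs: List[str]) -> Dict:
        # Hypothetical record shape; "part" orders the pieces of one article.
        return {
            "url": url,
            "title": title,
            "part": len(records),
            "content": "\n\n".join(paragraphs),
        }

    for para in body.split("\n\n"):
        # Start a new record if adding this paragraph would exceed the limit.
        if chunk and record_size(make_record(chunk + [para])) > limit:
            records.append(make_record(chunk))
            chunk = []
        chunk.append(para)

    if chunk:
        records.append(make_record(chunk))
    return records
```

Each piece carries the same url, so a search hit on any part can still link back to the full article. You would probably want to collapse duplicate hits per url on the query side.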
Para: I’d never heard of Para before this, but it looks like some kind of hosted service backed by Lucene. They have a free plan that advertises one “application”, 1 GB storage, 1 read/s, 1 write/s. At face value that sounds like enough. They have a nice-looking article on setting up static site search using Para. Their API doesn’t look as simple as I was hoping, but I’m sure I could figure it out. Hey, look at this: Para looks like it is completely open source. Maybe that’s a good thing, because I know nothing about this service or its backing company, and so I’m naturally worried about implementing it on my site only to have them close shop the next day. This service may bear some further evaluation.
Google Cloud Engine Search API: What is this? I mean, I’m having a little trouble figuring out what to call this other than “Google Cloud Engine Search API”. Is that it? I need to look further into how this works, but Simo Ahava has written an article about adding search to a Hugo site with Google Search API, which is exactly what I want to do. They have a free tier with 0.25 GB storage (documents plus indexes), 1,000 queries/day, and 10 MB of “adding documents to indexes” per day. That sounds pretty passable, and if you think you might exceed those limits the incremental charges also sound reasonable. Go over 1,000 queries? $0.50 for your next 10,000 queries. I like that a lot. This is probably worth a good hard look.
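To put the query pricing in perspective, here is a back-of-the-envelope sketch using only the two numbers quoted above. I’m assuming the overage is billed linearly per query, which may not match how Google actually rounds or bills it, and the function name is made up for this example.

```python
# Figures quoted in the post; actual pricing may have changed.
FREE_QUERIES_PER_DAY = 1_000
PRICE_PER_10K_QUERIES = 0.50  # US dollars


def daily_overage_cost(queries_per_day: int) -> float:
    """Estimated daily cost in dollars for queries beyond the free tier.

    Assumes linear proration of the per-10,000 price (an assumption).
    """
    billable = max(0, queries_per_day - FREE_QUERIES_PER_DAY)
    return billable / 10_000 * PRICE_PER_10K_QUERIES
```

Under that assumption, a day with 3,000 queries comes to about ten cents, and a day under 1,000 queries costs nothing.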
IBM Cloudant: Sorry, I didn’t look at this one very hard at all, but I can tell you they have some kind of free “Lite” plan, though it seems like the exact details of that plan are behind some kind of login that I don’t want to create. (I’m going to admit that I did a bit of an eye roll when I landed on an IBM site in my searching. They’ve got a bit of an image problem to overcome with me, and I suspect that’s the same for many others as well.)
Azure Search: I didn’t look into this very much at all either. They do have a free tier, but I got stopped by the 50 MB storage limit, which seems bizarrely small compared to their competitors. “Standard rates apply” for data transfer, but you probably shouldn’t let that scare you as I think you get the first 5 GB/month free, which is probably more than enough for any small-to-medium static site.
Let me note that it seems like several hosted search offerings have gone away: FacetFlow (closed), Tapir (gone?), Firebase (docs tell you to use Algolia?). This might be something worth keeping in mind if you intend to invest time in learning and implementing one of these hosted solutions.
Host Server-Side Searching Yourself
I know this isn’t really in the spirit of static web sites, but if you need search, maybe you need to run some search software on your server.
I did not do a ton of research into this area. The names that immediately come to mind are Elasticsearch, Apache Solr, and Lucene. I know next to nothing about any of those except that I’m pretty sure Elasticsearch and Solr are both built on Lucene so you’d probably want to start there. Of course there’s always good old PostgreSQL full-text search with some (hopefully trivial) front end on it. I stumbled into bleve, which I think has some kind of search server you can run, but I’m not actually certain about that.
I’m sure there are many others.
I’m not terribly fond of the idea of sending my whole site index down to the client every time they load my page. It seems more than a bit rude, especially for people on metered Internet connections. Jakub Chodounský did, however, point out that his entire blog’s index is, “about 260kb [sic] before gzipping and 90kb after which is less than one nice image.” He also points out that the search is very fast, though I have to wonder if users on underpowered devices might disagree about that speed. Actually, go do a search on the SQLAlchemy documentation, which I believe is done on the client. It takes a surprisingly long time, and that’s on my MacBook Pro with Chrome, far from “underpowered” in any common use of the term.
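If you want to know what your own site would cost visitors in bytes, it’s easy to measure. This is a minimal sketch that serializes an index to compact JSON and compares the raw and gzipped sizes. The index shape (a list of dicts) and the function name are assumptions of mine, and your search library’s real index format will likely be different and possibly larger.

```python
import gzip
import json
from typing import Tuple


def index_sizes(index) -> Tuple[int, int]:
    """Return (raw_bytes, gzipped_bytes) for a JSON-serializable index."""
    # Compact separators avoid counting whitespace the client never needs.
    raw = json.dumps(index, ensure_ascii=False,
                     separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real site index.
    sample = [
        {"url": "/posts/%d/" % i, "content": "static site search options " * 40}
        for i in range(100)
    ]
    raw_bytes, gz_bytes = index_sizes(sample)
    print("raw: %d bytes, gzipped: %d bytes" % (raw_bytes, gz_bytes))
```

Keep in mind this only measures transfer size. It says nothing about how long the browser takes to parse the index and run a query, which is the part that worries me on slower devices.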
Don’t Forget Indexing
You’ll need some kind of way to actually index your site. That’s going to depend on how your search works. If it’s Google CSE, for example, I think you index your site by waiting patiently. As I mentioned above, with Algolia I used a Hugo template to make a JSON file that I sent up to Algolia with atomic-algolia. Some software, like maybe Lunr or Solr, may come with its own tools to run through your content files.
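If your tool of choice doesn’t come with an indexer, rolling a simple one isn’t much work. Here is a minimal sketch that walks a directory of Markdown files, drops any leading “---” front matter, and builds a list of records you could dump to JSON. The function names, the record fields, and the path-to-URL mapping are all hypothetical. In particular, the URL logic does not try to reproduce how Hugo or any other generator actually computes permalinks.

```python
import json
from pathlib import Path
from typing import Dict, List


def strip_front_matter(text: str) -> str:
    """Drop a leading '---' delimited front matter block, if there is one."""
    if text.startswith("---"):
        parts = text.split("---", 2)
        if len(parts) == 3:
            return parts[2].strip()
    return text.strip()


def build_index(content_dir: str) -> List[Dict]:
    """Build one record per Markdown file found under content_dir."""
    root = Path(content_dir)
    records = []
    for path in sorted(root.rglob("*.md")):
        body = strip_front_matter(path.read_text(encoding="utf-8"))
        # Assumed URL scheme: content/posts/hello.md -> /posts/hello/
        url = "/" + path.relative_to(root).with_suffix("").as_posix() + "/"
        records.append({"url": url, "content": body})
    return records


def write_index(content_dir: str, out_file: str) -> None:
    """Write the index as JSON, ready to upload or serve."""
    Path(out_file).write_text(
        json.dumps(build_index(content_dir), ensure_ascii=False),
        encoding="utf-8",
    )
```

Something like write_index("content", "public/index.json") would run as a step in your build, after which you upload the file or serve it alongside the site, depending on which search option you picked.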
There’s no single answer here, but I didn’t want you to forget, dear reader, that you need to actually get the content into whatever format your search solution needs.