A Brief and Incomplete Catalog of Static Site Search Options
I was actually going to title this, “A Brief and Incomplete Catalog of Static Site Search Options, None of Which You Will Love”, but I thought that would be too long.
This is a list of options I’ve found for adding search functionality to this web site, which is to say a small, statically generated site that is a hobby and doesn’t produce any revenue. I mention that bit about no revenue because if this were more than just a personal site, and particularly if it were a site that makes money for me, I’d have no problem plunking down $40/month or so to get some kind of hosted search, or even just hosting something like Apache Solr myself.
To reiterate, I’m looking for “full text search of my little web site”. Many of these providers, such as Algolia, offer far more functionality than the “type a query, get some links back” that I’m looking for, so you are hereby warned about my bias in evaluating the offerings below.
Hosted Search
These people host a search service for you. You find a way to shoot your content up to them, and then you shoot queries to them via JavaScript. Of the ones below, I think Google Cloud Engine Search API, Para, and Algolia are probably the standout offerings, in that order. However, I’ve only really used Algolia and Google Custom Search Engine.
Algolia: Easy to use, with a good API. Very fast. Unfortunately, I hit the free plan’s 10 KB limit on record size almost immediately: any blog article of reasonable length can exceed it easily. I would probably be willing to pay $2/month to get that limit lifted, but their paid plans start at $35/month, which is way too much for a hobby site. If you can live with this limit, though, Algolia makes search really easy. Algolia users might also look at atomic-algolia, a tool for updating your index.
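One common workaround for a per-record size limit is to split each article into several smaller records that share the article’s URL, so a hit on any chunk still leads back to the article. The sketch below is my own illustration, not something from Algolia or atomic-algolia: it assumes the limit applies to the JSON-encoded record and conservatively treats 10 KB as 10,000 bytes. Apart from Algolia’s objectID, the field names (url, title, content) are hypothetical.

```python
import json

# Conservative reading of the free plan's 10 KB record limit (assumption).
MAX_RECORD_BYTES = 10_000


def record_size(record: dict) -> int:
    """Size of a record once serialized to JSON, in bytes."""
    return len(json.dumps(record).encode("utf-8"))


def split_article(url: str, title: str, body: str,
                  max_bytes: int = MAX_RECORD_BYTES) -> list[dict]:
    """Split one article into records that each stay under max_bytes.

    Paragraphs (separated by blank lines) are packed greedily into records.
    A single paragraph that is larger than max_bytes on its own still ends up
    in an oversized record; handling that case is left out of this sketch.
    """
    records: list[dict] = []
    current: list[str] = []

    def make_record(paragraphs: list[str]) -> dict:
        return {
            "objectID": f"{url}#{len(records)}",  # unique ID per chunk
            "url": url,
            "title": title,
            "content": "\n\n".join(paragraphs),
        }

    for paragraph in body.split("\n\n"):
        candidate = current + [paragraph]
        if current and record_size(make_record(candidate)) > max_bytes:
            # Adding this paragraph would overflow: close the current record.
            records.append(make_record(current))
            current = [paragraph]
        else:
            current = candidate

    if current:
        records.append(make_record(current))
    return records
```

Because every chunk carries the same url, the search page can collapse several hits from one article into a single link. Algolia has a distinct setting intended for this kind of grouping, but check its documentation for the exact configuration.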
Google Custom Search Engine (CSE): A Google-hosted search engine for your site that is pretty simple to implement. My #1 problem with Google CSE is that, as far as I can tell, there’s no way to update the index immediately. My tied-for-#1 problem with Google CSE is that they’re going to make you show ads, unless you use their Custom Search JSON API, whose free quota is only 100 queries per day and thus too low to be useful, in my opinion.
AWS CloudSearch: No free tier, and some made-up numbers suggest $40/month if I wanted to pay them. Nope!
AWS Elasticsearch Service: From my perspective this would be similar to using Algolia. They have a free tier that I could probably fit into, but if I’m reading the AWS documentation right, it only lasts for the first year. Because I love redoing my search system every year instead of using that time to write new content. Their pricing after the free tier is something like $13.50/month at the low end, so still out of my price range. I was also scared off of Elasticsearch Service by articles such as Why You Shouldn’t Use AWS ElasticSearch Service. In fact, after reading a bit longer, I was scared off of Elasticsearch, period (for the purposes of adding search to a small static blog, that is, not for, say, aggregating and searching application logs). Finally, I’m not sure it is ever a good idea to expose Elasticsearch directly to the Internet, as you would if you wanted clients to send search requests straight to it via JavaScript, so you would probably need some kind of front end in between to use it for searching a web site. That means more expense and more complexity.
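To make the front-end point concrete: one way to avoid exposing Elasticsearch directly is a small server-side handler that accepts only a query string and turns it into a fixed, read-only search request. This is a minimal sketch, assuming an Elasticsearch node at a hypothetical http://localhost:9200 with a hypothetical index named site whose documents have title, url, and content fields. The match query, size, _source, and the hits.hits response layout are standard parts of the Elasticsearch search API; wiring this into an actual web server, and adding rate limiting, is left out.

```python
import json
import urllib.request

# Hypothetical endpoint: an index named "site" on a local node.
ES_SEARCH_URL = "http://localhost:9200/site/_search"
MAX_QUERY_LENGTH = 200
MAX_RESULTS = 10


def build_search_body(user_query: str) -> dict:
    """Turn untrusted user input into a fixed, read-only search request."""
    query = user_query.strip()[:MAX_QUERY_LENGTH]
    if not query:
        raise ValueError("empty query")
    return {
        # Only a plain match query on one field is ever sent upstream.
        "query": {"match": {"content": query}},
        "size": MAX_RESULTS,
        # Return only the fields the results page needs.
        "_source": ["title", "url"],
    }


def search(user_query: str) -> list[dict]:
    """Forward the sanitized query to Elasticsearch and return the hits."""
    body = json.dumps(build_search_body(user_query)).encode("utf-8")
    request = urllib.request.Request(
        ES_SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        result = json.load(response)
    return [hit["_source"] for hit in result["hits"]["hits"]]
```

The browser talks only to whatever endpoint calls search(), so clients never get to send arbitrary requests (index deletions, expensive aggregations) to the cluster itself. That extra piece is exactly the additional expense and complexity mentioned above.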
Para: I’ve never heard of Para, but they look like some kind of hosted service backed by Lucene. They have a free plan that advertises one “application”, 1 GB storage, 1 read/s, 1 write/s. At face value that sounds like enough. They have a nice-looking article on setting up static site search using Para. Their API doesn’t look as simple as I was hoping, but I’m sure I could figure it out. Hey, look at this: Para looks like it is completely open source. Maybe that’s a good thing, because I have to admit that I’ve never heard of this service or its backing company, and so I’m naturally worried about implementing it on my site only to have them close shop the next day. This service may bear some further evaluation.
Google Cloud Engine Search API: What is this? I mean, I’m having a little trouble figuring out what to call this other than “Google Cloud Engine Search API”. Is that it? I need to look further into how this works, but Simo Ahava has written an article about adding search to a Hugo site with Google Search API, which is exactly what I want to do. They have a free tier with 250 MB storage (documents plus indexes), 1,000 queries/day, and 10 MB of “adding documents to indexes” per day. That sounds pretty passable, and if you think you might exceed those limits, the incremental charges also sound reasonable. Go over 1,000 queries? $0.50 for your next 10,000 queries. I like that a lot. This is probably worth a good hard look.
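To see why those incremental charges look reasonable, here is the arithmetic as a tiny function. It uses only the numbers quoted above (1,000 free queries per day, then $0.50 per 10,000 queries) and assumes, without checking Google’s billing rules, that the overage is prorated per query rather than billed in whole blocks of 10,000.

```python
FREE_QUERIES_PER_DAY = 1_000
PRICE_PER_10K_QUERIES = 0.50  # US dollars, as quoted above


def daily_query_cost(queries_per_day: int) -> float:
    """Dollar cost of one day's queries, assuming linear proration (assumption)."""
    billable = max(0, queries_per_day - FREE_QUERIES_PER_DAY)
    return billable / 10_000 * PRICE_PER_10K_QUERIES


def monthly_query_cost(queries_per_day: int, days: int = 30) -> float:
    """Daily cost times a 30-day month (storage and indexing not included)."""
    return daily_query_cost(queries_per_day) * days
```

Under those assumptions, a fairly busy 3,000 queries per day works out to about $0.10 per day, or roughly $3 for a 30-day month, which is a long way below the $35 to $40/month floor of some of the other options.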
IBM Cloudant: Sorry, I didn’t look at this one very hard at all, but I can tell you they have some kind of free “Lite” plan, though it seems like the exact details of that plan are behind some kind of login that I don’t want to create. (I’m going to admit that I did a bit of an eye roll when I landed on an IBM site in my searching. They’ve got a bit of an image problem to overcome with me, and I suspect that’s the same for many others as well.)
Azure Search: I didn’t look into this very much at all either. They do have a free tier, but I got stopped by the 50 MB storage limit, which seems bizarrely small compared to their competitors. “Standard rates apply” for data transfer, but you probably shouldn’t let that scare you as I think you get the first 5 GB/month free, which is probably more than enough for any small-to-medium static site.
Let me note that it seems like several hosted search offerings have gone away: FacetFlow (closed), Tapir (gone?), Firebase (docs tell you to use Algolia?). This might be something worth keeping in mind if you intend to invest time in learning and implementing one of these hosted solutions.
Host Server-Side Searching Yourself
I know this isn’t really in the spirit of static web sites, but if you need search, maybe you need to run some search software on your server.
I did not do a ton of research into this area. The names that immediately come to mind are Elasticsearch, Apache Solr, and Lucene. I know next to nothing about any of them, except that Elasticsearch and Solr are both built on top of the Lucene library, so you’d probably want to start there. Of course, there’s always good old PostgreSQL full-text search with some (hopefully trivial) front end on it. I also stumbled onto bleve, a full-text indexing library for Go; as far as I can tell, you would build your own small search server around it rather than run it as a ready-made server.
I’m sure there are many others.
Client-Side Searching
I’m not terribly fond of the idea of sending my whole site index down to the client every time they load my page. It seems more than a bit rude, especially for people on metered Internet connections. Jakub Chodounský did, however, point out that his entire blog’s index is, “about 260kb [sic] before gzipping and 90kb after which is less than one nice image.” He also points out that the search is very fast, though I have to wonder if users on underpowered devices might disagree about that speed. Actually, go do a search on the SQLAlchemy documentation, which I believe is done on the client. It takes a surprisingly long time, and that’s on my MacBook Pro with Chrome, far from “underpowered” in any common use of the term.
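If you want to know how big your own index would be on the wire before committing to client-side search, it’s easy to measure. This sketch serializes a list of entries to compact JSON and compares the raw size with the gzip-compressed size. The entry fields (title, url, content) and the sample data are hypothetical, and the real transfer size also depends on whether your host actually compresses JSON responses.

```python
import gzip
import json


def index_sizes(entries: list[dict]) -> tuple[int, int]:
    """Return (raw_bytes, gzipped_bytes) for a JSON search index."""
    # Compact separators avoid shipping unnecessary whitespace.
    raw = json.dumps(entries, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    # Hypothetical stand-in for a real site's index.
    sample = [
        {"title": f"Post {i}", "url": f"/posts/{i}/",
         "content": "static site search " * 50}
        for i in range(100)
    ]
    raw_size, gz_size = index_sizes(sample)
    print(f"{raw_size / 1024:.1f} KiB raw, {gz_size / 1024:.1f} KiB gzipped")
```

Real prose compresses less dramatically than this repetitive sample, so run it against your actual content before drawing conclusions.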
But if you’re OK with sticking clients with your whole search index and heating up their laps/hands while you do the search, then you have options. Lunr is a JavaScript project which is, “A bit like Solr, but much smaller and not as bright.” Hugo, my static site generator of choice as I write this, has a few links on using Lunr (and others) to add search to your static site. In fact, that page will give you a few other options as well, such as Fuse.js.
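For a sense of what a client-side search library is doing, here is a toy inverted index: each word maps to the set of pages that contain it, and a query returns the pages that contain every query word. It’s a simplified illustration written in Python rather than JavaScript, and it is not Lunr’s or Fuse.js’s API; Lunr adds things like stemming and relevance scoring, and Fuse.js does fuzzy matching instead. The page paths are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page paths whose text contains it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in tokenize(text):
            index[word].add(path)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the paths containing every word in the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)
```

In a real client-side setup you would build the index at site-generation time, serialize it (sets become lists in JSON), and do the lookup in the browser; that serialized index is the download every visitor pays for.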
Don’t Forget Indexing
You’ll need some way to actually index your site, and how you do that depends on how your search works. With Google CSE, for example, I think you index your site by waiting patiently. With Algolia, I used a Hugo template to generate a JSON file that I then sent up to Algolia with atomic-algolia. Some software, like maybe Lunr or Solr, may come with its own tools for running through your content files.
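As a rough illustration of the “get the content into a format” step, here’s a sketch that walks a Hugo-style content/ directory of Markdown files and produces a JSON list of entries that a search service or a client-side library could consume. It assumes YAML front matter between --- lines and reads only a simple single-line title: key rather than parsing YAML properly; the directory layout, output filename, and field names are hypothetical.

```python
import json
from pathlib import Path


def parse_markdown(text: str, fallback_title: str) -> dict:
    """Split simple YAML front matter from the body and pull out the title.

    Only a single-line `title:` key is recognized; this is not a YAML parser.
    """
    title, body = fallback_title, text
    if text.startswith("---\n"):
        end = text.find("\n---", 4)
        if end != -1:
            front_matter = text[4:end]
            body = text[end + 4:].lstrip("\n")
            for line in front_matter.splitlines():
                key, sep, value = line.partition(":")
                if sep and key.strip() == "title":
                    title = value.strip().strip("\"'")
    return {"title": title, "content": body}


def build_index(content_dir: Path) -> list[dict]:
    """Collect one entry per Markdown file under content_dir."""
    entries = []
    for path in sorted(content_dir.rglob("*.md")):
        entry = parse_markdown(path.read_text(encoding="utf-8"), path.stem)
        entry["path"] = path.relative_to(content_dir).as_posix()
        entries.append(entry)
    return entries


def write_index(content_dir: Path, out_file: Path) -> None:
    """Write the collected entries to a JSON file."""
    entries = build_index(content_dir)
    out_file.write_text(json.dumps(entries, indent=2), encoding="utf-8")
```

Calling write_index(Path("content"), Path("search-index.json")) would produce the file you then upload or ship to the browser. The path field is relative to content/, not a final URL; mapping it to your site’s URLs depends on your Hugo configuration, which is one reason generating the file from a Hugo template can be the simpler route.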
There’s no single answer here, but I didn’t want you to forget, dear reader, about the need to actually get your content into whatever format your search solution needs.