This article gives an overview of LocustDB , a new and extremely fast open-source analytics database built in Rust. Part 1 gives some background on analytical query systems and my goals for LocustDB. Part 2 presents benchmark results produced on a data set of 1.46 billion taxi rides and demonstrates median speedups of 3.6x over ClickHouse and 2.1x over tuned kdb+. Part 3 is an architecture deep dive that follows two SQL statements through the guts of the query engine. Part 4 concludes with random learnings, thoughts on Rust, and other incoherent ramblings.
Online analytical processing
Simply put, online analytical processing (OLAP) involves interactively running queries on often large data sets to gain new insights. To give a concrete example, you might be logging all requests to your website and store fields like the country from which the page was accessed, the accessed url, http status code returned by the website, time it took to service the request, and any number of parameters included in the request. Then one day you get reports of performance issues on your search page and run a query like following to identify specific requests that lead to slow responses:
SELECT params.search_term, latency FROM requests WHERE url = '/search' AND latency > 10000 AND day = $TODAY ORDER BY latency DESC
Or you may want to do produce a report on the performance of your website, broken down by various attributes such as url and country. Or someone is trying to break into your systems, and you want to construct some filter that allows you to identify malicious requests and check whether they succeeded. Or one of your pages is returning errors and you want to filter down to and inspect the parameters of those requests.
Typical OLAP queries have a number of distinguishing features that impose unique design constraints:
- Individual queries can be very expensive and may require aggregating trillions of records or more.
- Data is often filtered on or grouped by multiple columns.
- Queries are created ad-hoc and are often unique with respect to others queries previously seen by the system.
As a corollary, elaborate indexing datastructures are often ineffective. Since the number of possible ways to filter or group a dataset grows exponentially with the number of columns, we can only provide precomputed indices/aggregates for a tiny subset of possible queries. The approach taken by LocustDB is to forgo any attempts at complex indexing structures in favor of simple partitioning, performing brute-force scans over large parts of the data set, and then making those scans as fast as hardware allows. The main techniques used to achieve this are caching data in memory, massive parallelism, various optimizations around columnar data storage  and general techniques for high performance computing . The trade-off is that simple queries will be relatively inefficient, which restricts throughput to maybe 1000s of queries per second. LocustDB is not the first system designed in this fashion. Notably Scuba , a proprietary system used at Facebook, and ClickHouse , an open source column-oriented DBMS developed at Yandex, have successfully taken the same approach. Scuba in particular adds two additional innovations which are not currently offered by any widely available systems.
Compression is a key tool for speeding up queries reading from disk and for reducing cost. A 2015 paper by Facebook  claims compression rates of 10x and larger, achieved through automatically applying multiple compression algorithms tailored to different kinds of data. Add to this the complete absence of indexing overhead, and it results in a very cost effective solution. I suspect there exist many low QPS HBase/Cassandra/Elasticsearch/… clusters that could be replaced by a system like Scuba with minimal cost increase (or even significant savings) while providing higher query speeds, more expressive query capabilities and eliminating the need to manually design indices and table schemas.
Most analytics systems have followed the SQL approach of requiring rigid table schemas that define the type and name of all columns upfront. While this makes a lot of sense for applications using a database for long-term storage of critical data, it is often less useful in an analytics setting where data may be short-lived and data integrity is not as crucial. I believe that foregoing rigid, predefined schemas unlocks a long tail of valuable use cases related to debugging, short lived experiments, prototyping and quick iteration for which setting up or migrating schemas is prohibitively cumbersome. A more subtle benefit is that supporting a flexible schema makes it much easier to support a multitude of compression schemes that may be difficult to retrofit if your code makes more rigid assumptions.
LocustDB originally started out during a Dropbox 2017 hackweek as a clone of Scuba. The prototype was open sourced and I have since continued working on it for fun and profit (minus the profit). So far most of my efforts have been focused on the query engine. It is still incomplete in terms of the breadth of supported query types and operators, but already extremely fast and fairly mature architecturally. Other components include a barely functional SQL(ish) parser, in-memory storage and simple compression schemes. There was support for basic persistence built on top of RocksDB at some point but most of this was killed during a refactor. Other functionality that you might consider important and that does not exist yet include the ability to replicate data and run queries across multiple machines, advanced compression algorithms, and not crashing randomly.