Category Archives: LocustDB

The Simple Beauty of XOR Floating Point Compression

I recently implemented a small program to visualize the inner workings of a scheme that compresses floating point timeseries by XORing subsequent values. The resulting visualizations are quite neat and made it much easier for me to understand this beautiful algorithm than any of the explanations that I had previously encountered.

Hacker News (280 points, 68 comments)

The algorithm

The algorithm1This particular version of XOR floating point compression was first described in “Gorilla: A Fast, Scalable, In-Memory Time Series Database” and is often referred to as “Gorilla compression. is simple. We start by writing out the first floating point number in full. All subsequent numbers are XORed with the previous number and then encoded in one of three ways:

Continue reading

How to Read 100s of Millions of Records per Second from a Single Disk

This article gives an overview of the implementation and performance of two recent additions to LocustDB, an extremely fast open-source analytics database built in Rust. The first addition is support for persistent storage, the second is an lz4 compression pass for data stored on disk or cached in memory. Benchmarking LocustDB on a dataset of 1.46 billion taxi rides demonstrates excellent performance for cold queries, reaching > 95% of sequential read speed on SSD and > 70% of sequential read speed on HDD. Comparing to ClickHouse, queries with similar data volume are as fast or faster, and disk space usage is reduced by 40%.

Continue reading

How to Analyze Billions of Records per Second on a Single Desktop PC

Introduction

This article gives an overview of LocustDB [1], a new and extremely fast open-source analytics database built in Rust. Part 1 gives some background on analytical query systems and my goals for LocustDB. Part 2 presents benchmark results produced on a data set of 1.46 billion taxi rides and demonstrates median speedups of 3.6x over ClickHouse and 2.1x over tuned kdb+. Part 3 is an architecture deep dive that follows two SQL statements through the guts of the query engine. Part 4 concludes with random learnings, thoughts on Rust, and other incoherent ramblings. Continue reading