ParadeDB recently launched!

"ParadeDB is a modern Elasticsearch alternative built on Postgres. Built for real-time, update-heavy workloads."

Founded by Philippe Noël

A New Postgres Block Storage Layout for Full Text Search

Image Credits: ParadeDB

ParadeDB recently completed one of their biggest engineering bets to date: migrating pg_search, a Postgres extension for full text search and analytics, to Postgres' block storage system. In doing so, pg_search is the first-ever extension to port an external file format to Postgres block storage.

For context: block storage is Postgres’ storage API that backs all of Postgres’ tables and built-in index types. Prior to this migration, pg_search operated outside of block storage, meaning the extension created files that were not managed by Postgres and read their contents directly from disk. While it’s not uncommon for Postgres extensions to do this, block storage has enabled pg_search to simultaneously achieve:

  1. Postgres write-ahead log (WAL) integration, which is necessary for physical replication of the index
  2. Crash and point-in-time recovery
  3. Full support for Postgres MVCC (multi-version concurrency control)
  4. Integration with Postgres’ buffer cache, which has led to massive improvements in index creation times and write throughput

Image Credits: ParadeDB

At first, they weren’t sure if reconciling the data access patterns and concurrency model of Postgres and Tantivy, pg_search's underlying search library, was possible without drastic changes to Tantivy. This post briefly dives into how they architected pg_search's new block storage layout and data access patterns.

In the near future, ParadeDB will release two more posts: one to discuss how they designed and tested pg_search to be MVCC-safe in update-heavy scenarios, and another to dive into how they customized the block storage layout for analytical workloads (e.g. faceted search, aggregates).

What Is Block Storage?

The fundamental unit of block storage is a block: a chunk of 8192 bytes. When executing a query, Postgres reads blocks into buffers, which are stored in Postgres’ buffer cache.
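
To make this concrete, here is a minimal sketch in C of how an index might read a block through the buffer cache, using Postgres’ internal buffer manager API (the read_index_block routine and its arguments are hypothetical, not pg_search’s actual code):

```c
#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"

/*
 * Read one 8KB block of an index relation through Postgres' buffer
 * cache. "rel" and "blkno" are assumed to come from surrounding code.
 */
static void
read_index_block(Relation rel, BlockNumber blkno)
{
    /* Pin the buffer; the block is fetched from disk only on a cache miss */
    Buffer  buf = ReadBuffer(rel, blkno);

    /* Take a shared content lock before reading the page */
    LockBuffer(buf, BUFFER_LOCK_SHARE);
    Page    page = BufferGetPage(buf);

    /* ... read index entries from "page" here ... */

    /* Release the content lock and the pin */
    UnlockReleaseBuffer(buf);
}
```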

DML statements (INSERT, UPDATE, DELETE, COPY) do not modify blocks on disk directly. Instead, their changes are written to the underlying in-memory buffers, which are flushed to disk later, when they are evicted from the buffer cache or during a checkpoint.

If Postgres crashes, modifications to buffers that have not yet been flushed can be lost. To guard against this, any change to the index must also be written to the write-ahead log (WAL). During crash recovery, Postgres replays the WAL to restore the database to its most recent state.
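
For extensions on block storage, Postgres exposes a generic WAL API that handles this bookkeeping. As a minimal sketch (again with a hypothetical routine, not pg_search’s actual code), durably modifying a page looks roughly like:

```c
#include "postgres.h"
#include "access/generic_xlog.h"
#include "storage/bufmgr.h"

/*
 * Modify one page of an index and WAL-log the change using Postgres'
 * generic WAL API. "rel" and "blkno" are hypothetical inputs.
 */
static void
modify_index_block(Relation rel, BlockNumber blkno)
{
    Buffer            buf = ReadBuffer(rel, blkno);
    GenericXLogState *state;
    Page              page;

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

    /* Start a generic WAL record and register the buffer we will change */
    state = GenericXLogStart(rel);
    page = GenericXLogRegisterBuffer(state, buf, 0);

    /* ... mutate "page" here; Postgres diffs it to build the WAL record ... */

    /* Emit the WAL record and mark the buffer dirty */
    GenericXLogFinish(state);

    UnlockReleaseBuffer(buf);
}
```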

What Is pg_search?

pg_search is a Postgres extension that implements a custom index for full text search and analytics. The extension is powered by Tantivy, a search library written in Rust and inspired by Lucene.

Why Migrate To Block Storage?

A custom Postgres index has two choices for persistence: Postgres block storage or the filesystem. At first, the filesystem may seem like the easier option, because integrating with block storage requires solving a series of problems:

  1. Some data structures may not fit within a single 8KB block. Splitting data across multiple blocks can create lock contention, garbage collection, and concurrency challenges.
  2. Once a block has been allocated to a Postgres index, it cannot be physically deleted, only recycled. This means that the size of an index strictly increases until a VACUUM FULL or REINDEX is run. The index must be careful to return blocks that have been tombstoned by deletes or vacuums to Postgres’ free space map for reuse (see the sketch after this list).
  3. In update-heavy scenarios, the index can become dominated by space that once belonged to dead (i.e. deleted) rows. This may increase the number of I/O operations required for searches and updates, which degrades performance. The index must find ways to reorganize and compact itself during vacuums.
  4. Because each Postgres backend is single-threaded, the index cannot use multiple threads to read from block storage concurrently (this will be covered in detail in Parts 2 and 3 of ParadeDB’s blog post series). The index may need to leverage Postgres’ parallel workers instead.
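
To illustrate the second problem, here is a minimal sketch of how an index can recycle blocks through Postgres’ free space map instead of always growing the file (the routines shown are hypothetical, not pg_search’s actual code):

```c
#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "storage/indexfsm.h"

/*
 * Allocate a page for new index data, preferring blocks previously
 * returned to the free space map over physically extending the file.
 */
static Buffer
allocate_index_page(Relation rel)
{
    /* Ask the free space map for a recycled block first */
    BlockNumber blkno = GetFreeIndexPage(rel);
    Buffer      buf;

    if (blkno != InvalidBlockNumber)
        buf = ReadBuffer(rel, blkno);
    else
        buf = ReadBuffer(rel, P_NEW);   /* nothing to recycle: extend */

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    PageInit(BufferGetPage(buf), BLCKSZ, 0);
    return buf;                         /* caller unlocks and releases */
}

/*
 * During vacuum, hand a tombstoned block back to the free space map.
 * (Production code would also call IndexFreeSpaceMapVacuum so freed
 * pages become visible to later searches of the map.)
 */
static void
recycle_index_page(Relation rel, BlockNumber blkno)
{
    RecordFreeIndexPage(rel, blkno);
}
```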

Once the index overcomes these hurdles, however, Postgres block storage does an incredible amount of heavy lifting. After a year of working with the filesystem, it became clear to ParadeDB that block storage was the way forward.

  1. Being able to use the buffer cache means a huge reduction in disk I/O and massive improvements to read and write throughput.
  2. Postgres provides a simple, battle-tested API to write buffers to the WAL. Without block storage, extensions must define custom WAL record types and implement their own WAL replay logic, which drastically increases complexity and surface area for bugs.
  3. Postgres handles the physical creation and cleanup of files. The index doesn’t need to clean up after aborted transactions or DROP INDEX statements.

The complete blog post by ParadeDB can be read here. In the following parts of the series, they will dive into more of the challenges they faced with block storage, with a focus on concurrency and analytical performance.

Learn More

🌐 Visit www.paradedb.com to learn more.
⭐ Check out the GitHub repo for more details or join the ParadeDB community with questions. And please don’t hesitate to give them a star!
👣 Follow ParadeDB on LinkedIn & X.

Posted February 27, 2025 in the Launch category
