Pipelining in psql (PostgreSQL 18)

postgresql.verite.pro

151 points by tanelpoder 17 hours ago


vlovich123 - 11 hours ago

I’m pretty sure the reasoning and conclusion are way off in explaining the speedup:

> The network is better utilized because successive queries can be grouped in the same network packets, resulting in less packets overall.

> the network packets are like 50 seater buses that ride with only one passenger.

The performance improvement is unlikely to come from sending larger packets: most queries transfer very little data, and the benchmark the conclusion is drawn from transfers close to zero. The speedup comes from no longer waiting on the round-trip acknowledgement of one batch before executing subsequent queries; the number of network packets is irrelevant.
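A back-of-the-envelope model (made-up numbers, TypeScript just for illustration, not the article's benchmark) shows why the round trips dominate rather than the packet count:

    // Back-of-the-envelope latency model; all numbers are assumptions for illustration.
    const rttMs = 1.0;    // client <-> server round-trip time (assumed)
    const execMs = 0.05;  // per-query server execution time (assumed)
    const queries = 1000;

    // Sequential: every query waits for its own round trip before the next one is sent.
    const sequentialMs = queries * (rttMs + execMs);  // ~1050 ms

    // Pipelined: queries are sent without waiting, so the round-trip cost is paid roughly once.
    const pipelinedMs = rttMs + queries * execMs;     // ~51 ms

    console.log({ sequentialMs, pipelinedMs, speedup: sequentialMs / pipelinedMs });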

WilcoKruijer - 14 hours ago

I feel pipelines (or batches) are slept on. So many applications use interactive transactions to ‘batch’ multiple queries, waiting for the result of each individual query. Network round trips are the biggest contributor to latency in most applications, and this makes it so much worse. Most Postgres drivers don’t even support batching, at least in the JavaScript world.

In many cases it would be better to forgo interactive transactions and instead execute all read-only queries at once, then send a separate write batch after processing the obtained data. That way, the number of round trips is bounded. There are complications of course: dealing with concurrency becomes harder, for one. I’m currently prototyping a library exploring these ideas.
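For illustration, here is roughly the shape of that with plain node-pg today (no pipelining, just a pool; the table names are made up). The read-only queries are issued at once over pooled connections, so the read phase costs about one round trip instead of one per query:

    import { Pool } from "pg";

    const pool = new Pool(); // connection settings come from the PG* environment variables

    // Anti-pattern: each await pays a full round trip before the next query is even sent.
    async function readSequentially(userId: number) {
      const user = await pool.query("SELECT * FROM users WHERE id = $1", [userId]);
      const orders = await pool.query("SELECT * FROM orders WHERE user_id = $1", [userId]);
      const prefs = await pool.query("SELECT * FROM prefs WHERE user_id = $1", [userId]);
      return { user, orders, prefs };
    }

    // Issue all read-only queries at once. node-pg can't pipeline on one connection,
    // so these run on separate pooled connections, but wall-clock latency is ~1 round trip.
    async function readBatched(userId: number) {
      const [user, orders, prefs] = await Promise.all([
        pool.query("SELECT * FROM users WHERE id = $1", [userId]),
        pool.query("SELECT * FROM orders WHERE user_id = $1", [userId]),
        pool.query("SELECT * FROM prefs WHERE user_id = $1", [userId]),
      ]);
      return { user, orders, prefs };
    }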

tlocke - 13 hours ago

I must confess, the Python driver pg8000, which I maintain, doesn't support pipeline mode. I didn't realise it existed until now, and nobody has ever asked for it. I've created an issue for it: https://codeberg.org/tlocke/pg8000/issues/174

theThree - 12 hours ago

I developed a JS pg client that uses pipeline mode by default: https://github.com/stanNthe5/pgline

stephen - 8 hours ago

I really want to use pipelining for our "em.flush" of sending all INSERTs & UPDATEs to the db as part of a transaction, b/c my initial prototyping showed a 3-6x speedup:

https://joist-orm.io/blog/initial-pipelining-benchmark/

If you're not in a transaction, afaiu pipelining is not as applicable/useful, b/c any SQL statement failing in the pipeline fails the queries after it (until the next sync point), and imo it would suck for separate/unrelated web requests that "share a pipeline" to have one request fail the others -- but for a single txn/single request, these semantics are what you expect anyway.

Unfortunately, in the TypeScript ecosystem the node-pg package/driver doesn't support pipelining yet; instead, this "didn't quite hit mainstream adoption and now the author is AWOL" driver does: https://github.com/porsager/postgres
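To sketch the "flush everything in one go" shape with postgres.js, based on my reading of its docs (returning an array of queries from sql.begin is supposed to pipeline them inside the transaction) -- the table and column names are made up, and this isn't Joist's actual implementation:

    import postgres from "postgres";

    const sql = postgres(); // connection settings come from the PG* environment variables

    // Flush a unit of work as one pipelined batch inside a transaction: the statements are
    // sent without waiting on individual results, and any failure aborts the whole txn.
    async function flush(author: { id: number; name: string }, bookTitles: string[]) {
      await sql.begin((sql) => [
        sql`update authors set name = ${author.name} where id = ${author.id}`,
        ...bookTitles.map(
          (title) => sql`insert into books (author_id, title) values (${author.id}, ${title})`
        ),
      ]);
    }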

I've got a branch to convert our TypeScript ORM to postgres.js solely for this "send all our INSERTs/UPDATEs/DELETEs in parallel" perf benefit, and have some great stats so far:

https://github.com/joist-orm/joist-orm/pull/1373#issuecommen...

But it's not a "must have" for us atm, so I haven't gotten time to rebase/ship/etc... hoping to rebase & land the PR by eoy...

codesnik - 13 hours ago

ActiveRecord in Rails has an async mode, which lets you queue several queries and read the results later. But those go through the connection pool and are executed on separate connections, in separate transactions, and in separate PostgreSQL server processes. I wonder whether using pipelining instead, at the driver level (app code would stay the same), would be a better approach in general, or at least easier on the db instance.

chris_pie - 9 hours ago

I wish the author had explained the difference between pipelines and multi-statement queries.

victorbjorklund - 10 hours ago

Sad I can't use this in Elixir. Looks pretty sweet.

waynesonfire - 4 hours ago

I haven't had to deal with this problem until recently, and it seems like such an obvious scalability issue that I'm sure I'm not the only one to have hit it.

How do I handle, say, 100K concurrent transactions in an OLTP database? Here is what I've learned that makes this difficult:

- a transaction has a one-to-one mapping with a connection

- a connection can only process one transaction at a time, so pooling isn't going to help.

- database connections are "expensive"

- a client can open at most ~65k connections to a given server address, because otherwise it runs out of ephemeral ports.

100k connections isn't that crazy; say you have 100k concurrent users and each one needs a transaction to manage its independent state. Transactions are useful as they enforce consistency.
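Whether 100k concurrent users really turn into ~100k simultaneously open transactions depends on how long each transaction stays open. A quick Little's-law estimate, with made-up numbers:

    // Little's law: in-flight transactions ≈ arrival rate × average transaction duration.
    // All numbers below are assumptions for illustration, not measurements.
    const users = 100_000;
    const txnsPerUserPerSec = 1 / 5; // each user starts a transaction every 5 seconds (assumed)
    const txnDurationSec = 0.02;     // 20 ms per transaction if it isn't held open interactively (assumed)

    const inFlight = users * txnsPerUserPerSec * txnDurationSec; // ≈ 400 concurrent transactions
    console.log(inFlight);

With short transactions the in-flight count (and the connections a pool needs) stays small; it's long-lived or interactive transactions that push you toward the connection and port limits above.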