Unconventional PostgreSQL Optimizations

hakibenita.com

258 points by haki 11 hours ago


zamalek - 2 hours ago

> The index is 214 MB! That's almost half the size of the entire table. So the analysts are happy, but you? Not so much...

This is part of a broader trade-off: write amplification. Of course you'd want the most precise index possible - but no matter how you cut it, you are incurring extra I/O for every write: one for the tuple, plus one per index. How you index things is heavily influenced by the mix of reads and writes, and this is why we have data warehouses/read replicas in the first place: they let us avoid write amplification on the write path while still getting fast filtered reads (that are slightly delayed).

If you're dealing with <ridiculous number of users>, there is a good chance you don't want to be putting BI/OLAP indices on your OLTP database. You probably don't have enough users to worry about this - but if you ever find that your writes are becoming an issue, this is something to consider.
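One common middle ground here is a partial index, which only incurs write amplification for rows matching its predicate. A minimal sketch (table and column names are made up for illustration):

```sql
-- Hypothetical example: only completed orders are indexed, so the
-- bulk of OLTP writes (in-flight orders) never touch this index.
CREATE INDEX idx_orders_completed_created
ON orders (created_at)
WHERE status = 'completed';
```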

ComputerGuru - 13 minutes ago

The no-unique-constraints-on-hash-indexes thing is something I always run into. Am I missing something, or is it just missing glue code to translate that syntax into an exclusion constraint with a hash index, allowing foreign keys to reference the column in the process, and possibly also fixing the ON CONFLICT approach in one go?
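For reference, the workaround hinted at here can be spelled out like this (table name hypothetical). It enforces no-duplicates via a hash index, but unlike a real UNIQUE constraint it cannot be the target of a foreign key, which is exactly the gap being described:

```sql
-- Hash-backed "uniqueness" via an exclusion constraint.
-- Duplicate urls are rejected, but foreign keys still cannot
-- reference this column.
CREATE TABLE urls (
    id  bigint,
    url text,
    EXCLUDE USING hash (url WITH =)
);
```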

sgarland - 13 minutes ago

Not discussed in TFA: BRIN indices. For the given date example, they would be perfect - if you can guarantee that whatever you’re indexing is monotonic, they’re a godsend. Tiny and performant.
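A minimal sketch of the BRIN suggestion (table and column names assumed):

```sql
-- BRIN stores one summary per block range instead of one entry per
-- row, so on an append-only, monotonically increasing column it
-- stays tiny while still pruning most of the table for range scans.
CREATE INDEX idx_events_created_brin
ON events USING brin (created_at);
```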

msdrigg - 6 hours ago

The most interesting thing for me in this article was the mention of `MERGE` almost in passing at the end.

> I'm not a big fan of using the constraint names in SQL, so to overcome both limitations I'd use MERGE instead:

```
db=# MERGE INTO urls t
USING (VALUES (1000004, 'https://hakibenita.com')) AS s(id, url)
ON t.url = s.url
WHEN MATCHED THEN
    UPDATE SET id = s.id
WHEN NOT MATCHED THEN
    INSERT (id, url) VALUES (s.id, s.url);
MERGE 1
```

I use `insert ... on conflict do update ...` all the time to handle upserts, but it seems like MERGE may be more powerful and able to work in more scenarios. I hadn't heard of it before.
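For comparison, the ON CONFLICT form of the same upsert looks like this. Note that it requires a unique index or constraint on `url` for conflict arbitration, which MERGE does not - that is one reason MERGE works in more scenarios:

```sql
-- Requires a unique constraint/index on url; without one this
-- statement fails, whereas the MERGE above has no such requirement.
INSERT INTO urls (id, url)
VALUES (1000004, 'https://hakibenita.com')
ON CONFLICT (url) DO UPDATE SET id = EXCLUDED.id;
```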

sc68cal - 8 hours ago

Great article, shows a lot of interesting PostgreSQL features. I have used PostgreSQL and MySQL for decades, and this article showed me that I have barely scratched the surface of what is possible.

danielheath - 3 hours ago

The hash technique for uniqueness isn't supported for indexes because it doesn't handle hash collisions. The author's proposed solution suffers the same problem: values which do not already exist in the table will sometimes be rejected because they have the same hash as something that was already saved.

pksunkara - 8 hours ago

I think a stored generated column allows you to create an index on it directly. Isn't that a better approach?
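A sketch of that approach (column and index names are made up; `md5()` is used here just as a convenient stable hash expression):

```sql
-- Stored generated column holding the hash, with an ordinary
-- B-tree unique index on it. Caveat: a hash collision would
-- falsely reject a genuinely new url, the same problem raised
-- elsewhere in this thread.
ALTER TABLE urls
    ADD COLUMN url_md5 uuid
    GENERATED ALWAYS AS (md5(url)::uuid) STORED;

CREATE UNIQUE INDEX idx_urls_url_md5 ON urls (url_md5);
```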

OptionOfT - 9 hours ago

Remarkably fresh content.

It's interesting how both virtual columns and hash indexes work, but feel like they're bolted on, vs being made part of the whole ecosystem so that they work seamlessly.

themafia - 6 hours ago

I moved into the cloud a few years ago and so I don't get to play with fixed server infrastructure like pgsql as much anymore.

Is the syntax highlighting built into pgsql now or is that some other wrapper that provides that? (it looks really nice).

lasgawe - 7 hours ago

There were some points in this article that I didn't know before.

pphysch - 5 hours ago

Is the Hash Index method strictly superior to creating a unique "hash" column and precomputing the hash in the application or query?

SigmundA - 7 hours ago

>Currently, constraint exclusion is enabled by default only for cases that are often used to implement table partitioning via inheritance trees. Turning it on for all tables imposes extra planning overhead that is quite noticeable on simple queries, and most often will yield no benefit for simple queries.

PG's lack of shared plan caching strikes again; this sort of thing is not a concern in other DBs that reuse query plans.
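To be precise about the gap: Postgres does reuse plans within a session via prepared statements (after repeated executions it may switch to a cached generic plan), but there is no shared cross-session plan cache, so every new connection pays the planning cost again:

```sql
-- Per-session plan reuse only: the plan for this prepared statement
-- is cached for this connection, not shared across connections.
PREPARE find_url(text) AS
    SELECT id FROM urls WHERE url = $1;

EXECUTE find_url('https://hakibenita.com');
```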