Sunday, March 20, 2016

When PostgreSQL Doesn't Scale Well Enough

It looks like the largest database I have ever worked on will eventually be moved off PostgreSQL.  The reason is that PostgreSQL doesn't scale well enough for it.  I am writing here, however, because the limitations only appear at such an extreme scale that this case ought to give plenty of ammunition against the claim that databases don't scale.

The current database size is 10TB and it is doubling every year.  The main portions of the application have no natural partition criteria.  The largest table is currently 5TB and is the fastest-growing portion of the application.

10TB is quite manageable.  20TB will still be manageable.  By 40TB we will need a bigger server.  But in 5 years we will be at 320TB, so the future does not look very good for staying with PostgreSQL.
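
To make the trajectory concrete, here is a quick back-of-the-envelope sketch (in Python, assuming the doubling rate simply holds steady, which is of course only a projection) of where the numbers above come from:

```python
# Project total database size, assuming it keeps doubling every year.
# The 10TB starting point comes from the post; the flat doubling rate
# is an assumption, not a measurement.
size_tb = 10.0
for year in range(1, 8):
    size_tb *= 2
    print(f"Year {year}: {size_tb:,.0f} TB")

# Year 2 -> 40 TB (time for a bigger server), Year 5 -> 320 TB,
# Year 7 -> 1,280 TB, i.e. roughly 1.3PB.
```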

I looked at Postgres-XL, which would be useful if we had good partitioning criteria, but that is not the case here.

But how many cases are there like this?  Not too many.

EDIT:  It seems I was misunderstood.  This is not a complaint that PostgreSQL doesn't scale well.  It is about a case that is outside of all reasonable limits.

Part of the reason for writing this is that I hear people complain that the RDBMS model breaks down at 1TB, which is hogwash.  We are only facing problems as we look towards 100TB.  Additionally, I think PostgreSQL would handle 100TB fine in many other cases, just not in ours.

PostgreSQL at 10, 20, or 50TB is quite usable even in cases where big tables have no adequate partitioning criteria (which are needed to avoid running out of page counters), and at 100TB in most other cases I would expect it to be a great database system.  But the sorts of problems we will hit by 100TB will be compounded by the exponential growth of the data (we expect to be at roughly 1.3PB within 8 years).  So the only real solution is to move to a big data platform.
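
As for the parenthetical about page counters: PostgreSQL addresses the pages of a table with a 32-bit block number, which with the default 8KB page size caps a single table at 32TB.  The rough sketch below (plain arithmetic, assuming our 5TB table keeps doubling along with the rest of the data) shows both that ceiling and how soon the largest table would hit it without a partitioning scheme:

```python
# A table's pages are addressed by a 32-bit block number, so a single
# table can hold at most 2**32 pages of (by default) 8KB each.
max_pages = 2 ** 32
page_size_bytes = 8 * 1024
max_table_tb = max_pages * page_size_bytes / 2 ** 40
print(f"Per-table ceiling: {max_table_tb:.0f} TB")   # 32 TB

# The 5TB table mentioned above, doubling yearly, crosses that ceiling
# in roughly three years unless it can be partitioned somehow.
table_tb = 5.0
year = 0
while table_tb <= max_table_tb:
    year += 1
    table_tb *= 2
print(f"Ceiling exceeded in year {year} at {table_tb:.0f} TB")  # year 3, 40 TB
```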