
Tomorrow’s data lakes’ capacity expanded in the PostgreSQL-managed topology. Source: Shutterstock

PostgreSQL evolves to fill tomorrow’s data oceans

THE CHOICE of database management system tends to divide administrators’ opinions. The range of options is wide, from the free MySQL to the heftily priced SAP HANA, with much in between.

One highly regarded relational database management system (RDBMS), PostgreSQL, has been constrained in its versions to date by a maximum table size, which, according to the documentation, is 32TB.

During the system’s creation back in the late 1980s, 32 terabytes of data (that’s three Libraries of Congress’ worth) probably seemed ample; indeed, an excessive allowance for databases to grow into.

But readers of a certain age who remember the Y2K “bug” will know that what was once considered a far-off event – be that the turn of the millennium or a single table able to contain what’s now just a few desktop PCs’ data – comes round soon enough.

As our world becomes increasingly digitized – the Internet of Things being a burgeoning case in point – the requirement to store amounts of data previously unimagined becomes more pressing, and any historical limits become problematic constraints.

PostgreSQL versions since 8.1, which arrived in 2005, have contained a feature called Table Inheritance, an attempt to implement Table Partitioning (familiar to users of many other RDBMSs). Partitioning allows database tables to be split into multiple parts to improve performance and security, and to increase the overall size a dataset can reach. Table Inheritance in PostgreSQL evolved into a successful implementation of Declarative Partitioning in PostgreSQL 10.

However, a recently discovered bug (which had existed for decades) meant that the number of subtables permitted was limited to a 16-bit value, even though the counter is stored in a 32-bit field. Under that 16-bit cap, 65,536 subtables of 32TB each give a maximum table size of 2 exabytes, or 2,048 petabytes.

With the progression to PostgreSQL 11, the full complement of 2^32 subtables can be used, allowing a total table size of roughly 128 zettabytes (131,072 exabytes).
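For readers who want to check the arithmetic, the short Python sketch below multiplies the 32TB per-table figure by the number of subtables a 16-bit and a 32-bit counter allow. It assumes binary units (1TB taken as 2^40 bytes) and that every subtable is filled to the limit; the function and variable names are illustrative only and are not part of PostgreSQL.

```python
# Back-of-the-envelope check of the table-size figures, using binary units.
# Assumption: "TB" is read as a tebibyte, i.e. 2**40 bytes.
TIB = 2**40
PER_TABLE_LIMIT = 32 * TIB  # the 32TB per-table limit quoted in the article


def max_partitioned_size(counter_bits: int) -> int:
    """Bytes available if every subtable reaches the per-table limit."""
    return (2**counter_bits) * PER_TABLE_LIMIT


def in_unit(size_bytes: int, power_of_two: int) -> int:
    """Express a byte count in units of 2**power_of_two bytes."""
    return size_bytes // 2**power_of_two


capped = max_partitioned_size(16)  # subtable count held to 16 bits
full = max_partitioned_size(32)    # full 32-bit subtable count

print(in_unit(capped, 50), "PiB")  # 2048
print(in_unit(capped, 60), "EiB")  # 2
print(in_unit(full, 60), "EiB")    # 131072
print(in_unit(full, 70), "ZiB")    # 128
```

On these assumptions, the 16-bit cap works out to 2 exabytes and the full 32-bit range to 131,072 exabytes, which is 128 zettabytes.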

Manipulation of huge tables is, of course, only one aspect of an RDBMS. PostgreSQL’s many fans point to its robust structure, while its detractors cite its sometimes troublesome syntax and patchy documentation.

One criticism that will no longer apply, however, is that PostgreSQL constrains the size of the data lakes on which we increasingly base our lives.