Building a relational database: File structure
For around a year I've thought about writing a toy database, but not been sure where to start. Recently a couple of things aligned and I dipped my toes into the water.
The database that I use most is PostgreSQL and Postgres is well documented, so I'm using it as a rough model. The number of potential features is huge and overwhelming, so I'm thinking in very small steps. However, the database will need at least these things:
- an SQL parser, which takes a string SQL statement and converts it to a tree
- a query planner, which takes the tree and works out a plan for fetching or modifying the data
- a storage engine to persist the data on disk
I've started with the storage engine because it took my interest and the first steps seemed surprisingly tractable.
Folders and files
Postgres running on 1 machine can have many databases. The databases are stored in 1 directory, conventionally called PGDATA. Each database has its own subdirectory and within a database's subdirectory, each table has its own file.
A table's file is carefully structured. The file is subdivided into 8kb sections, most often referred to as "pages". 8kb is just the norm for Postgres. Other databases have a different, fixed page size or let pages vary in size.
I don't understand the details yet, but page size relates to the performance of the underlying storage, e.g. your HDD or SSD. The main sticking point seems to be that the physical storage doesn't read 1 bit at a time. Instead, each device has a minimum amount of data that it can read and write. If your HDD reads in 4kb blocks, asking for 1 bit is wasteful because the HDD still has to read 4kb. At this point I don't understand what happens to the excess data, but presumably it gets discarded somewhere between the HDD read and returning the requested data to the user?
Anyway, to recap to the file format: a table's data is 1 file and each file has many 8kb chunks of data called pages. Each page has a specific structure, with Postgres using a format called a "slotted page". Each page has
- page header (metadata about the page)
- pointers to records
- free space
The slotted file format seems odd at first, but it has a big advantage: records can vary in size. If I have a table with a string column, 1 row might contain just 1 letter in that column and another row might contain 100s of letters. That would be awkward to store if the database record had a fixed width. With a slotted page, however, it's fine because each pointer stores the start location of a record, without making assumptions about record length.
The records (your table rows) start at the end of the 8kb page and grow towards the start of the page, eating into the free space. A new pointer is added to the start of the free space (the opposite end to the record) and stores the location of the new record. Over time, the records and pointers grow towards each other. That's convenient because it avoids the need to guess how many pointers might be needed or to keep moving records around to make room for new pointers, as would be necessary if pointers and records were stored next to each other.
Because a page is always 8kb, on creation a page might be mostly free space between the pointers and the records. Over time the free space fills, there isn't enough space to fit another record between the existing records and the pointers, so another 8kb page is added to the file.
That's pretty much where I am. The more I read, the more I realise that database design involves a mind boggling number of small details, some of which matter a great deal to performance. I'm defaulting to "what Postgres does" and have so far avoided decision paralysis. And for now I'm not worrying about performance -- I'm just taking tiny steps towards getting something working.
Roughly how I broke down the work
This isn't a code-along "Build your own database" kind of post. But I thought I'd include some first steps for the benefit of people who, like me, are curious to try but don't know where to start.
As described in the post, the basic structure of the database will be a series of directories and files, so I wrote functions to:
- create nested directories (1 directory per database)
- create a new file (1 file per table)
- append exactly 8kb to a file (writing a page) (and at this point I'm just writing 8kb of the number 1)
- read any specified 8kb from a file (a page read)
Each step is very small, but together they're a good start.
wc -chas been useful for confirming the number of bytes in a file
xxddoes a nicely formatted dump of the file, useful for confirming the contents
- Hironobu Suzuki's writeup The Internals of PostgreSQL is a fantastic resource. It's currently my goto when I'm unsure how something fits together.
- The Postgres docs Chapter 52: Overview of PostgreSQL Internals and Chapter 73: Database Physical Storage have been a great help. They focus on implementation details more than database concepts, so I found them more useful after I'd done other reading.
- For a conceptual overview, I've watched some of the videos for UC Berkeley's CS186 Introduction to Databases. Lectures 4 and 5 cover file formats, including alternatives to slotted pages.
- Hussein Nasser's Fundamentals of Database Engineering course on Udemy kick started this whole thing. I'll confess that not all the videos are to my taste -- I like the structured ones where Hussein spends most of the time at a Postgres command line. But there's 25 hours of remarkably accessible info about databases and it inspired me to finally start this project.