In the previous parts of this guide, we wrote assembly code for a simple boot sector and set up the GNU toolchain we use for compiling it and writing it to a floppy disk image. We then used the Bochs IA-32 emulator to boot from that image and see it run.
Our current boot loader resets the drive system, writes a “loading” messages to the screen, waits for a keypress and reboots. It’s now time to make it do what it’s supposed to do: find our kernel file on the disk, so it can be loaded into memory and run.
This article is part of a series on toy operating system development.
Reading from disk
Before we get into how to find a file, we need to get some basics sorted first: we need some way to actually read data from our disk. Reading from a disk is done through calling BIOS interrupts (built-in functions that come with the BIOS of any computer). We’ll require interrupt 0x13, subfunction 2. Its arguments are these:
|Number of sectors to read|
|Low 8 bits of cylinder|
|High 2 bits of cylinder, 6 bits for sector|
The interrupt tries to read one or more sectors from disk, and sets the carry flag if it fails. Failures are common, and read operations should always be tried several times before giving up.
Logical Block Addressing (LBA)
Did you notice the track and head parameters? So far we’ve been talking about sectors only. In facts, we’ve assumed that a disk is simply a long sequential list of sectors. That would be nice, but reality is not nice in this case. We’ve been thinking in terms of what is known as Logical Block Addressing (LBA), while actual disks work differently. Disks can actually have multiple sides that are read by multiple reading heads. Also, sectors are grouped in tracks or cylinders (like on a vinyl record) which further complicates matters. However, the math required to convert an LBA number to a head, cylinder and sector combination is this:
What we need to do, then, is write a function that converts an LBA number into these low-level values, reads a sector, and gives it back. That way, the rest of our code can focus on thinking in terms of logical block addresses which avoids headaches.
Read sector code
Here is the code that reads a logical sector from disk:
This code will attempt to read the sector with LBA specified in
AX up to four times before giving up and jumping to bootFailure. After each failed read attempt, the disk system is reset. Most of the code focuses on doing the divisions required to get the values needed by the interrupt call. If all goes well, the sector’s 512 bytes are placed in
As a side note, we’re actually lucky to have interrupt 0x13 at this stage. Once we’ve put our processor in protected mode, we won’t have access to the BIOS anymore (all its code only runs in 16-bit real mode) and we’ll have to access the disk drive ourselves which is very tricky.
Introduction to FAT
There are many file systems in use these days (NTFS, ext, BFS, UFS, etc.) and we need to pick one to use for our bootable disk. FAT (“File Allocation Table”) is not the most advanced system available, but it’s simple, and on Windows we have the tools to format a disk with it. Plus, rawrite and ImageFS understand it. At a later stage in the development of a toy OS, you can implement support for bags of other file systems if you want.
There is no actual requirement to use any file system. A (floppy) disk is just a bag of bytes, grouped in sectors, clusters and tracks (we’ll get to that), that you can read from and write to. The BIOS expects the first sector of a bootable to disk to be a boot sector, but that’s the only requirement. However, at some point we’ll want to store files and it’s a good idea to pick a file system now, so FAT it is.
FAT comes in different flavors. There’s FAT12, FAT16 and FAT32. The difference is in the size of the disk supported. For FAT12, references to sectors on disk can be up to 12 bits (a maximum value of 4,096 sectors), while with FAT32 we can point to 4,294,967,296 sectors. Since we’re working with floppy disks, FAT12 is enough. The discussion below applies to all FAT versions.
The basic structure of a 720 KB floppy disk formatted with FAT is this:
|File allocation table (FAT)||18|
A 720 KB floppy disk contains 1440 512-byte sectors, some of which are reserved as shown. The remaining 1,407 sectors contain the data of our files.
The boot sector
Here is a quick refresher of the layout of the boot sector (see also the first part of this guide).
|0000||3||Code||Jump to rest of code|
|0011||2||Bytes per sector||512|
|0013||1||Number of sectors per cluster||1|
|0014||2||Number of reserved sectors||1|
|0016||1||Number of FAT tables||2|
|0017||2||Number of root directory entries (usually 224)||224|
|0019||2||Total number of sectors||2880|
|0022||2||Number of sectors per FAT||9|
|0024||2||Number of sectors/track||9|
|0026||2||Number of heads||2|
|0028||2||Number of hidden sectors||0|
|0030||2||EBPB||Number of hidden sectors (high word)||0|
|0032||4||Total number of sectors in filesystem|
|0036||1||Logical drive number||0|
I’ve filled this table with the values you would typically get for a 720 KB floppy disk formatted with FAT. The values in the boot sector are used to determine where exactly on the disk the FAT tables and the root directory live.
The root directory
In order to find a file on the disk, we need to start by looking at the root directory. This is a list of up to 224 entries (according to the boot sector) of 32 bytes each. Each entry contains data about a file:
|0x12||2||Last access date|
|0x16||2||Last modified time|
|0x18||2||Last modified date|
|0x1c||4||File size (bytes)|
So the root directory is a list of files on disk. In order to find our kernel file, all we need to do is read it and look for the kernel filename (“KERNEL.BIN” would be good). The directory entry then tells us in which cluster the file resides.
Finding a file on disk
The original boot sector code for DOS and Windows 95 required that the OS kernel file (IO.SYS) be the first file in the root directory of the boot disk. This allowed the programmer to write less code: he did not have to scan the disk’s root directory looking for the file but could simply assume that the file would reside at a fixed position. We can do better, though, by actually scanning the root directory to find our file – and you will see that it’s still possible to squeeze all required code in the 448 bytes that we’re allowed to use.
In order to find a file, we need to take the following steps:
- Prepare a memory area to load sectors of the root directory into (one sector at a time will do)
- Calculate the size of the root directory in sectors
- Figure out where the root directory starts on disk (see above)
- For each sector:
- Read the sector into memory
- Scan it to see if it contains the filename we’re looking for
- If found, calculate its starting sector and file size
- If not found, the boot process fails
Root directory size
The size of the root directory in sectors is the number of entries it contains, times 32 bytes:
Root directory start
To get started, we’ll first need to know where the root directory begins. The information we need is in the boot sector, where the following values are important:
- Number of FAT tables (2)
- Number of sectors per FAT (9)
- Number of reserved sectors (1)
- Number of hidden sectors (0)
The root directory comes after and reserved or hidden sectors and the file allocation table (FAT). The starting cluster of the root directory is therefore:
The one reserved sector is of, course, the boot sector. There are no hidden sectors on our disk.
Reading and scanning each sector
We can now use the ReadSector function we developed earlier to read the sectors into memory, one by one. For each sector, we then loop through the entries and check each filename (the first 11 bytes of the directory entry) for a match:
Getting the file start position
Now that we have found the root directory entry for the kernel file, we need to get from it the position of the file (the cluster number):
The end result is that
file_strt contains the file’s starting cluster number.
All right, after all this digging through the disk’s root directory, we’ve found our file and we’ve got a cluster number in hand. However, so far we’ve only talked about sectors, not clusters (you can safely ignore the cylinder, track and head story from here on – they have nothing to do with it). A cluster is another logical structure introduced by FAT.
The idea is that in FAT, a file may occupy a number of sectors. And most files certainly will, because one sector is only 512 bytes. Files on disk will often be much larger than that! So imagine that we have a disk with two of files on it. They’re both 3 sectors in size, and they were physically written to the disk so as to occupy precisely the first 6 sectors of the data area. Now, the first file is deleted. That means we now have 3 empty sectors at the start of the data area. Next, a new file, 4 sectors in size, is copied onto the disk. Where does it go? There’s no space to put new file before the remaining existing file, since there’s only space for 3 sectors. We can only place the new file after the existing file. Maybe at some point a user will want to write a 3-sector file that can fill the gap!
Naturally, if you consider writing and deleting lots of files on a disk, this will leave the file system fragmented and with lots of gaps that we can’t use. And this is where FAT comes in. With FAT, a file that occupies multiple sectors on disk may be broken up into non-contiguous sectors. So a 3-sector file could occupy physical sectors 40, 21 and 304. The bits and pieces can go wherever there is space and all available space is used to the fullest.
The problem then is administration. How does FAT know where the loose pieces of each file are? For this, the FAT system uses its namesake, the File Allocation Table. This table lives right before the root directory and contains an index of all the sectors on the disk. There are actually two copies of the FAT on every disk: one is a backup, for safety. You’ll see that the bootsector actually contains a field with the number of FAT tables on the disk.
The FAT table is actually an index of linked lists of sectors (hello?). An example illustrates how this works. Imagine we have a disk with 10 data sectors on it (we’re ignoring the sectors occupied by boot sector, root directory, FAT etc., and the fact that 10 sectors is a really small disk):
There are actually three files on this disk. From the root directory we get:
- File 1 – cluster index = 0
- File 2 – cluster index = 2
- File 3 – cluster index = 3
The directory entry for file 1 points to cluster index 0. In the FAT table, the value for this index is
0x001: it points to index 1. Index 1’s value is
0x004: it points to index 4. Index 4’s value is
0x006: is points to index 6. Index 6’s value, finally, is
0xff8, a special value which means that it is the last index of the file.
What this all means is that file 1 occupies sectors 0, 1, 4 and 6. Actually, sector 0 is the sector after the root directory on the disk, but that’s a detail.
For file 2 we can now determine that it occupies only one sector (sector 2). The FAT-value for this index is 0xff8, so there are no more sectors.
For file 3, we see that it occupies sectors 3 and 7. You’ll get the idea by now.
- Some of the FAT indices have the value
0. That means that the sector is free.
- Any value of
0xff8or higher means end-of-file
- A value of
0xff7means the sector is bad and should not be used.
A cluster, therefore, is a linked list where each entry points to the next, until the end of the cluster. Each entry relates to an actual physical sector on disk. When reading a file, we’ll have to follow its FAT cluster and read each corresponding sector into memory.
It sounds simple enough, after all, but there’s a catch: the FAT entries occupy 12 bits each (a byte and a half), which will make reading and using the data bit tricky later. You will now see that since each the FAT consists of 9 sectors, it can address 9*512/1.5 = 3072 sectors, which is more than enough, since our disk has 1440 sectors.
Reading the FAT
The FAT lives on the disk, and in order to access it, we’ll have to get it in memory first. The following code does just that: it loads the FAT into a memory segment (I use segment 0x0ee0, so as to place it just under where I’ll be loading the next file):
After this code has executed, all 9 sectors of the FAT now live in memory at
Reading a file
Without FAT in memory, we are not ready to read a file from disk. By scanning the root directory we had earlier determined the starting index in the FAT table, so we’re good to go:
What happens here is that we start with
cx = first sector of file (we know this from the root directory after all). We read that sector into memory, then use the
cx value to access the corresponding index in the FAT. There’s a bit of trickiness where we juggle with the 12-bits issue, but then all that remains is whether to continue reading, or stop because end-of-file (FAT-entry
0xff8 or higher) was reached.
From a bare-bones boot loader, we’ve now come to a point where we can load the kernel (or rather the second-stage boot loader – this will become clear later on)! We’ve seen how the root directory and FAT work, and neatly read both to load the required file from disk. Since we squeezed all this functionality in, the file we’re loading can actually live anywhere in the root of the disk. Gone is the requirement for SYS.COM that DOS and Windows imposed. In our future OS, making a disk bootable should not require a formatting operation. Writing a boot sector and simply coping the OS files onto the disk will suffice.
So far I’ve shown here some sections of code that still need to be sewn together. In the next part of this guide, we’ll add a small kernel (something along the lines of “hello world”) to be loaded and executed. We’ll then have a complete source that can be compiled and run on Bochs (or from a floppy disk if you’re hardcore).