Operating System Development: FIle Allocation Table (FAT) and Reading from disk

In the previous parts of this guide, we wrote assembly code for a simple boot sector and set up the GNU toolchain we use for compiling it and writing it to a floppy disk image. We then used the Bochs IA-32 emulator to boot from that image and see it run. Our current boot loader […]

This article was posted by Independent Software, a website and database application development company based in Maputo, Mozambique. Our website offers regular write-ups on technical and design issues, ranging from details at code level to 3D Studio Max rendering. Read more about Independent Software's philosophy, or get in touch with Independent Software.

In the previous parts of this guide, we wrote assembly code for a simple boot sector and set up the GNU toolchain we use for compiling it and writing it to a floppy disk image. We then used the Bochs IA-32 emulator to boot from that image and see it run.

Our current boot loader resets the drive system, writes a “loading” messages to the screen, waits for a keypress and reboots. It’s now time to make it do what it’s supposed to do: find our kernel file on the disk, so it can be loaded into memory and run.

This article is part of a series about toy operating system development.

Reading from disk

Before we get into how to find a file, we need to get some basics sorted first: we need some way to actually read data from our disk. Reading from a disk is done through calling BIOS interrupts (built-in functions that come with the BIOS of any computer). We’ll require interrupt 0x13, subfunction 2. Its arguments are these:

Interrupt 0x13 – 2: Read sectors

RegisterValue
alNumber of sectors to read
ahSubfunction 2
chLow 8 bits of cylinder
clHigh 2 bits of cylinder, 6 bits for sector
dhHead number
dlDrive number
es:bxData buffer

The interrupt tries to read one or more sectors from disk, and sets the carry flag if it fails. Failures are common, and read operations should always be tried several times before giving up.

Logical Block Addressing (LBA)

Did you notice the track and head parameters? So far we’ve been talking about sectors only. In facts, we’ve assumed that a disk is simply a long sequential list of sectors. That would be nice, but reality is not nice in this case. We’ve been thinking in terms of what is known as Logical Block Addressing (LBA), while actual disks work differently. Disks can actually have multiple sides that are read by multiple reading heads. Also, sectors are grouped in tracks or cylinders (like on a vinyl record) which further complicates matters. However, the math required to convert an LBA number to a head, cylinder and sector combination is this:

What we need to do, then, is write a function that converts an LBA number into these low-level values, reads a sector, and gives it back. That way, the rest of our code can focus on thinking in terms of logical block addresses which avoids headaches.

Read sector code

Here is the code that reads a logical sector from disk:

This code will attempt to read the sector with LBA specified in AX up to four times before giving up and jumping to bootFailure. After each failed read attempt, the disk system is reset. Most of the code focuses on doing the divisions required to get the values needed by the interrupt call. If all goes well, the sector’s 512 bytes are placed in ES:BX.

As a side note, we’re actually lucky to have interrupt 0x13 at this stage. Once we’ve put our processor in protected mode, we won’t have access to the BIOS anymore (all its code only runs in 16-bit real mode) and we’ll have to access the disk drive ourselves which is very tricky.

Introduction to FAT

There are many file systems in use these days (NTFS, ext, BFS, UFS, etc.) and we need to pick one to use for our bootable disk. FAT (“File Allocation Table”) is not the most advanced system available, but it’s simple, and on Windows we have the tools for format a disk with it. Plus, rawrite and ImageFS understand it. At a later stage in the development of a toy OS, you can implement support for bags of other file systems if you want.

There is no actual requirement to use any file system. A (floppy) disk is just a bag of bytes, grouped in sectors, clusters and tracks (we’ll get to that), that you can read from and write to. The BIOS expects the first sector of a bootable to disk to be a boot sector, but that’s the only requirement. However, at some point we’ll want to store files and it’s a good idea to pick a file system now, so FAT it is.

FAT comes in different flavors. There’s FAT12, FAT16 and FAT32. The difference is in the size of the disk supported. For FAT12, references to sectors on disk can be up to 12 bits (a maximum value of 4,096 sectors), while with FAT32 we can point to 4,294,967,296 sectors. Since we’re working with floppy disks, FAT12 is enough. The discussion below applies to all FAT versions.

The basic structure of a 720 KB floppy disk formatted with FAT is this:

RegionSize
Boot sector1
File allocation table (FAT)18
Root directory14
Data area1407

A 720 KB floppy disk contains 1440 512-byte sectors, some of which are reserved as shown. The remaining 1,407 sectors contain the data of our files.

The boot sector

Here is a quick refresher of the layout of the boot sector (see also part 1 of this guide).

OffsetSize ContentsTypical value
00003CodeJump to rest of code
00038BPBOEM nameGreat-OS
00112Bytes per sector512
00131Number of sectors per cluster1
00142Number of reserved sectors1
00161Number of FAT tables2
00172Number of root directory entries (usually 224)224
00192Total number of sectors2880
00211Media descriptor0xf0
00222Number of sectors per FAT9
00242Number of sectors/track9
00262Number of heads2
00282Number of hidden sectors0
00302EBPBNumber of hidden sectors (high word)0
00324Total number of sectors in filesystem
00361Logical drive number0
00371Reserved
00381Extended signature0x29
00394Serial number
004311Volume labelMYVOLUME
00548Filesystem typeFAT16
0062448CodeBoot code
05102RequiredBoot signature0xaa55

I’ve filled this table with the values you would typically get for a 720 KB floppy disk formatted with FAT. The values in the boot sector are used to determine where exactly on the disk the FAT tables and the root directory live.

The root directory

In order to find a file on the disk, we need to start by looking at the root directory. This is a list of up to 224 entries (according to the boot sector) of 32 bytes each. Each entry contains data about a file:

OffsetSizeDescription
0x008File name
0x083File extension
0x0b1Attribute
0x0c1Reserved
0x0d1Creation timestamp
0x0e2Creation time
0x102Creation date
0x122Last access date
0x142Reserved
0x162Last modified time
0x182Last modified date
0x1a2Cluster
0x1c4File size (bytes)

So the root directory is a list of files on disk. In order to find our kernel file, all we need to do is read it and look for the kernel filename (“KERNEL.BIN” would be good). The directory entry then tells us in which cluster the file resides.

Finding a file on disk

The original boot sector code for DOS and Windows 95 required that the OS kernel file (IO.SYS) be the first file in the root directory of the boot disk. This allowed the programmer to write less code: he did not have to scan the disk’s root directory looking for the file but could simply assume that the file would reside at a fixed position. We can do better, though, by actually scanning the root directory to find our file – and you will see that it’s still possible to squeeze all required code in the 448 bytes that we’re allowed to use.

  • In order to find a file, we need to take the following steps:
  • Prepare a memory area to load sectors of the root directory into (one sector at a time will do)
  • Calculate the size of the root directory in sectors
  • Figure out where the root directory starts on disk (see above)
  • For each sector:
    • Read the sector into memory
    • Scan it to see if it contains the filename we’re looking for
    • If found, calculate its starting sector and file size
  • If not found, the boot process fails

Root directory size

The size of the root directory in sectors is the number of entries it contains, times 32 bytes:

Root directory start

To get started, we’ll first need to know where the root directory begins. The information we need is in the boot sector, where the following values are important:

  • Number of FAT tables (2)
  • Number of sectors per FAT (9)
  • Number of reserved sectors (1)
  • Number of hidden sectors (0)

The root directory comes after and reserved or hidden sectors and the file allocation table (FAT). The starting cluster of the root directory is therefore:

The one reserved sector is of, course, the boot sector. There are no hidden sectors on our disk.

Reading and scanning each sector

We can now use the ReadSector function we developed earlier to read the sectors into memory, one by one. For each sector, we then loop through the entries and check each filename (the first 11 bytes of the directory entry) for a match:

Getting the file start position

Now that we have found the root directory entry for the kernel file, we need to get from it the position of the file (the cluster number) :

The end result is that file_strt contains the file’s starting cluster number.

FAT Clusters

All right, after all this digging through the disk’s root directory, we’ve found our file and we’ve got a cluster number in hand. However, so far we’ve only talked about sectors, not clusters (you can safely ignore the cylinder, track and head story from here on – they have nothing to do with it). A cluster is another logical structure introduced by FAT.

The idea is that in FAT, a file may occupy a number of sectors. And most files certainly will, because one sector is only 512 bytes. Files on disk will often be much larger than that! So imagine that we have a disk with two of files on it. They’re both 3 sectors in size, and they were physically written to the disk so as to occupy precisely the first 6 sectors of the data area. Now, the first file is deleted. That means we now have 3 empty sectors at the start of the data area. Next, a new file, 4 sectors in size, is copied onto the disk. Where does it go? There’s no space to put new file before the remaining existing file, since there’s only space for 3 sectors. We can only place the new file after the existing file. Maybe at some point a user will want to write a 3-sector file that can fill the gap!

Naturally, if you consider writing and deleting lots of files on a disk, this will leave the file system fragmented and with lots of gaps that we can’t use. And this is where FAT comes in. With FAT, a file that occupies multiple sectors on disk may be broken up into non-contiguous sectors. So a 3-sector file could occupy physical sectors 40, 21 and 304. The bits and pieces can go wherever there is space and all available space is used to the fullest.

The problem then is administration. How does FAT know where the loose pieces of each file are? For this, the FAT system uses its namesake, the File Allocation Table. This table lives right before the root directory and contains an index of all the sectors on the disk. There are actually two copies of the FAT on every disk: one is a backup, for safety. You’ll see that the bootsector actually contains a field with the number of FAT tables on the disk.

The FAT table is actually an index of linked lists of sectors (hello?). An example illustrates how this works. Imagine we have a disk with 10 data sectors on it (we’re ignoring the sectors occupied by boot sector, root directory, FAT etc., and the fact that 10 sectors is a really small disk):

SectorFAT-Value
00x001
10x004
20xFF8
30x007
40x006
50x000
60xFF8
70xFF8
80x000
90xFF7

There are actually three files on this disk. From the root directory we get:

  • File 1 – cluster index = 0
  • File 2 – cluster index = 2
  • File 3 – cluster index = 3

The directory entry for file 1 points to cluster index 0. In the FAT table, the value for this index is 0x001: it points to index 1. Index 1’s value is 0x004: it points to index 4. Index 4’s value is 0x006: is points to index 6. Index 6’s value, finally, is 0xff8, a special value which means that it is the last index of the file.

What this all means is that file 1 occupies sectors 0, 1, 4 and 6. Actually, sector 0 is actually the sector after the root directory on the disk, but that’s a detail.

For file 2 we can now determine that it occupies only one sector (sector 2). The FAT-value for this index is 0xFf8, so there are no more sectors.

For file 3, we see that it occupies sectors 3 and 7. You’ll get the idea by now.

  • Some of the FAT indices have the value 0. That means that the sector is free.
  • Any value of 0xff8 or higher means end-of-file
  • A value of 0xff7 means the sector is bad and should not be used.

A cluster, therefore, is a linked list where each entry points to the next, until the end of the cluster. Each entry relates to an actual physical sector on disk. When reading a file, we’ll have to follow its FAT cluster and read each corresponding sector into memory.

It sounds simple enough, after all, but there’s a catch: the FAT entries occupy 12 bits each (a byte and a half), which will make reading and using the data bit tricky later. You will now see that since each the FAT consists of 9 sectors, it can address 9*512/1.5 = 3072 sectors, which is more than enough, since our disk has 1440 sectors.

Reading the FAT

The FAT lives on the disk, and in order to access it, we’ll have to get it in memory first. The following code does just that: it loads the FAT into a memory segment (I use segment 0x0ee0, so as to place it just under where I’ll be loading the next file):

After this code has executed, all 9 sectors of the FAT now live in memory at 0ee0:0000 .

Reading a file

With out FAT in memory, we are not ready to read a file from disk. By scanning the root directory we had earlier determined the starting index in the FAT table, so we’re good to go:

What happens here is that we start with cx = first sector of file (we know this from the root directory after all). We read that sector into memory, then use the cx value to access the corresponding index in the FAT. There’s a bit of trickiness where we juggle with the 12-bits issue, but then all that remains is whether to continue reading, or stop because end-of-file (FAT-entry 0xff8 or higher) was reached.

Summary

From a bare-bones boot loader, we’ve now come to a point where we can load the kernel (or rather the second-stage boot loader – this will become clear later on)! We’ve seen how the root directory and FAT work, and neatly read both to load the required file from disk. Since we squeezed all this functionality in, the file we’re loading can actually live anywhere in the root of the disk. Gone is the requirement for SYS.COM that DOS and Windows imposed. In our future OS, making a disk bootable should not require a formatting operation. Writing a boot sector and simply coping the OS files onto the disk will suffice.

So far I’ve shown here some sections of code that still need to be sewn together. In the next part of this guide, we’ll add a small kernel (something along the lines of “hello world”) to be loaded and executed. We’ll then have a complete source that can be compiled and run on Bochs (or from a floppy disk if you’re hardcore).

Series index Next tutorial: Using makefiles and the second-stage boot loader

Save

Save

Save

Trackbacks

  1. Writing your own toy operating system: Setting up a toolchain and using Bochs | Independent Software

Comments

9 9 Responses to “Writing your own toy operating system: File Allocation Table (FAT) and reading from disk”
  1. Utkarsh says:

    Dude.. You are awesome..!!

  2. Utkarsh says:

    How does the FAT table entries and the Root Directory Entries get updated when we create a new file in the file system..?? I mean who does this job..??

    • alex says:

      The OS (that is, your future OS) does that. The data on the disk is merely laid out in a way that facilitates organization and it can use many formats (FAT is just one of them). Neither the disk nor the drive has any knowledge of FAT and it’s up to the OS to read and write sectors in a smart way.

  3. alex says:

    Update: the ReadSector function now no longer uses a global variable but rather a register to store the number of read attempts made. It’s more isolated that way.

  4. Marc says:

    Where is the tutorial? 😀
    I only see introduction and the links for the next part but the tutorial itself is missing…

    • alex says:

      Something must’ve gone horribly wrong when I did some small updates to this post. I restored it now. Thanks for letting me know, Marc!

  5. Sam says:

    Thank you very much, your tutorials are really awesome. Not too complicated (like a UNIX System overview I once read) and not too simple like ‘Hello World’ thanks!.

  6. alex says:

    Marcel Thiele sent me some corrections for the ReadFile macro. He writes,

    I’ve found a bug in the fat12 decoding macro. The problem is in the mReadFile macro. The problem is, that not the current cluster was checked, if it was even or not, but rather the new one. Additionally the operations SHR DX, 4 and AND DX, 0x0FFF where switched. I changed the jump mark “read_next_file_even” to “read_next_cluster_odd” and I changed the jump “jz read_next_file_even” to jnz “read_next_cluster_odd”

    I’ve made the corrections above. Thanks Marcel!

Leave a Reply

Your email address will not be published. Required fields are marked *