DBASE FILE FORMAT (WITH CODING DETAILS) DBF AND DBT/FPT FILE STRUCTURE
Recently I was developing a .NET reader for dBASE files. It turns out there is little (authoritative) information on earlier dBASE file formats like dBASE III and dBASE IV - most of the resources are about dBASE 7. In this post, I bring together information from various sources and my own findings in order to provide a good file format specification for FoxBase, Visual FoxPro, dBASE III and dBASE IV formats, all of which are similar.
Overview
In dBASE, a .dbf
file stores a single table. In it, dBASE stores the database description, field descriptors and record data. Ancient dBASE databases may have only a .dbf
file each table, while modern dBASE incarnations add a special file to hold values for large text fields (memos). This file is either a .dbt
(for dBASE) or a .fpt
(for FoxBase).
Newer dBASE versions add yet more files for commands, procedures, backups, reports and indices. These files are not described in this document.
Overall .dbf file structure
A .dbf
file consist of three blocks:
- Header
- List of field descriptors
- List of records
The field descriptors follow directly after the header data, while the header contains an offset to the start of the record data.
Header
The first element in any .dbf
file is the header. For FoxBASE 2, dBASE III, dBASE IV and dBASE 5 files (and presumably FoxBASE 1), it is always the same size. Some of the bytes contained in the header are reserved and start being used more as the version number increases.
Offset | Size | Type | Sample value | Description |
---|---|---|---|---|
0x00 | 1 | byte | 0x03 | Version byte |
0x01 | 3 | string | 990307 | Date of last update in YYMMDD format (where YY is equal to year minus 1900) |
0x04 | 4 | uint32 | 15 | Number of records in table |
0x08 | 2 | uint16 | 400 | Number of bytes in the header |
0x0a | 2 | uint16 | 64 | Number of bytes in a record |
0x0c | 3 | Reserved | ||
0x0f | 13 | Reserved for dBASE IIIPlus/LAN | ||
0x1c | 4 | Reserved | ||
0x20 | Field descriptors | |||
… | 1 | byte | 0x0d | Field terminator |
Version byte
The version byte is one of the following:
Byte | Bits | Version |
---|---|---|
0x02 | 0000 0010 | FoxBase 1.0 |
0x03 | 0000 0011 | FoxBase 2.x / dBASE III |
0x83 | 1000 0011 | FoxBase 2.x / dBASE III with memo file |
0x30 | 0011 0000 | Visual FoxPro |
0x31 | 0011 0001 | Visual FoxPro with auto increment |
0x32 | 0011 0010 | Visual FoxPro with varchar/varbinary |
0x43 | 0100 0011 | dBASE IV SQL Table, no memo file |
0x63 | 0110 0011 | dBASE IV SQL System, no memo file |
0x8b | 1000 1011 | dBASE IV with memo file |
0xcb | 1100 1011 | dBASE IV SQL Table with memo file |
0xfb | 1111 1011 | FoxPro 2 |
0xf5 | 1111 0101 | FoxPro 2 with memo file |
Wikipedia tells us that the version byte is encoded like so:
Valid dBASE for DOS file; bits 0-2 indicate version number, bit 3 indicates the presence of a dBASE for DOS memo file, bits 4-6 indicate the presence of a SQL table, bit 7 indicates the presence of any memo file (either dBASE m PLUS or dBASE for DOS)
As the bits column in the table above shows, this does not seem to be the case; it might be applicable to more recent dBASE versions but not to dBASE III and IV flavors. At most, it appears that the highest bit (bit 7) incidates the presence of a memo file. In general, I find it is best to simply test for the presence of an .dbt
(dBASE) or .fpt
(Fox) file on disk.
In general, the version byte can be used to determine if a dBASE file reader offers support for the file being opened.
Record lengths
There are two important values in the header:
Number of bytes in the header: this provides an offset to the start of the record data. In some versions, the record data starts directly after the field descriptor list, and it is not necessary to use this offset. In others, there is a gap between the field descriptors and the start of the record data, presumably to align the record data to a round start address, and it is necessary to move the read pointer to this offset.
Number of bytes in a record: The size of the records in a
.dbf
file varies with the fields they contain. Records are stored one after the other, with no delimiter. It is therefore necessary to know the record size in order to read the records.
Field descriptor
A field descriptor for each table field appears directly after the header data. Each field descriptor has a fixed size of 32 bytes. Some of the bytes in the field descriptor are marked as reserved and start to be used in later dBASE versions.
After the last field descriptor, a field descriptor array terminator 0x0d
appears. Since the header does not contain a field count, it is important to watch for the terminator while reading field descriptors.
The field descriptor structure is:
Offset | Size | Type | Sample value | Description |
---|---|---|---|---|
0x00 | 11 | string | PRODID | Field name (padded with NULL-bytes) |
0x0b | 1 | char | C | Field type |
0x0c | 4 | uint32 | Field data address in memory | |
0x10 | 1 | byte | 10 | Field length |
0x11 | 1 | byte | 4 | Field decimal count |
0x12 | 2 | Reserved for dBASE IIIPlus/LAN | ||
0x14 | 1 | byte | Work area ID | |
0x15 | 2 | Reserved for dBASE IIIPlus/LAN | ||
0x17 | 1 | byte | SET FIELDS flag | |
0x18 | 8 | Reserved |
The field name is not a null-terminated string but rather a fixed-size byte array padded with null bytes. dBASE III and IV only support field names up to 11 characters. Note that duplicate field names may appear in a
.dbf
file and may be the result of truncated names stored by other software (ESRI shapefiles are an example of that.)The field data address is used internally by some dBASE versions to map fields to memory. The address has no meaning in a
.dbf
file on disk.The field length is the number of bytes used by the field in each record. This applies to character strings (which are padded with spaces in the records), but also to some (but not all) number field types that are stored as text.
Field types
dBASE III field types
Type code | Type | Sample value | Description |
---|---|---|---|
C | Character | HELLO | A string of characters, padded with spaces if shorter than the field length |
D | Date | 19990703 | Date stored as string in YYYYMMDD format |
F | Float | -492.58 | Floating point number, stored as string, padded with spaces if shorter than the field length |
N | Numeric | -492.58 | Floating point number, stored as string, padded with spaces if shorter than the field length |
L | Logical | Y | A boolean value, stored as one of YyNnTtFf . May be set to ? if not initialized |
Visual FoxPro field types
Subsequent versions added some new types. They also move away from the string-based storage approach to a more compact binary storage, which is something to be wary of while reading the data.
Type code | Type | Sample value | Description |
T | DateTime | 459599234239 | A date and time, stored as a number (see below, under record reading) |
I | Integer | 340 | Integer value, stored as a little endian 32-bit value, with the highest bit used to negate numbers |
Y | Currency | 99.5 | Floating point number, stored as binary in (usually) 8 bytes. |
Memo field types
Starting with dBASE III Plus, a field type was introduced that stores larger strings (memos) in a separate file.
Type code | Type | Sample value | Description |
M | Memo | 2343 | Index of memo block in accompanying memo file, stored as either a string padded with spaces or a binary uint32 value |
Record data
The field descriptor array is followed by data for each record. Note that the record data may start directly after the fields, or there may be a gap between the field descriptors and the record data. For this reason, the header length word from the header must be used to determine the start of the record data.
Records are stored one after the other, with no gaps, field separators or record. Records continue until the end of the .dbf
file, which may be marked by an end-of-file (EOF) character 0x1a
. Note that in some cases, this marker is not present.
Record structure
Offset | Size | Type | Sample value | Description |
---|---|---|---|---|
0x00 | 1 | char | * | Deletion marker |
0x01 | mixed | Record data |
Each record is preceded by a deletion marker character. This is either a space (0x20
) if the record is not deleted, or an asterisk (0x2a
) if the record is deleted.
Reading records
After reading the deletion marker, reading a record is a matter of reading a value of each field that the .dbf
file describes. All fields values occupy the number of bytes specified in the field length property of their field descriptor.
In the earliest dBASE versions, numeric values are encoded as strings (field types F
and N
). A value of 3.14 with a field length of 6
is encoded simply as ` 3.14` (padded with spaces). Reading these fields is a matter of converting the text to the appropriate data type. Note that fields may be filled entirely with spaces to indicate an uninitialized value (not a NULL value – these had not been invented yet). For a numeric value, for example (C#):
In later versions, numeric data is encoded as binary values in little-endian byte order. This is the case for field types I
and Y
, although the older numeric types are still encoded as strings. Reading these values involves reading the specified number of bytes from the input stream, then converting them from binary to the appropriate type. For an integer value, for example (C#):
DateTime values are encoded as 32 bits numbers. The high word is the date, encoded as the number of days since Jan 1, 4713BC, and the low word is the time, encoded as (Hours * 3,600,000) + (Minutes * 60,000) + (Seconds * 1,000)
(the number of milliseconds since midnight).
Decoding this requires an implementation of the Julian calendar (C#):
Memo fields
Not all data resides in the .dbf
file. Starting with dBASE III Plus, support was added for longer text strings, which are stored in a separate memo file. For dBASE, this is the .dbt
file, while for FoxBase it is the .fpt
file (with the same file structure).
In the memo file, entries are stored as blocks. Each block has a fixed size, and there is a header structure which specifies the size of each block. Note that the header structure itself is considered a block (block 0) and generally 512 bytes in size. If the block size is smaller than 512 bytes, the header structure may occupy several blocks. However, no memo index will ever point to data inside the header. Therefore, the first memo index will be at least 1, and likely higher than that.
Memo file header structure
Offset | Size | Type | Sample value | Description |
---|---|---|---|---|
0x00 | 4 | uint32 | 34 | Index of next available data block (for appending data) (BIG ENDIAN) |
0x04 | 2 | uint16 | 64 | Block size in bytes |
0x06 | 506 | (Reserved) |
The data blocks in the memo file contains the data for the memos themselves. Each memo can occupy multiple blocks if its data size exceeds the block size. Memo data is string data. If the memo data does not extend to the end of the block, the remaining bytes are filled with spaces.
Memo block structure
Offset | Size | Type | Sample value | Description |
---|---|---|---|---|
0x00 | 4 | uint32 | 1 | Type |
0x04 | 4 | uint32 | 64 | Length of memo field |
0x08 | - | Memo data |
Reading memo fields
For the memo data type, records in the .dbf
file do not contain for each memo, but rather the index of the data block in the accompanying memo file.
Reading a memo fields involves:
- Determining the memo file block size, contained in the file header
- Reading the block index from the record data in the
.dbf
file - Moving the read pointer to
blocksize * index
in the.mbt
/.fpt
file - Reading the memo field length
- Reading the memo data and converting it to a string
An important caveat follows. In the early versions of FoxBase and dBASE III, the memo index values stored in the record data are stored as strings. That is, an index value of 62, in a memo field with field length 10, is stored as 62
preceded by 8 spaces. In later versions, the memo index is stored in binary as a 32-bit integer value.
Code can distinguish between these two storage approaches by determining the field length. When the field length is 4 bytes, then the memo index is stored as a binary integer value. When the field length is greater than 4 bytes, then the memo index is stored as a string.