Recently I was developing a .NET reader for dBASE files. It turns out there is little (authoritative) information on earlier dBASE file formats like dBASE III and dBASE IV - most of the resources are about dBASE 7. In this post, I bring together information from various sources and my own findings in order to provide a good file format specification for FoxBase, Visual FoxPro, dBASE III and dBASE IV formats, all of which are similar.

Overview

In dBASE, a .dbf file stores a single table. In it, dBASE stores the database description, field descriptors and record data. Ancient dBASE databases may have only a .dbf file each table, while modern dBASE incarnations add a special file to hold values for large text fields (memos). This file is either a .dbt (for dBASE) or a .fpt (for FoxBase).

Newer dBASE versions add yet more files for commands, procedures, backups, reports and indices. These files are not described in this document.

Overall .dbf file structure

A .dbf file consist of three blocks:

  • Header
  • List of field descriptors
  • List of records

The field descriptors follow directly after the header data, while the header contains an offset to the start of the record data.

The first element in any .dbf file is the header. For FoxBASE 2, dBASE III, dBASE IV and dBASE 5 files (and presumably FoxBASE 1), it is always the same size. Some of the bytes contained in the header are reserved and start being used more as the version number increases.

OffsetSizeTypeSample valueDescription
0x001byte0x03Version byte
0x013string990307Date of last update in YYMMDD format (where YY is equal to year minus 1900)
0x044uint3215Number of records in table
0x082uint16400Number of bytes in the header
0x0a2uint1664Number of bytes in a record
0x0c3  Reserved
0x0f13  Reserved for dBASE IIIPlus/LAN
0x1c4  Reserved
0x20   Field descriptors
1byte0x0dField terminator

Version byte

The version byte is one of the following:

ByteBitsVersion
0x020000 0010FoxBase 1.0
0x030000 0011FoxBase 2.x / dBASE III
0x831000 0011FoxBase 2.x / dBASE III with memo file
0x300011 0000Visual FoxPro
0x310011 0001Visual FoxPro with auto increment
0x320011 0010Visual FoxPro with varchar/varbinary
0x430100 0011dBASE IV SQL Table, no memo file
0x630110 0011dBASE IV SQL System, no memo file
0x8b1000 1011dBASE IV with memo file
0xcb1100 1011dBASE IV SQL Table with memo file
0xfb1111 1011FoxPro 2
0xf51111 0101FoxPro 2 with memo file

Wikipedia tells us that the version byte is encoded like so:

Valid dBASE for DOS file; bits 0-2 indicate version number, bit 3 indicates the presence of a dBASE for DOS memo file, bits 4-6 indicate the presence of a SQL table, bit 7 indicates the presence of any memo file (either dBASE m PLUS or dBASE for DOS)

As the bits column in the table above shows, this does not seem to be the case; it might be applicable to more recent dBASE versions but not to dBASE III and IV flavors. At most, it appears that the highest bit (bit 7) incidates the presence of a memo file. In general, I find it is best to simply test for the presence of an .dbt (dBASE) or .fpt (Fox) file on disk.

In general, the version byte can be used to determine if a dBASE file reader offers support for the file being opened.

Record lengths

There are two important values in the header:

  • Number of bytes in the header: this provides an offset to the start of the record data. In some versions, the record data starts directly after the field descriptor list, and it is not necessary to use this offset. In others, there is a gap between the field descriptors and the start of the record data, presumably to align the record data to a round start address, and it is necessary to move the read pointer to this offset.

  • Number of bytes in a record: The size of the records in a .dbf file varies with the fields they contain. Records are stored one after the other, with no delimiter. It is therefore necessary to know the record size in order to read the records.

Field descriptor

A field descriptor for each table field appears directly after the header data. Each field descriptor has a fixed size of 32 bytes. Some of the bytes in the field descriptor are marked as reserved and start to be used in later dBASE versions.

After the last field descriptor, a field descriptor array terminator 0x0d appears. Since the header does not contain a field count, it is important to watch for the terminator while reading field descriptors.

The field descriptor structure is:

OffsetSizeTypeSample valueDescription
0x0011stringPRODID Field name (padded with NULL-bytes)
0x0b1charCField type
0x0c4uint32 Field data address in memory
0x101byte10Field length
0x111byte4Field decimal count
0x122  Reserved for dBASE IIIPlus/LAN
0x141byte Work area ID
0x152  Reserved for dBASE IIIPlus/LAN
0x171byte SET FIELDS flag
0x188  Reserved
  • The field name is not a null-terminated string but rather a fixed-size byte array padded with null bytes. dBASE III and IV only support field names up to 11 characters. Note that duplicate field names may appear in a .dbf file and may be the result of truncated names stored by other software (ESRI shapefiles are an example of that.)

  • The field data address is used internally by some dBASE versions to map fields to memory. The address has no meaning in a .dbf file on disk.

  • The field length is the number of bytes used by the field in each record. This applies to character strings (which are padded with spaces in the records), but also to some (but not all) number field types that are stored as text.

Field types

dBASE III field types

Type codeTypeSample valueDescription
CCharacterHELLO A string of characters, padded with spaces if shorter than the field length
DDate19990703Date stored as string in YYYYMMDD format
FFloat-492.58Floating point number, stored as string, padded with spaces if shorter than the field length
NNumeric-492.58Floating point number, stored as string, padded with spaces if shorter than the field length
LLogicalYA boolean value, stored as one of YyNnTtFf. May be set to ? if not initialized

Visual FoxPro field types

Subsequent versions added some new types. They also move away from the string-based storage approach to a more compact binary storage, which is something to be wary of while reading the data.

Type codeTypeSample valueDescription
TDateTime459599234239A date and time, stored as a number (see below, under record reading)
IInteger340Integer value, stored as a little endian 32-bit value, with the highest bit used to negate numbers
YCurrency99.5Floating point number, stored as binary in (usually) 8 bytes.

Memo field types

Starting with dBASE III Plus, a field type was introduced that stores larger strings (memos) in a separate file.

Type codeTypeSample valueDescription
MMemo 2343Index of memo block in accompanying memo file, stored as either a string padded with spaces or a binary uint32 value

Record data

The field descriptor array is followed by data for each record. Note that the record data may start directly after the fields, or there may be a gap between the field descriptors and the record data. For this reason, the header length word from the header must be used to determine the start of the record data.

Records are stored one after the other, with no gaps, field separators or record. Records continue until the end of the .dbf file, which may be marked by an end-of-file (EOF) character 0x1a. Note that in some cases, this marker is not present.

Record structure

OffsetSizeTypeSample valueDescription
0x001char*Deletion marker
0x01 mixed Record data

Each record is preceded by a deletion marker character. This is either a space (0x20) if the record is not deleted, or an asterisk (0x2a) if the record is deleted.

Reading records

After reading the deletion marker, reading a record is a matter of reading a value of each field that the .dbf file describes. All fields values occupy the number of bytes specified in the field length property of their field descriptor.

In the earliest dBASE versions, numeric values are encoded as strings (field types F and N). A value of 3.14 with a field length of 6 is encoded simply as ` 3.14` (padded with spaces). Reading these fields is a matter of converting the text to the appropriate data type. Note that fields may be filled entirely with spaces to indicate an uninitialized value (not a NULL value – these had not been invented yet). For a numeric value, for example (C#):

// buffer contains the bytes read from the input stream
public object Decode(byte[] buffer)
{
  string text = Encoding.ASCII.GetString(buffer).Trim();
  if (text.Length == 0) return null;
  return Convert.ToSingle(text);
}

In later versions, numeric data is encoded as binary values in little-endian byte order. This is the case for field types I and Y, although the older numeric types are still encoded as strings. Reading these values involves reading the specified number of bytes from the input stream, then converting them from binary to the appropriate type. For an integer value, for example (C#):

// buffer contains the bytes read from the input stream
public object Decode(byte[] buffer)
{
 BitConverter.ToInt32(buffer, 0);
}

DateTime values are encoded as 32 bits numbers. The high word is the date, encoded as the number of days since Jan 1, 4713BC, and the low word is the time, encoded as (Hours * 3,600,000) + (Minutes * 60,000) + (Seconds * 1,000) (the number of milliseconds since midnight).

Decoding this requires an implementation of the Julian calendar (C#):

public object Decode(byte[] buffer)
{
  return ConvertFoxProToDateTime(buffer);
}

private static DateTime ConvertFoxProToDateTime(byte[] buffer)
{
  UInt32 dateWord = BitConverter.ToUInt32(buffer, 0);
  int timeWord = (int)BitConverter.ToUInt32(buffer, 4);

  // Convert date word to DateTime using Julian calendar
  DateTime date = JulianToDateTime(dateWord);

  // Get hour, minute, second from time word
  int hour = timeWord / 3600000;
  timeWord = timeWord - hour * 3600000;
  int minute = timeWord / 60000;
  timeWord = timeWord - minute * 60000;
  int second = timeWord / 1000;

  // Add time to DateTime
  return new DateTime(date.Year, date.Month, date.Day, hour, minute, second);
}

// Convert a Julian Date as long to a .NET DateTime structure 
// (see http://en.wikipedia.org/wiki/Julian_day)
private static DateTime JulianToDateTime(long julianDateAsLong)
{
  if (julianDateAsLong == 0) return DateTime.MinValue;
  double p = Convert.ToDouble(julianDateAsLong);
  double s1 = p + 68569;
  double n = Math.Floor(4 * s1 / 146097);
  double s2 = s1 - Math.Floor(((146097 * n) + 3) / 4);
  double i = Math.Floor(4000 * (s2 + 1) / 1461001);
  double s3 = s2 - Math.Floor(1461 * i / 4) + 31;
  double q = Math.Floor(80 * s3 / 2447);
  double d = s3 - Math.Floor(2447 * q / 80);
  double s4 = Math.Floor(q / 11);
  double m = q + 2 - (12 * s4);
  double j = (100 * (n - 49)) + i + s4;
  return new DateTime(Convert.ToInt32(j), Convert.ToInt32(m), Convert.ToInt32(d));
}

Memo fields

Not all data resides in the .dbf file. Starting with dBASE III Plus, support was added for longer text strings, which are stored in a separate memo file. For dBASE, this is the .dbt file, while for FoxBase it is the .fpt file (with the same file structure).

In the memo file, entries are stored as blocks. Each block has a fixed size, and there is a header structure which specifies the size of each block. Note that the header structure itself is considered a block (block 0) and generally 512 bytes in size. If the block size is smaller than 512 bytes, the header structure may occupy several blocks. However, no memo index will ever point to data inside the header. Therefore, the first memo index will be at least 1, and likely higher than that.

Memo file header structure

OffsetSizeTypeSample valueDescription
0x004uint3234Index of next available data block (for appending data) (BIG ENDIAN)
0x042uint1664Block size in bytes
0x06506  (Reserved)

The data blocks in the memo file contains the data for the memos themselves. Each memo can occupy multiple blocks if its data size exceeds the block size. Memo data is string data. If the memo data does not extend to the end of the block, the remaining bytes are filled with spaces.

Memo block structure

OffsetSizeTypeSample valueDescription
0x004uint321Type
0x044uint3264Length of memo field
0x08-  Memo data

Reading memo fields

For the memo data type, records in the .dbf file do not contain for each memo, but rather the index of the data block in the accompanying memo file.

Reading a memo fields involves:

  • Determining the memo file block size, contained in the file header
  • Reading the block index from the record data in the .dbf file
  • Moving the read pointer to blocksize * index in the .mbt/.fpt file
  • Reading the memo field length
  • Reading the memo data and converting it to a string

An important caveat follows. In the early versions of FoxBase and dBASE III, the memo index values stored in the record data are stored as strings. That is, an index value of 62, in a memo field with field length 10, is stored as 62 preceded by 8 spaces. In later versions, the memo index is stored in binary as a 32-bit integer value.

Code can distinguish between these two storage approaches by determining the field length. When the field length is 4 bytes, then the memo index is stored as a binary integer value. When the field length is greater than 4 bytes, then the memo index is stored as a string.