hcigdbmconvert

This utility promotes a GDBM file created by a version before 3.8.1P rev1 to the current record format. It converts null-terminated databases to non-terminated databases, and resides in $HCIROOT/integrator/bin.

The GDBM record format was changed beginning with version 3.8.1P rev1 as part of the Tcl internationalization character rework. Previously, both the key and the value were stored in the database as null-terminated strings, including the null character. With the Tcl changes, the null character is omitted. In the GDBM interface, keys and values are specified as type datum. A datum is a struct that contains a byte array plus the array length.
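The difference between the two record formats can be sketched as follows. This is an illustration only, not the product's code: the Datum class is a stand-in for the GDBM datum struct (the field names dptr and dsize follow the GDBM C interface), and the key "patient_id" and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """Stand-in for the GDBM datum struct: a byte array plus its length."""
    dptr: bytes
    dsize: int


def old_format(text: str) -> Datum:
    # Old format: the terminating null is stored and counted in the length.
    data = text.encode("ascii") + b"\x00"
    return Datum(data, len(data))


def new_format(text: str) -> Datum:
    # New format: only the data bytes are stored.
    data = text.encode("ascii")
    return Datum(data, len(data))


old = old_format("patient_id")
new = new_format("patient_id")
print(old.dptr, old.dsize)
print(new.dptr, new.dsize)
```

The old-format datum is one byte longer than the new-format datum for the same text, and the two byte arrays do not compare equal.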

This change follows the changes in Tcl that support international character sets and binary data. Tcl data are no longer null-terminated strings. They are dual-ported Tcl objects that can have a string representation and an internal representation simultaneously.

Tcl strings are counted as well as null-terminated, and can contain embedded nulls. Strings are kept in UTF-8 internally in the interpreter. By default, they are converted from and to the external encoding during input/output operations. An extension is made to the UTF-8 encoding: embedded null characters are encoded using a 2-byte value that is not part of the normal UTF-8 encoding. In this way, embedded nulls, which are part of the data, can be distinguished from the string terminators, which are not part of the data.
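The embedded-null encoding can be sketched as follows. This is a simplified model, not Tcl's implementation: it assumes the 2-byte value is \xc0\x80, and the two helper functions are hypothetical. Python's standard "utf-8" codec does not produce this form, so the substitution is done by hand.

```python
def to_tcl_internal(text: str) -> bytes:
    # Encode as standard UTF-8, then write each embedded null as the
    # 2-byte form. The byte 0xC0 never occurs in standard UTF-8, so the
    # 2-byte form cannot be confused with any other character.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def from_tcl_internal(data: bytes) -> str:
    # Reverse the mapping, restoring the embedded nulls.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


internal = to_tcl_internal("AB\x00CD")
print(internal)
print(b"\x00" in internal)
print(from_tcl_internal(internal) == "AB\x00CD")
```

Because the internal form contains no zero bytes, a single terminating zero byte can still mark the end of the string without being mistaken for data.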

The GDBM Tcl extensions were changed to handle strings, including international characters, and binary data efficiently. The purpose was not to save 2 bytes per record. Now, the Tcl extension moves the data into the extension as a Tcl_Obj type.

  • For binary data, this returns a pointer to the ByteArray internal representation.
  • For string data, it performs a conversion on the data from UTF-8 to the external encoding.

Tcl does not null-terminate ByteArray data, so no null terminator is available. Putting data into the database with an extra null terminator would require making another copy of the data.

When null-terminated keys and values in a GDBM database are extracted with the Tcl extensions, the trailing null is treated as part of the key or value. When the key or value is moved back into the interpreter, this trailing null is converted to the 2-byte value \xc0\x80 during the conversion to UTF-8.
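The effect on an old-format record can be sketched as follows. This is a simplified model that assumes the external encoding is UTF-8; the function into_interpreter and the key "MRN1001" are hypothetical and are not part of the Tcl extensions.

```python
def into_interpreter(stored: bytes) -> bytes:
    # Model of moving stored bytes into the interpreter: every null byte,
    # including a leftover terminator from the old format, is treated as
    # data and becomes the 2-byte value.
    text = stored.decode("utf-8")
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


old_record_key = b"MRN1001\x00"   # written by an old version
new_record_key = b"MRN1001"       # written by a new version

seen_old = into_interpreter(old_record_key)
seen_new = into_interpreter(new_record_key)
print(seen_old)
print(seen_new)
print(seen_old == seen_new)
```

The two keys no longer compare equal inside the interpreter, which is why an old-format database has to be converted before it is used with the new extensions.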

In versions before 3.8.1P, the GDBM databases used null-terminated keys and values. A supplied utility converts databases from this key and value format to the new format, which has no terminating null characters.

hcigdbmconvert oldDB newDB 
  • oldDB is the existing old-format GDBM database.
  • newDB is the new-format GDBM database that is to be created.
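What the conversion amounts to for each record can be sketched as follows. This is not the implementation of hcigdbmconvert: it uses in-memory dictionaries as stand-ins for oldDB and newDB, the function names are hypothetical, and it assumes that every key and value in the old database ends with exactly one terminating null.

```python
def strip_terminator(data: bytes) -> bytes:
    # Remove exactly one trailing null, if present. Embedded nulls
    # elsewhere in the data are left untouched.
    return data[:-1] if data.endswith(b"\x00") else data


def convert_records(old_db: dict) -> dict:
    # Build the new-format records from the old-format records.
    return {
        strip_terminator(key): strip_terminator(value)
        for key, value in old_db.items()
    }


old_db = {
    b"MRN1001\x00": b"SMITH^JOHN\x00",
    b"MRN1002\x00": b"A\x00B\x00",   # value with an embedded null
}
new_db = convert_records(old_db)
for key, value in new_db.items():
    print(key, value)
```

Only the final null is removed from each key and value. Running a step like this on a database that is already in the new format could remove a null that is genuinely part of binary data, so the sketch is meaningful only for old-format input.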