mRFC 0018: Adopting the UUID and URN standards

This document is a proposal for adopting the UUID [1] and URN [2] standards in Midgard for the representation and handling of globally unique identifiers (GUIDs). The proposal is backwards compatible with the existing Midgard GUIDs and also adds limited support for other types of globally unique identifiers.

This mRFC has been submitted to the Midgard Community for discussion and approval under the Creative Commons Attribution-ShareAlike license.

Terminology

This document uses the term GUID to refer to any kind of globally unique identifier. The term UUID is used for the unique identifiers specified in [1]. The term Midgard GUID is used for the GUIDs used by Midgard. This proposal is about extending and augmenting the definition of Midgard GUIDs.

Background

GUIDs were originally added to Midgard to make it possible to reliably replicate records between two databases with conflicting local record ID numbers. The GUIDs have since also been adopted as persistent record identifiers in Midgard applications and in permanent URL addresses. The current plan is to use GUIDs as the primary API-level record identifiers.

The current Midgard GUIDs are 128-bit identifiers produced either by a standard UUID implementation [3] or by creating an MD5 hash from the string "re?pli??ga?rd?.?.?", where the question marks are replaced, in order, by:

  • the table name of the stored record,
  • the current unix timestamp,
  • the available host identifiers,
  • the local identifier of the stored record,
  • the version of the Midgard installation,
  • a runtime counter (0-9999), and
  • the current process identifier.

The GUIDs are encoded into 32 character lowercase hexadecimal strings for storage and use. There is normally no need to ever translate the GUID strings back to their binary format.
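
The MD5-based scheme described above can be sketched in Python as follows. This is only an illustration: the function name, the sample values, and the way each value is converted to a string are assumptions, not a description of the actual Midgard implementation.

```python
import hashlib

# Template quoted in the text; each "?" is replaced by one value, in order.
TEMPLATE = "re?pli??ga?rd?.?.?"


def legacy_guid(table, timestamp, host_ids, record_id, version, counter, pid):
    """Fill the template and return the MD5 digest as 32 lowercase hex digits.

    How Midgard stringifies each value is an assumption of this sketch.
    """
    values = [table, timestamp, host_ids, record_id, version, counter, pid]
    fragments = TEMPLATE.split("?")  # 8 fragments around 7 placeholders
    filled = "".join(
        fragment + str(value) for fragment, value in zip(fragments, values)
    ) + fragments[-1]
    return hashlib.md5(filled.encode("utf-8")).hexdigest()


# Hypothetical sample values, one per bullet in the list above.
example = legacy_guid("article", 1116633600, "host.example", 42, "1.7", 17, 1234)
print(len(example))  # 32 characters, as the existing applications expect
```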

An unknown number of PHP and other applications expect Midgard GUIDs to be strings of exactly 32 lowercase hexadecimal characters.

Limitations of the current GUIDs

While the Midgard GUIDs have served well for their purposes so far - namely facilitating database replication and persistent record identification - there are a few needs that are not met by the current GUIDs. The most pressing of these issues are standards compatibility, support for replicating non-Midgard content, and handling of GUIDs outside the context of Midgard.

Standards compatibility

The homegrown methods of generating and representing Midgard GUIDs are not as fail-safe or as well understood as the standard unique identifiers. For example, the UUID standard [1] specifies unique identifiers that are better at avoiding collisions, better supported by related standards (like URN), more widely understood, and equipped with a distinct string representation that is easier to detect and validate. Other identifier standards provide similar benefits over the current Midgard GUIDs.

Replicating non-Midgard content

The Midgard GUIDs can currently only be used for content that exists within a Midgard installation. Thus it requires separate bookkeeping for a user to replicate content from an external content store (like a calendaring application) to a Midgard database, modify it in Midgard, and replicate the changed contents back to the original location. This use case would be greatly simplified if Midgard supported externally assigned GUIDs.

Handling GUIDs outside the context of Midgard

In many cases it is necessary to refer to Midgard records outside the context of a Midgard application. In these situations it would be good if the record GUIDs could somehow be identified as being identifiers instead of just generic hexadecimal strings. Having a standardized mechanism for such tagging of Midgard identifiers would help interoperability and lower the learning curve of such applications.

Adopting the UUID and URN standards

The UUID [1] and URN [2] standards can be used to overcome the limitations described above. This section describes a mostly backwards compatible migration path for adopting these standards in Midgard and the benefits of doing so. The proposal also adds limited support for other types of unique identifiers.

The basic idea of this proposal is to relax the formatting and generation rules of the Midgard GUIDs and to specify standard UUIDs as the default identifiers for all new Midgard records. In addition, two URN formats are specified for use when handling Midgard GUIDs outside the context of Midgard. More specifically, the proposed changes are:

  1. A Midgard GUID can be any string that matches the regular expression /[0-9a-f-]{21,80}/, i.e. consists of at least 21 and at most 80 lowercase hexadecimal digits or hyphens.
  2. It is possible to set the GUID of a record when it is created. The assigned GUID must conform with the constraints set above. Only the standard UUID identifiers are officially supported, but other syntactically conforming identifiers could also be used.
  3. If a specific GUID is not given when a record is created, then a standard UUID is generated as the record GUID.
  4. UUIDs are represented using the proposed UUID URN format [1] when handled outside the context of Midgard.
  5. Other GUIDs are represented using the experimental X-midgard-guid URN namespace specified in Appendix A.
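
Rules 1 to 3 can be illustrated with a short Python sketch. The function names are hypothetical, the pattern is the one given in Appendix A applied to the whole string, and the use of random (version 4) UUIDs for rule 3 is an assumption, since the proposal does not fix a UUID version.

```python
import re
import uuid

# Pattern from Appendix A; fullmatch() applies it to the whole string.
GUID_RE = re.compile(r"[0-9a-f-]{21,80}")


def is_valid_guid(value):
    """Rule 1: 21 to 80 lowercase hexadecimal digits or hyphens."""
    return GUID_RE.fullmatch(value) is not None


def new_record_guid(assigned=None):
    """Rules 2 and 3: accept a conforming GUID or generate a UUID."""
    if assigned is None:
        # Assumption: a random (version 4) UUID is an acceptable default.
        return str(uuid.uuid4())
    if not is_valid_guid(assigned):
        raise ValueError("GUID does not match the required syntax")
    return assigned


print(is_valid_guid("0123456789abcdef0123456789abcdef"))  # old 32-character GUID
print(is_valid_guid("12345"))                             # plain decimal record ID
print(is_valid_guid(new_record_guid()))                   # generated UUID
```

Both the old 32-character GUIDs and the 36-character UUID strings fall inside the permitted length range, while ordinary decimal record identifiers are too short to match.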

GUID format rules changed (jlz)

The first rule above was changed on 2005-07-30 based on a mailing list comment by Torben Nehmer. The change added the lower limit of at least 21 characters for the GUID length. This change ensures that normal decimal row or record identifiers can reliably be distinguished from GUID strings.

The regular expression in the URN registration appendix was also changed accordingly.

Backwards compatibility

The proposed changes require no changes to the Midgard database structure or any of the existing Midgard records. Existing records will keep their GUIDs and thus all the existing configuration files, links, and other places where GUIDs are referenced do not need to be changed.

The only real backwards compatibility problem lies in the unknown number of applications that assume the 32 hex character format of the Midgard GUIDs. All these applications would need to be changed to support the more relaxed GUID format specified in rule 1 above.

Standards compatibility

Using the UUID and URN standards would make Midgard GUIDs more efficient as identifiers, more interoperable, and help simplify the Midgard learning curve. And they would make a nice addition to the "supported standards" list.

Replicating non-Midgard content

Supporting externally assigned GUIDs would make it much easier to replicate content between Midgard and external systems. If an external record could keep its original GUID (if such an identifier is available) when replicated into a Midgard database, there would be no need to manage a separate "original identifier" field.

Only standard UUID identifiers should be officially accepted as external identifiers, but see below for ideas on how to support other types of unique identifiers.

Handling GUIDs outside the context of Midgard

Using the URN standard would provide a distinct and well understood mechanism for labelling Midgard GUIDs as identifiers in contexts where such labelling is needed. Using URNs would also make Midgard GUIDs compatible with any application that uses URIs [5] to identify resources. For example using the XLink [6] standard it would be possible to reliably refer to Midgard records within arbitrary XML documents.
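
As a rough illustration of the two URN forms, the following Python sketch maps a GUID to "urn:uuid:" when it has the standard UUID string layout and to the experimental "urn:X-midgard-guid:" namespace otherwise. The function names are hypothetical, and detecting UUIDs purely by their 8-4-4-4-12 layout is an assumption of the sketch.

```python
import re

# Standard UUID string layout: 8-4-4-4-12 lowercase hexadecimal digits.
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
# General Midgard GUID syntax from Appendix A.
GUID_RE = re.compile(r"[0-9a-f-]{21,80}")


def guid_to_urn(guid):
    """Label a Midgard GUID as a URN for use outside Midgard."""
    if UUID_RE.fullmatch(guid):
        return "urn:uuid:" + guid
    if GUID_RE.fullmatch(guid):
        return "urn:X-midgard-guid:" + guid
    raise ValueError("not a valid Midgard GUID")


def urn_to_guid(urn):
    """Strip the URN prefix again; the 'urn' and namespace parts ignore case."""
    scheme, namespace, guid = urn.split(":", 2)
    if scheme.lower() != "urn" or namespace.lower() not in ("uuid", "x-midgard-guid"):
        raise ValueError("not a Midgard GUID URN")
    if not GUID_RE.fullmatch(guid):
        raise ValueError("not a valid Midgard GUID")
    return guid


print(guid_to_urn("00dc46a0-0e0c-1085-82bb-0002a5d5fd2e"))
print(guid_to_urn("0123456789abcdef0123456789abcdef"))
```

A URN produced this way could then be used, for example, as the value of an XLink href attribute in an XML document.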

Support for other types of unique identifiers

Although the old Midgard GUIDs and standard UUIDs would be the only officially supported identifier formats, this proposal also permits the use of other types of unique identifiers that meet the syntax constraints specified in rule 1 above. It is the responsibility of the assigning application or user to enforce the uniqueness of such identifiers.

In case an external resource has a unique identifier that does not match the specified syntax, it is recommended that a name-based UUID be generated from the original identifier using the namespace UUID "00dc46a0-0e0c-1085-82bb-0002a5d5fd2e" generated (but not registered) by Jukka Zitting on 2005-05-21 using the UUID generation form at http://www.itu.int/ITU-T/asn1/uuid.html. The original identifier can then be reliably mapped to the corresponding Midgard GUID without needing to store the original identifier in the Midgard database. This solution makes it possible to reliably replicate updates of the original record into the Midgard database without having to keep track of the original identifier. Replicating changes back from the Midgard database would still require storing the original identifier into the database along with the generated UUID.
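
The name-based mapping can be sketched in Python as below. The proposal does not say which name-based UUID version to use, so the choice of the SHA-1 variant (version 5, uuid.uuid5) is an assumption, as are the function name and the sample identifier.

```python
import uuid

# Namespace UUID given in the text above.
MIDGARD_EXTERNAL_NAMESPACE = uuid.UUID("00dc46a0-0e0c-1085-82bb-0002a5d5fd2e")


def guid_for_external_id(external_id):
    """Derive a Midgard GUID from an external identifier.

    Assumption: version 5 (SHA-1) name-based UUIDs; version 3 (MD5)
    would work the same way via uuid.uuid3().
    """
    return str(uuid.uuid5(MIDGARD_EXTERNAL_NAMESPACE, external_id))


# Hypothetical external identifier, e.g. from a calendaring application.
first = guid_for_external_id("event-2005-05-21@calendar.example")
second = guid_for_external_id("event-2005-05-21@calendar.example")
print(first == second)  # the mapping is deterministic
```

Because the mapping is deterministic, an incoming update can be matched to the right Midgard record by recomputing the UUID. The mapping cannot be reversed, which is why replicating changes back still requires the stored original identifier.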

Drawbacks and alternative solutions

The uniqueness of random and name-based UUIDs is not guaranteed, and it has been claimed that an actively used system will thus have UUID collisions within a decade. Additionally, it is hard for humans to validate the correctness of a UUID, which increases the possibility of copy-paste errors and other mistakes compared to more "user friendly" identifier formats.
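
The collision risk of random UUIDs can be estimated with the standard birthday approximation. The following Python sketch assumes version 4 UUIDs, which carry 122 random bits, and an ideal random source; the function name is hypothetical, and the estimate says nothing about implementations with weak randomness or about name-based UUIDs.

```python
import math

# A version 4 UUID has 122 random bits; 6 bits are fixed by the format.
RANDOM_BITS = 122


def collision_probability(count, bits=RANDOM_BITS):
    """Birthday approximation: p = 1 - exp(-n(n-1) / (2 * 2^bits))."""
    space = 2 ** bits
    exponent = count * (count - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


# Probability of at least one collision among a billion random UUIDs.
print(collision_probability(10 ** 9))
```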

A straightforward solution to these drawbacks would be to use general URIs as record identifiers. For example the tag URI scheme [7] could be used to automatically generate fully unique Midgard record identifiers. All sorts of external identifiers could easily be supported using the URN URI scheme or other similar mechanism. However this approach faces the serious problems of introducing complex identifier equivalence rules and requiring extensive encoding when including identifiers in HTTP URLs.

Another partial solution to the problem of handling external content is to define a separate database field for storing an externally assigned unique identifier, when one exists. This approach would, however, introduce unnecessary complexity and ambiguity into the handling of record identifiers.

Definitions

GUID
Globally unique identifier. An identifier that is (or can reasonably be expected to be) unique over space and time. A custom GUID format is used in Midgard for the persistent identifiers of all content records. The term is also used as an alias for the standardized UUID identifiers.
URI
Uniform resource identifier. See [5].
URN
Uniform resource name. See [2].
UUID
Universally unique identifier. A standardized (see [1] and the ISO/IEC standard 9834-8:2004) form of globally unique identifiers. Used widely in the Distributed Computing Environment (DCE), Microsoft Windows, and many other major software systems.
XLink
XML Linking Language. See [6].

References

[1] P. Leach, M. Mealling, and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, July 2005. ftp://ftp.rfc-editor.org/in-notes/rfc4122.txt

[2] R. Moats, "URN Syntax", RFC 2141, May 1997. http://www.ietf.org/rfc/rfc2141.txt

[3] T. Ts'o, "libuuid", part of the ext2 filesystem utilities. http://e2fsprogs.sourceforge.net/

[4] J. Zitting, "Exorcist", software project, 2005. http://svn.yukatan.fi/exorcist/

[5] T. Berners-Lee, R. Fielding, and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", RFC 3986, January 2005. http://www.ietf.org/rfc/rfc3986.txt

[6] S. DeRose, E. Maler, and D. Orchard, "XML Linking Language (XLink) Version 1.0", W3C Recommendation, June 2001. http://www.w3.org/TR/2001/REC-xlink-20010627/

[7] T. Kindberg and S. Hawke, "The 'tag' URI scheme", Internet Draft, January 2005. http://www.ietf.org/internet-drafts/draft-kindberg-tag-uri-07.txt

Appendix A: The X-midgard-guid URN namespace

The URN namespace registration form below contains information about the experimental X-midgard-guid namespace proposed in this mRFC. This is not an official registration request for the namespace.

Namespace ID
X-midgard-guid
Registration Information
Version 1. Proposed on 2005-05-21.
Declared registrant of the namespace
Jukka Zitting, The Midgard project
Declaration of syntactic structure
The identifiers are strings that match the regular expression /[0-9a-f-]{21,80}/.
Relevant ancillary documentation
mRFC 0018
Identifier uniqueness considerations
The identifiers are expected to uniquely identify Midgard records across space and time.
Identifier persistence considerations
An identifier is valid as long as the identified record or a trail of its deletion exists in any database into which the record has been replicated.
Process of identifier assignment
Identifiers are either automatically generated GUIDs of old Midgard records or specifically assigned external identifiers of new Midgard records. Old GUIDs were automatically generated by the Midgard application, and external identifiers are assigned according to the policies set by the application or user creating the records.
Process for identifier resolution
Identifiers are resolved by querying the Midgard database for a matching record.
Rules for Lexical Equivalence
Two identifiers are equivalent if and only if they are lexically equal.
Conformance with URN Syntax
The identifiers conform fully with the URN syntax.
Validation mechanism
An identifier is valid if it matches the syntactic structure described above.
Scope
The scope of this namespace is all non-UUID Midgard record identifiers.
