mRFC 0032: Multilingual MidCOM

  1. Multilingual sites
  2. MultiLang background
  3. Implementation into MidCOM
    1. Choosing languages used on site
    2. Language selection for content
    3. Navigation building and object instantiation
    4. Content editing
    5. Content language vs. UI language
    6. Different language versions
  4. API-level requirements

This proposal outlines how multilingual sites should be managed in the MidCOM framework using the MultiLang features of Midgard 1.8. This mRFC has been submitted to the Midgard Community for discussion and approval under the Creative Commons Attribution-ShareAlike license.

Multilingual sites

Midgard's base of power is Europe, where many languages are spoken and many organizations operate across language barriers. This makes easy management of multilingual sites very important.

Traditionally there have been two different types of multilingual sites:

  • Sites in multiple languages: These are sites that contain approximately the same information in each language, either because the law mandates it or because the organization wants to reach audiences in different languages
  • Multiple language versions of a site: These sites can contain very different content structures targeted at domestic and foreign markets, or based on some other differentiation

This proposal focuses on the first of these two. Multiple language versions of a site are better managed using separate content trees for each site than using MultiLang features.

MultiLang background

Right from the beginning of the project, Midgard has had a strong international focus. Multibyte character set support landed in Midgard as early as October 1999 in order to support managing sites in languages like Russian and Chinese. UTF-8 was made the default character set for Midgard in the 1.6 series.

However, in addition to character set support, translation of the actual content was also needed. MultiLingual Midgard was presented as a patch by David Schmitter from DataFlow in Switzerland in May 2003, and entered Midgard proper for the 1.5.0 release.

This allowed Midgard's content formats to hold different translations of content, but apart from DataFlow's proprietary WebInOne publishing tool, no Midgard authoring interface has actually added MultiLang support.

The reasons for not supporting MultiLang have mostly been the difficulty of integrating a good translation workflow and the unclear PHP-level programming APIs for it. With their integration into the MgdSchema and Query Builder tools, however, the APIs have now matured enough that supporting MultiLingual content cleanly is finally possible.

Implementation into MidCOM

This section describes how MultiLang should be implemented in the Midgard Component Framework. The aim is to make creating and maintaining multilingual sites easy without imposing changes on MidCOM components.

Choosing languages used on site

The website's language selections (the languages available for the site and the default language) are stored in the MidCOM configuration array.

For convenience, the language selections can be made using the Site Wizard.

Language selection for content

Languages would be selected using the $argv[0] prefix of the site, e.g. the Finnish version of "About us" would reside in /fi/about. This can be implemented in a centralized fashion by modifying MidCOM's URL parser so that it calls the set_language method of the MidCOM i18n service to change the language.
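The prefix handling described above can be sketched as follows. This is a minimal, framework-neutral illustration in Python rather than MidCOM's PHP URL parser; the configuration keys and the function name split_language_prefix are assumptions made for this example only.

```python
# Hypothetical site configuration; the key names are assumptions,
# not MidCOM's actual configuration keys.
SITE_CONFIG = {
    "languages": ["en", "fi", "de"],
    "default_language": "en",
}


def split_language_prefix(path, config=SITE_CONFIG):
    """Return (language, remaining_path) for a request path.

    language is None when the first path segment is not one of the
    configured language codes, i.e. the URL is "language neutral".
    """
    segments = [segment for segment in path.split("/") if segment]
    if segments and segments[0] in config["languages"]:
        # The first segment selects the content language.
        return segments[0], "/" + "/".join(segments[1:])
    return None, "/" + "/".join(segments)


language, remainder = split_language_prefix("/fi/about")
```

In a real implementation, the URL parser would pass the detected language to the i18n service before handing the remaining path to the component. Note that this sketch normalizes away trailing slashes.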

In addition to the per-language URLs, there would be a "language neutral" URL for each object. This would redirect users to the appropriate per-language URL based on their browser language selection, e.g. /about would redirect German-language users to /de/about.
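One possible way to choose the redirect target is to read the browser's Accept-Language header and pick the first preferred language that the site offers. The sketch below is in Python and is not part of MidCOM; the function names, the fallback to the default language, and the reduction of tags such as "de-CH" to "de" are all assumptions of this example.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into language codes, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Keep only the primary subtag: "de-CH" is treated as "de".
        primary = tag.strip().lower().split("-")[0]
        if quality > 0:
            weighted.append((-quality, position, primary))
    # Highest quality first; header order breaks ties.
    return [primary for _, _, primary in sorted(weighted)]


def redirect_target(path, accept_language, languages, default_language):
    """Pick the per-language URL that a language-neutral URL redirects to."""
    for code in parse_accept_language(accept_language):
        if code in languages:
            return "/" + code + path
    # Assumption: fall back to the site's default language.
    return "/" + default_language + path


target = redirect_target("/about", "de-CH,de;q=0.9,en;q=0.8", ["en", "fi", "de"], "en")
```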

Navigation building and object instantiation

Each language version of a site should show only the content translated into its language in navigation and content listings (news lists, for instance). Since MidCOM has a centralized wrapper for Query Builder, the DBA layer can easily add the language constraint to all DB queries transparently. This means components don't have to do any language-based processing to show only Norwegian news items.
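The idea of a wrapper that adds the language constraint on behalf of components can be illustrated with a small in-memory model. This is a Python sketch under stated assumptions: the QueryBuilder class below is a stand-in written for this example, not Midgard's Query Builder, and the field name "lang" and the sample rows are illustrative.

```python
class QueryBuilder:
    """Tiny in-memory stand-in for a query builder (not Midgard's class)."""

    def __init__(self, rows):
        self._rows = rows
        self._constraints = []

    def add_constraint(self, field, value):
        self._constraints.append((field, value))

    def execute(self):
        # Return the rows that satisfy every equality constraint.
        return [
            row for row in self._rows
            if all(row.get(field) == value for field, value in self._constraints)
        ]


class LanguageAwareQueryBuilder(QueryBuilder):
    """Wrapper that adds the language constraint on behalf of components."""

    def __init__(self, rows, current_language):
        super().__init__(rows)
        # Added centrally, so component code never mentions language.
        self.add_constraint("lang", current_language)


NEWS = [
    {"title": "Uutinen", "lang": "fi", "topic": "news"},
    {"title": "Nyhet", "lang": "no", "topic": "news"},
    {"title": "Om oss", "lang": "no", "topic": "about"},
]

# Component code only asks for news; the wrapper restricts the language.
qb = LanguageAwareQueryBuilder(NEWS, current_language="no")
qb.add_constraint("topic", "news")
norwegian_news = [row["title"] for row in qb.execute()]
```

The component-level code in the last three lines is the same regardless of the site language, which is the property the proposal is after.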

Similarly, when an object is instantiated (get_by_guid etc.), the DBA layer should check whether $object->lang is the current language and, if it is not, fail the instantiation.

MidCOM should contain a specific "translation to this language not found" error message that is triggered when the requested object is not available in the current language. This error message could contain links to the existing language versions.
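The instantiation check and the dedicated error could look roughly like the following Python sketch. Everything here is hypothetical and written for illustration: the TranslationNotFound exception, the in-memory OBJECTS store, and this get_by_guid function are not MidCOM or Midgard APIs.

```python
class TranslationNotFound(Exception):
    """Raised when an object exists but not in the current language."""

    def __init__(self, guid, language, available):
        super().__init__(f"Object {guid} is not available in '{language}'")
        self.guid = guid
        self.language = language
        self.available = available  # language codes that do exist

    def alternative_links(self, path):
        """Build per-language URLs for the versions that exist."""
        return [f"/{code}{path}" for code in self.available]


# Hypothetical store: guid -> {language code -> content}.
OBJECTS = {"abc123": {"en": "About us", "fi": "Tietoa meistä"}}


def get_by_guid(guid, current_language, store=OBJECTS):
    """Instantiate an object, failing if the current language is missing."""
    versions = store[guid]
    if current_language not in versions:
        raise TranslationNotFound(guid, current_language, sorted(versions))
    return versions[current_language]


try:
    get_by_guid("abc123", "de")
except TranslationNotFound as error:
    links = error.alternative_links("/about")
```

Carrying the list of available languages on the exception lets the error page offer links to the existing versions without a second lookup.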

Content editing

Since editing MidCOM content now happens on-site and there are language prefixes available, switching editing views between languages is just a matter of changing the site prefix.

However, there should be some facility for showing translators what has been changed in the "default language" content of an object, so they know what to translate. This relates to the RCS version control services in MidCOM.
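As a rough illustration of such a facility, two revisions of the default-language text can be compared line by line. The Python sketch below uses the standard difflib module as a stand-in; it does not use MidCOM's RCS service, and the function name and sample texts are assumptions of this example.

```python
import difflib


def changes_for_translator(old_text, new_text):
    """Return the lines added to or removed from the default-language text.

    Removed lines are prefixed with "- " and added lines with "+ ".
    """
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # Skip unchanged lines ("  ") and intraline hint lines ("? ").
    return [line for line in diff if line.startswith(("- ", "+ "))]


old_revision = "About us\nWe are a small company."
new_revision = "About us\nWe are a growing company.\nContact us."
changes = changes_for_translator(old_revision, new_revision)
```

Here the unchanged heading is omitted, so the translator sees only the replaced sentence and the new line.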

Content language vs. UI language

MidCOM's i18n service has already been modified so that user interface language can be different from actual content language. Many multilingual organizations prefer to run their software in a common language to ease documentation and training.

Different language versions

The information on different language versions of an object must be available so that MidCOM's metadata system can populate links to the different translations.

API-level requirements

The following requirements must be met by the MultiLang implementation in Midgard Core for this mRFC to be implemented:

  • The midgard_language objects must be available for regular Query Builder operations, and must contain a locale field with the appropriate UNIX system locale
  • When mgd_set_lang(X) has been called, all object update/create/delete operations must apply to the content of that particular language
  • $object->get_languages() must return a list of the languages the object is available in
  • Each language version must contain a revision timestamp to help with translation workflow
  • When set_lang has been called for Query Builder, it must return only objects that are in that particular language, and no lang0 objects
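One way the per-language revision timestamp could support translation workflow is by flagging translations that are older than the default-language version. The following Python sketch assumes a simple mapping from language code to revision time; the function name, the mapping, and the dates are illustrative only and are not part of the Midgard API.

```python
from datetime import datetime


def stale_translations(revisions, default_language):
    """Return languages whose revision is older than the default language's.

    revisions maps a language code to the datetime of that language
    version's last revision.
    """
    reference = revisions[default_language]
    return sorted(
        code for code, revised in revisions.items()
        if code != default_language and revised < reference
    )


# Illustrative timestamps, not real data.
revisions = {
    "en": datetime(2006, 5, 10, 12, 0),
    "fi": datetime(2006, 5, 11, 9, 30),
    "de": datetime(2006, 4, 2, 15, 45),
}
needs_update = stale_translations(revisions, default_language="en")
```

In this example only the German version predates the latest English revision, so it is the only one flagged for the translator.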
