Open Source Content Management Framework

Midgard's Multilang Support

  1. Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    ==What Is Needed==

    A truly multilingual web site is able to serve one URL in multiple languages. For example, if a page is available in English and in Finnish, and a visitor has configured their browser to prefer Finnish whenever possible, the Finnish version of the page is returned to the browser. For some other visitor the same page might be shown in English.

    Usually multilingual sites use content negotiation (a "process of selecting the best representation for a given response when there are multiple representations available"). This means you will always get a page in some language if the page exists in any language. The server selects the best match for you in this case.

    The client sends a priority list of the languages it wishes to get (in HTTP this is the Accept-Language header). The first match from that list is then served. If the page is not available in any of those languages the page is shown in the default language of the page/site. Naturally, if the page does not exist at all, a 404 is returned instead. But if the page exists in any language, the best match is served.
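    The negotiation described above can be sketched roughly as follows. This is a minimal Python sketch, not Midgard code: the function names are hypothetical, and the last-resort choice when even the default language is missing is an assumption made for illustration.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        fields = part.strip().split(";")
        tag = fields[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # HTTP default when no q-value is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def negotiate(accepted, available, default_lang):
    """Pick the first accepted language that exists, else the default.

    Returns None when the page exists in no language at all (a 404).
    """
    if not available:
        return None
    for tag in accepted:
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # treat "fi-fi" as a request for "fi"
        if primary in available:
            return primary
    if default_lang in available:
        return default_lang
    # Assumption: if even the default is missing, serve any existing language.
    return sorted(available)[0]
```

    For example, negotiate(parse_accept_language("fi, en;q=0.8"), {"en", "de"}, "de") would pick English, because Finnish is not available and English is the next language on the client's list.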

    When a page is requested in some explicit language, the URL always contains the language in it. (So it is not the same URL anymore as above where the language is left "open" for the server to decide based on client's wishes.) If the page does not exist in the requested language naturally a 404 is returned as the URL did not resolve.

    ==How This Reflects to Midgard==

    The idea is to keep multilang transparent so that legacy applications and web sites continue to function the same way as before. This is also the fastest way to add multilang support to applications.

    For the explicit requests to work, it is important that all content is saved into corresponding language fields (Finnish text must be saved using "fi" as the language so that explicit query with "fi" finds it). Lang0 should only be used when the language of the content is unknown.

    This "requires" that every multilanged object has a fallback language/content field which holds the id of one of its language content objects (a foreign key pointing to one of the content objects). When a multilanged object gets its first content object created, the language of the content is automatically set as the default/fallback content language for that object. This field is not absolutely necessary though. Midgard could just have a hard coded priority list of languages (perhaps just return the content object with the lowest language id e.g.).

    There should be support for defining one or more global default languages. When defined, they have a higher priority than the object-level fallback language. Think of the global defaults as a server-side addition to the client's language wish list, appended as the last entries in that list. If there is no content even in the global default languages, the object's fallback language is returned as the last choice. (If the object-level default language is not considered a necessary feature in practice, this last step could instead iterate over a hard-coded list.)
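    The priority order described here (client wishes, then global defaults, then the object's own fallback) could look roughly like this. It is a minimal Python sketch under stated assumptions: the function names are hypothetical, content is modelled as a plain dict keyed by language code, and min() over the codes merely stands in for "lowest language id".

```python
def build_priority_list(client_langs, global_defaults):
    """Append the server-side defaults to the client's wish list, skipping duplicates."""
    ordered = []
    for lang in list(client_langs) + list(global_defaults):
        if lang not in ordered:
            ordered.append(lang)
    return ordered


def resolve_language(contents, client_langs, global_defaults, fallback_lang=None):
    """Return the language to serve for one object, or None if it has no content.

    contents maps a language code to that language's content.
    """
    if not contents:
        return None
    for lang in build_priority_list(client_langs, global_defaults):
        if lang in contents:
            return lang
    if fallback_lang in contents:
        return fallback_lang  # object-level fallback, the last regular choice
    # Assumption: hard-coded last resort standing in for "lowest language id".
    return min(contents)
```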

    The following API calls are needed for Midgard to support the described features:

    • Optional: Change object's default content language (content in the given language must exist). Legacy style example: $object->fallback_lang

    • Define global default language(s). Legacy style example: mgd_default_lang($lang_or_langs);

    • Set Accept-Language values for negotiation (should be API call - not automatic feature - we need to support command line applications too - and it is always good to avoid too automatic features as they don't always fit for every usage). Legacy style example: mgd_accept_language($langs);

    • Set global requested language. This call should have a boolean parameter which defines whether the request is explicit or negotiating. Perhaps explicit should be the default, because that feels more logical given that the call sets a language (referring to the function name). Legacy style example: mgd_set_lang($lang,$negotiate=false);

    Actually, I started to think that mgd_default_lang() and mgd_accept_language() could be the same call. You can always just add the server side values to the end of the client ones. Some mgd_negotiate_langs($langs) or something... That would make things a lot simpler and still give us the same features.

    Based on those thoughts, the changes required in core are quite small. This should be doable. When you summarize this post even more, you will end up with a simple suggestion: New API call to add langs + make core to iterate the requests + if no matches, return the content object with lowest lang id (if we just have a hard coded defaulting - perhaps that would be just fine because web sites usually have matching default languages anyway). Make mgd_set_lang() to be explicit by default and negotiative via extra parameter.

    Perhaps that mgd_negotiate_langs() functionality could also be in mgd_set_lang() actually. Oh, yes! Of course! If the given parameter is an array instead of language, then negotiate. Otherwise be explicit. No need for an extra parameter or even new API calls. This gets simpler and simpler all the time I write this. :)

    ==What About MidCOM==

    It's true that MidCOM could implement all this using calls with languages in them. But the transparent way would be much faster to implement. We also really need to figure this multilang thing in core level anyway. The current state of multilang in core is not perfectly usable. It does not reflect the real need properly enough. It is not bad. It just needs small changes.

    More detailed discussion about multilang and MidCOM should be started at midcom-devel@lists.gforge.nehmer.net after core side of things is clear.

    ==Note About midgard-apache and Multilang==

    The current way of handling multilang with pages using cookies is not very usable. Usually you want to support the Accept-Language way, and the current cookie usage blocks that possibility. I would be very careful with features like the MidgardLang cookie. We don't want features which block other features or which are not suitable for every usage.

    Anyway, this was just a sidenote as the current MidCOM e.g. does not even use pages but serves content from the topic tree. The general object API with multilang should be thought through first.

    ==Conclusion: My Suggestion What to Do in Midgard's Core==

    • If mgd_set_lang() gets an array instead of a single language, negotiate all queries (find the best match from the given languages - or if no match, return existing content object with lowest language id). Queries return false only if objects don't exist at all.

    • If mgd_set_lang() gets a single language, be explicit (return the content object with the given language - if one does not exist, return false).
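    The two rules above can be illustrated with a small model. This is a Python sketch only, not the Midgard API: LanguageContext and its methods are hypothetical names, contents are keyed by numeric language id, and None plays the role of the "false" return value.

```python
class LanguageContext:
    """Toy model of the proposed mgd_set_lang() behaviour (hypothetical class)."""

    def __init__(self):
        self.langs = []
        self.negotiate = False

    def set_lang(self, lang):
        # A list or tuple means "negotiate"; a single language means "explicit".
        if isinstance(lang, (list, tuple)):
            self.langs = list(lang)
            self.negotiate = True
        else:
            self.langs = [lang]
            self.negotiate = False

    def get_content(self, contents):
        """contents maps a numeric language id to the content in that language."""
        for lang in self.langs:
            if lang in contents:
                return contents[lang]
        if self.negotiate and contents:
            # No match: fall back to the content with the lowest language id.
            return contents[min(contents)]
        return None  # explicit miss, or the object has no content at all
```

    With set_lang(2) a query against an object that only has content in language 1 finds nothing, while set_lang([3, 2]) against the same object falls back to language 1.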

    Now that I've got to this point I have to say I'm not sure how much the iteration would slow things. So I'm still a bit unsure about this... :)

    Well, basically I feel that even if this could slow things down, this is still the way users expect multilang to work. If you want it faster, don't define too big an array of languages for the iteration. ;) With only two or so languages this would work as fast as the current lang0 = default. So this will not really slow things down after all compared to the current situation. This would just make things work like they should and give real negotiation support for multilang.

    I hope my "writing before thinking" again didn't make you think this is a complex issue. :) Now that I've got to the end of this post I feel this is relatively easy thing to implement - at least in terms of Midgard's core (not yet thinking about MidCOM). You just change the default to be explicit and add negotiation support so that the "best match" is returned. Simple really. Happy coding guys! /me goes to get some sleep before two weeks of summer holiday starts :)

  2. re: Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    Hi,

    > When a page is requested in some explicit language, the URL always contains the language in it. (So it is not the same URL anymore as above where the language is left "open" for the server to decide based on client's wishes.) If the page does not exist in the requested language naturally a 404 is returned as the URL did not resolve.

    It should be resolved in a more flexible way IMO. For example, if we agree that language transparency is the best feature, then one should be able to send a particular 404 error page with an additional note: "The page you requested was not found or has not been translated into the requested language. You may follow document."

    ==How This Reflects to Midgard==

    > The idea is to keep multilang transparent so that legacy applications and web sites continue to function the same way as before. This is also the fastest way to add multilang support to applications.

    Mostly. But we should consider such issues:

    • mgd_get_object_by_guid: Should this function always return the correct object for the current language context, or should language objects have different guids and thus be different objects?

    • Replication: Should an object be exported as multiple objects with the same guid, so that the replication process has to handle languages? Or should we export every language object as a separate object with its own guid (which breaks get_by_guid)?

    • Do we want to write code in a language context, or keep the existing way and write language-unaware code?

    > For the explicit requests to work, it is important that all content is saved into corresponding language fields (Finnish text must be saved using "fi" as the language so that explicit query with "fi" finds it). Lang0 should only be used when the language of the content is unknown.

    We can make lang0 configurable in the ini file or through API methods (to set the language from the host's lang entry). So lang0 can be kept as unknown (set as the default by application or convention), or a configured language can be used instead of lang0.

    And, as we are speaking about language codes:

    http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

    The table there almost perfectly describes a Midgard object and its corresponding table.

    class midgard_lang {

        var $code;   // ISO 639-1 code, e.g. "fi"
        var $name;   // language name in English
        var $native; // native name of the language
        var $guid;
        var $id;

    }

    > This "requires" that every multilanged object has a fallback language/content field which holds the id of one of its language content objects (a foreign key pointing to one of the content objects). When a multilanged object gets its first content object created, the language of the content is automatically set as the default/fallback content language for that object. This field is not absolutely necessary though. Midgard could just have a hard coded priority list of languages (perhaps just return the content object with the lowest language id e.g.).

    That will break all known SQL queries. Limit, offset, count will return unknown and completely undefined resultsets.
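    One possible reading of this objection: once an object has several content rows, LIMIT, OFFSET and COUNT operate on content rows rather than on objects, so their results no longer say anything reliable about the objects a page will show. A minimal sketch using Python's sqlite3 module illustrates this; the table and column names are hypothetical and do not reflect Midgard's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical content table: one row per (object, language) pair.
conn.execute("CREATE TABLE article_i (sid INTEGER, lang INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO article_i VALUES (?, ?, ?)",
    [
        (1, 0, "Welcome"),
        (1, 2, "Tervetuloa"),
        (2, 0, "News"),
        (3, 2, "Yhteystiedot"),
    ],
)

# COUNT over content rows overstates the number of objects.
row_count = conn.execute("SELECT COUNT(*) FROM article_i").fetchone()[0]
object_count = conn.execute(
    "SELECT COUNT(DISTINCT sid) FROM article_i"
).fetchone()[0]

# LIMIT 2 over rows returns both language rows of object 1: a "page" of
# two rows ends up holding a single object.
first_two = conn.execute(
    "SELECT sid FROM article_i ORDER BY sid, lang LIMIT 2"
).fetchall()
objects_on_page = {sid for (sid,) in first_two}
```

    Here the table holds four rows for three objects, and the two-row page contains only object 1, so any fallback that is resolved after the SQL query has run cannot rely on the row counts.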

    > The following API calls are needed for Midgard to support the described features:

    > Optional: Change object's default content language (content in the given language must exist). Legacy style example: $object->fallback_lang

    Do you mean a method, or just a property with set and get?

    > Define global default language(s). Legacy style example: mgd_default_lang($lang_or_langs);

    Are there any use cases for setting the default language at runtime? Or should this be used only at application startup?

    > Set Accept-Language values for negotiation (should be API call - not automatic feature - we need to support command line applications too - and it is always good to avoid too automatic features as they don't always fit for every usage). Legacy style example: mgd_accept_language($langs);

    > Based on those thoughts, the changes required in core are quite small. This should be doable. When you summarize this post even more, you will end up with a simple suggestion: New API call to add langs + make core to iterate the requests + if no matches, return the content object with lowest lang id (if we just have a hard coded defaulting - perhaps that would be just fine because web sites usually have matching default languages anyway). Make mgd_set_lang() to be explicit by default and negotiative via extra parameter.

    > Perhaps that mgd_negotiate_langs() functionality could also be in mgd_set_lang() actually. Oh, yes! Of course! If the given parameter is an array instead of language, then negotiate. Otherwise be explicit. No need for an extra parameter or even new API calls. This gets simpler and simpler all the time I write this. :)

    Midgard core is completely unaware of Apache requests or content negotiation. Such functionality must be implemented at a higher level, and only for the particular environment.

    ==Note About midgard-apache and Multilang==

    > The current way of handling multilang with pages using cookies is not very usable. Usually you want to support the Accept-Language way, and the current cookie usage blocks that possibility. I would be very careful with features like the MidgardLang cookie. We don't want features which block other features or which are not suitable for every usage.

    ML cookie is deprecated.

    ==Conclusion: My Suggestion What to Do in Midgard's Core==

    > If mgd_set_lang() gets an array instead of a single language, negotiate all queries (find the best match from the given languages - or if no match, return existing content object with lowest language id). Queries return false only if objects don't exist at all.

    This is a solution for the higher level only. Core must support these cases:

    • return object with default lang only
    • return object with explicit language
    • return object with default lang if object with explicit lang can not be found
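    The three cases listed above could be modelled like this. It is a minimal Python sketch, assuming content keyed by numeric language id with 0 as the default; LangMode and fetch_content are hypothetical names, not part of Midgard core.

```python
from enum import Enum


class LangMode(Enum):
    DEFAULT_ONLY = 1         # return object with default lang only
    EXPLICIT = 2             # return object with explicit language
    EXPLICIT_OR_DEFAULT = 3  # explicit language, falling back to default


def fetch_content(contents, mode, lang=None, default_lang=0):
    """Look up content for one object; returns None when nothing matches."""
    if mode is LangMode.DEFAULT_ONLY:
        return contents.get(default_lang)
    if mode is LangMode.EXPLICIT:
        return contents.get(lang)
    if mode is LangMode.EXPLICIT_OR_DEFAULT:
        if lang in contents:
            return contents[lang]
        return contents.get(default_lang)
    raise ValueError("unknown mode: %r" % (mode,))
```

    Each mode needs at most two lookups per object, which keeps the core free of any open-ended iteration over a language list.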

    > Now that I've got to this point I have to say I'm not sure how much the iteration would slow things. So I'm still a bit unsure about this... :)

    Any list method would produce a dozen SQL queries, so "slow" is putting it lightly ;)

    Piotras

  3. re: Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    Hi,

    I just added new midgard_language class. http://www.nemein.com/people/piotras/midgard_language.html

    Piotras

  4. re: Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    Regarding translation UIs, the GNOME guys had an interesting idea:

    PO files are a MUST for The Gnome TranslationProject—we don't want to reeducate all the translators with new policies, and take away their tools (translation memories, fuzzy matching, status pages...)

    This would basically mean that the site contents would be constructed on-site in lang0/default language, and then for translators there would be a utility for exporting/importing MultiLang content in PO format.

    Since lots of translators are familiar with poEdit and other similar tools, this approach would be easier for them than learning a new web-based tool.

    I guess the PO export facility should either export full site contents, or only contents requiring translation (updated objects, objects with no translation to langN).

    Message IDs could be simply objectguid_fieldname.
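    An export along these lines might be sketched as follows. This is a minimal Python sketch, not an existing Midgard or MidCOM facility: the shape of the object dicts is an assumption, the PO header entry is omitted, and the source text is carried in an extracted comment ("#.") so that translators can see what to translate behind the objectguid_fieldname message ID.

```python
def po_escape(text):
    """Escape a string for use inside a double-quoted PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def export_po(objects, fields, target_lang, only_untranslated=True):
    """Build PO entries for the given objects and fields.

    Each object is a dict with "guid", "default" (field -> text) and
    "translations" (language -> field -> text); this shape is hypothetical.
    """
    lines = []
    for obj in objects:
        translated = obj["translations"].get(target_lang, {})
        for field in fields:
            source = obj["default"].get(field, "")
            existing = translated.get(field, "")
            if not source or (only_untranslated and existing):
                continue
            lines.append("#. " + po_escape(source))
            lines.append('msgid "%s_%s"' % (obj["guid"], field))
            lines.append('msgstr "%s"' % po_escape(existing))
            lines.append("")
    return "\n".join(lines)
```

    Passing only_untranslated=False would cover the "export full site contents" variant, while the default covers "only contents requiring translation".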

    How does this sound? This could be the way to make MultiLang Midgard a reality quite easily, and without touching MidCOM UIs much. With it we would also get more experience with the translation workflow, enabling us to build a much nicer web-based content translation UI.

    /Bergie

    PS: If we decide to go this way, we should also consider porting MidCOM's translation files into PO format to get rid of the relatively uncomfortable web-based translation UI. It could also gain us a bit of speed as PO files can be parsed by a PHP extension instead of PHP code.

  5. re: Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    Hi,

    > This would basically mean that the site contents would be constructed on-site in lang0/default language, and then for translators there would be a utility for exporting/importing MultiLang content in PO format.

    > Since lots of translators are familiar with poEdit and other similar tools, this approach would be easier for them than learning a new web-based tool.

    Fully agree as long as it can be used only for localization strings. Midgard 1.8alpha3 already compiles with gettext support and basic po files are ready.

    > I guess the PO export facility should either export full site contents, or only contents requiring translation (updated objects, objects with no translation to langN).

    The problem with po files is that you cannot update or create them "just like that". You have to do some processing and "compile" them to a binary format, so every object's update or create method would incur additional overhead from I/O requests.

    > How does this sound? This could be the way to make MultiLang Midgard a reality quite easily, and without touching MidCOM UIs much. With it we would also get more experience with the translation workflow, enabling us to build a much nicer web-based content translation UI.

    Such an approach requires touching MidCOM's UI very much, as every echo $obj->content or &(obj.content); would have to be passed as an argument to the gettext function.

    All messages and localization strings should be supported by gettext IMO. It works quite nicely with web interfaces; as I recall, sergiei and I made such translations for spider admin a few years ago. Dynamic content is something which shouldn't be translated with gettext, or at least it requires an experimental mock-up with performance and usability tests.

    Ah! Last but not least! In Debian, for example, all mo files are located in the /usr/share/locale/$lang/LC_MESSAGES directory. If MidCOM should use this directory, then I would need to break the system, as I would need to set the apache user as the owner of this dir. I do not remember if the mo files dir is settable. If it is, that should solve this kind of issue.

    Piotras

  6. re: Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    > Such an approach requires touching MidCOM's UI very much, as every echo $obj->content or &(obj.content); would have to be passed as an argument to the gettext function.

    I think you misunderstood me here. Midgard would still store the translated content into its ML database, and serve from there, just like in all ML scenarios.

    The idea here was merely that when user wants to translate a site (or parts of it), they would ask MidCOM (or Midgard) to generate them a PO file of the content needing translation.

    Then they would take this file and edit it with poEdit or whatever, and finally import it back into Midgard, where the file would be parsed and stored into the appropriate ML tables.
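    The import half of that round trip might look roughly like this. It is a minimal Python sketch under stated assumptions: message IDs follow the objectguid_fieldname pattern, guids contain no underscore, and every msgid and msgstr fits on one line (multi-line PO strings are not handled). Storing the result into the ML tables is left out.

```python
import re

# Matches a single-line msgid immediately followed by a single-line msgstr.
ENTRY = re.compile(r'msgid "((?:[^"\\]|\\.)*)"\s+msgstr "((?:[^"\\]|\\.)*)"')


def po_unescape(text):
    """Undo the escaping of \\n, \\" and \\\\ in a PO string."""
    return re.sub(
        r"\\(.)",
        lambda m: "\n" if m.group(1) == "n" else m.group(1),
        text,
    )


def import_po(po_text):
    """Return {(guid, field): translated text} for every translated entry."""
    result = {}
    for match in ENTRY.finditer(po_text):
        message_id, text = match.group(1), match.group(2)
        if not message_id or not text:
            continue  # skip the PO header and untranslated entries
        # Assumption: the guid has no underscore, so split at the first one.
        guid, _, field = message_id.partition("_")
        result[(guid, field)] = po_unescape(text)
    return result
```

    Each (guid, field) pair in the result identifies one field of one object, which is what would then be written in the target language context.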

    /Bergie

  7. re: Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    Hi!

    Here is how I see ML as being implemented into MidCOM. I'll try to describe all the pieces that are affected here:

    Choosing languages used on site:

    • Site's language selections (languages available for the site and the default language) are stored in the MidCOM configuration array
    • The language selections can be made using Site Wizard for better convenience

    Language selection for content:

    • Languages would be selected using the $argv[0] prefix of the site, i.e. Finnish version of "About us" would reside in /fi/about
    • This can be implemented in a centralized fashion by modifying MidCOM's URL parser and making it call either mgd_set_lang or the corresponding MidCOM i18n service method
    • In addition to the per-language URLs, there would be the "language neutral" URL for each object. This would redirect user to appropriate URL based on their browser language selection, i.e. /about would redirect me to /de/about
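    The URL handling described in this list can be sketched as follows. This is a minimal Python sketch, not MidCOM's URL parser: SITE_LANGS, DEFAULT_LANG and route are hypothetical names, and the accepted languages are assumed to be already parsed from the browser's settings.

```python
SITE_LANGS = ["en", "fi", "de"]  # hypothetical site configuration
DEFAULT_LANG = "en"


def route(path, accepted_langs):
    """Return ("serve", lang, rest_of_path) or ("redirect", target_url)."""
    segments = [segment for segment in path.split("/") if segment]
    if segments and segments[0] in SITE_LANGS:
        # Explicit language prefix: serve in exactly that language.
        return ("serve", segments[0], "/" + "/".join(segments[1:]))
    # Language-neutral URL: redirect to the best language for this visitor.
    lang = next((l for l in accepted_langs if l in SITE_LANGS), DEFAULT_LANG)
    return ("redirect", "/" + "/".join([lang] + segments))
```

    With this, /fi/about is served in Finnish regardless of the browser settings, while /about sends a visitor who prefers German to /de/about.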

    Navigation building and object instantiation:

    • Language versions of site should obviously show only the contents translated to their languages in navigation and content listings (like news lists for instance)
    • Since MidCOM has a centralized wrapper for Query Builder, the DBA can easily add the language constraint into all DB queries transparently. This means components don't have to do any language-based processing to show only Norwegian news items or whatever
    • Similarly, when an object is instantiated (get_by_guid etc), DBA should check if the $object->lang is the current language and if not fail the instantiation/cause exception

    Content editing:

    • Since editing now happens on-site and there are language prefixes available, switching editing views between languages is just a matter of changing the site prefix
    • However, there should be some facilities for showing translators what has been changed in the "default language" contents of an object so they know what to translate

    So to wrap up, MidCOM core needs the following changes:

    • Add list of languages of a site into config
    • Modify URL parser to utilize language selection as $argv[0]
    • Make DBA query builder limit results based on language using the set_lang method
    • Make DBA object instantiation cause Exception if $object->lang is not the current language
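    The last two points could behave roughly like this. It is a Python sketch of the idea only: LanguageAwareStore and WrongLanguageError are hypothetical names, and records are plain dicts rather than MidCOM DBA objects.

```python
class WrongLanguageError(Exception):
    """Raised when an object exists, but not in the current language."""


class LanguageAwareStore:
    """Toy wrapper that adds the current language to every lookup."""

    def __init__(self, records, current_lang):
        self.records = records  # list of dicts with "guid", "lang", ...
        self.current_lang = current_lang

    def query(self, **constraints):
        # The language constraint is added transparently to every query.
        constraints["lang"] = self.current_lang
        return [
            record for record in self.records
            if all(record.get(key) == value for key, value in constraints.items())
        ]

    def get_by_guid(self, guid):
        matches = [record for record in self.records if record["guid"] == guid]
        for record in matches:
            if record["lang"] == self.current_lang:
                return record
        if matches:
            raise WrongLanguageError(
                "%s has no content in %s" % (guid, self.current_lang)
            )
        raise KeyError(guid)
```

    Components calling query() never mention a language themselves, which is the transparency the list above asks for.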

    What we need from core:

    • Method for querying what translations of an object are available, regardless of whether we're in lang0 or langN. This would be used for implementing LINK tags and/or UI-level links to "this article in Polish"
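    Given such a method, the LINK tags could be generated along these lines. This is a minimal Python sketch: get_langs here is a hypothetical stand-in working on a dict of contents keyed by language code, and the /lang/path URL layout follows the language prefix scheme described earlier in this post.

```python
from html import escape


def get_langs(contents):
    """Return the language codes in which an object has content."""
    return sorted(contents)


def alternate_links(path, contents, current_lang):
    """Build <link rel="alternate"> tags for every other available translation."""
    tags = []
    for lang in get_langs(contents):
        if lang == current_lang:
            continue
        href = "/%s%s" % (lang, path)
        tags.append(
            '<link rel="alternate" hreflang="%s" href="%s">'
            % (escape(lang), escape(href))
        )
    return tags
```

    The same list of languages could drive the UI-level "this article in Polish" links.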

    What do you think?

    /Bergie

  8. re: Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    > I think you misunderstood me here. Midgard would still store the translated content into its ML database, and serve from there, just like in all ML scenarios.

    Ah... now I am more than confused :)

    > The idea here was merely that when user wants to translate a site (or parts of it), they would ask MidCOM (or Midgard) to generate them a PO file of the content needing translation.

    > Then they would take this file and edit it with poEdit or whatever, and finally import it back into Midgard, where the file would be parsed and stored into the appropriate ML tables.

    What's the benefit? Translate, write to a po file, compile an mo file, import the data into the database, versus: translate, submit (while in the language context).

    I do not understand it.

    Piotras

  9. re: Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    > Choosing languages used on site:

    This can be easy to do, as midgard_language provides almost static languages, so even the id of every language is the same across many sites.

    > Language selection for content:

    I started to like Jarkko's idea about the default lang. So we can replace lang 0 with a default lang. Of course such a change is fully transparent for any Midgard API, but one always creates 'no' content instead of lang0 content when 'no' is set as the default language.

    > Navigation building and object instantiation:

    QB has set_lang method and I think we can improve it a bit so after calling this method only explicit language records could be returned. In such case with a bit of luck we could create pure SQL queries for limit or count.

    > What we need from core:

    Current core misses functionality like mgd_get_object_by_all_langs, but some simple get_langs() method would perfectly replace it.

    > What do you think?

    I think that you described almost perfectly the main idea of ML feature provided in Midgard 1.6 :)

    The only problem I can see is replication, as every multilingual object has the same guid.

    Piotras

  10. re: Midgard's Multilang Support

    Thu January 01 1970 00:00:00 UTC

    Ah, one more issue. When the update method is invoked, core checks whether content in langX exists. If it does, core updates it; if not, it creates the content.

    I am not sure what we should do when the create method is invoked for lang X while content in lang 0 (or the default one) doesn't exist.

    With lang 0 as the default one this is a bit complicated, as returning an error (create lang 0 content) should be a good idea.

    With lang X set as the default one we could always create the content and not worry about lang0 (unknown in this case).
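    The two behaviours discussed in this post can be put side by side in a small model. This is a Python sketch only, with hypothetical names (save_content, DefaultContentMissing) and contents keyed by numeric language id; the require_default flag switches between "return an error when the default content is missing" and "always create".

```python
class DefaultContentMissing(Exception):
    """Raised when content is created before the default language content exists."""


def save_content(contents, lang, text, default_lang=0, require_default=True):
    """Create or update the content of one object in one language.

    Returns "updated" when content in that language already existed,
    otherwise "created".
    """
    if lang in contents:
        contents[lang] = text
        return "updated"
    if require_default and lang != default_lang and default_lang not in contents:
        raise DefaultContentMissing(
            "create content in language %r first" % (default_lang,)
        )
    contents[lang] = text
    return "created"
```

    When the language being written is itself the default, the check never fires, which matches the observation that a configurable default language avoids the problem.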

    Piotras
