mRFC 0030: Midgard object replication

  1. Revision history
  2. Background
    1. Example: exporting all newest objects
  3. Midgard object metadata
  4. Midgard object methods
    1. Create
    2. Update
    3. Import
    4. Export
      1. Example: exporting visible midgard_article records
    5. Delete
    6. Undelete
    7. Purge
  5. Repligard table
    1. Query deleted objects
  6. Staging/live in MidCOM
    1. Invoking replication
    2. Export process
    3. Subscribers and replication queue
      1. Subscription types
    4. Import process

Revision history

  • 2006-06-02 Created by Piotr Pokora
  • 2006-06-17 Updated: possibility to delete and undelete objects
  • 2006-06-19 (torben) Commented the MidCOM section, reset the mRFC to draft status due to this.

Background

This mRFC proposes a replacement for the repligard functionality, supported by midgard-core and by every language for which Midgard language bindings exist. Because repligard replicated Midgard databases while offering little possibility to change its behaviour, this mRFC focuses on the replication of object records: exporting objects from, and importing them into, any database storage. It also describes the possibility to undelete and purge object records.

Examples in this mRFC use PHP as the scripting language.

Example: exporting all newest objects

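<?php

// Assumption: metadata.created is compared against a datetime string
$yesterday = date('Y-m-d H:i:s', time() - 24 * 60 * 60);

$exported_objects = array();
foreach ($_MIDGARD['types'] as $type => $type_id) {
    // Select every object of this type created during the last 24 hours
    $qb = new midgardquerybuilder($type);
    $qb->add_constraint("metadata.created", ">", $yesterday);
    $retval = $qb->execute();
    // Skip types for which the query did not return an array
    if (is_array($retval)) {
        $exported_objects = array_merge($exported_objects, $retval);
    }
}

foreach ($exported_objects as $object) {
    $object->export();
}
?>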

Midgard object metadata

Every Midgard object's metadata class should have 'exported' and 'imported' members, exposed as properties of the midgard_metadata type. The values of these two properties, backed by the corresponding database storage values, should be settable only by midgard-core through particular methods. Both properties must be of the MGD_TYPE_DATETIME type.

Midgard object methods

Midgard-core must support export and import methods for object replication. Replication-related object metadata should be managed by the basic methods: create, update, delete, undelete, purge, import, export.

Create

  • created, current datetime value must be set
  • revised, current datetime value should be set
  • imported, empty value must be set
  • exported, empty value must be set

Update

  • created, value can not be set
  • revised, current datetime value must be set
  • imported, empty value must be set
  • exported, empty value must be set

Import

  • created, value can not be set if the object's record exists in the database; otherwise the object's record must be created and metadata created must be set to the current datetime value
  • revised, value shouldn't be set if the object's record doesn't exist in the database; otherwise metadata revised must be set to the revised value of the imported object's metadata
  • imported, current datetime value should be set
  • exported, empty value must be set

An object can be imported into the database only if, for the record of that object which already exists in the database, the imported metadata property is empty and the revised metadata property is not newer than the revised property of the object to be imported.
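The import condition can be illustrated with a minimal sketch. It is not midgard-core code: plain arrays stand in for object metadata, the function name can_import is hypothetical, and the sketch assumes that an object with no stored record may always be imported (as the created rule above implies).

```php
<?php
// Illustrative sketch only: plain arrays stand in for object metadata.
// The keys 'imported' and 'revised' mirror the metadata properties named
// in the text; the function name and array layout are assumptions.

/**
 * Decide whether an incoming object may be imported.
 *
 * @param array|null $existing Metadata of the stored record, or null if none exists.
 * @param array      $incoming Metadata of the object to be imported.
 */
function can_import(?array $existing, array $incoming): bool
{
    if ($existing === null) {
        // Assumption: with no stored record there is nothing to conflict with.
        return true;
    }
    // The stored record's imported property must be empty ...
    if ($existing['imported'] !== '') {
        return false;
    }
    // ... and its revised value must not be newer than the incoming one.
    return strtotime($existing['revised']) <= strtotime($incoming['revised']);
}

$stored   = ['imported' => '', 'revised' => '2006-06-01 10:00:00'];
$incoming = ['revised' => '2006-06-02 09:30:00'];

var_dump(can_import($stored, $incoming)); // bool(true)
```

Read literally, the rule also blocks a second import over a record whose imported property is still set, until a local update resets that property to empty.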

Export

  • created, value can not be set
  • revised, value can not be set
  • imported, value can not be set
  • exported, current datetime value must be set

An object can be exported from the database without any restriction. The application that exports objects defines which constraints are used to select the object records to export.

Example: exporting visible midgard_article records

This example demonstrates how to export the newest midgard_article objects that are not hidden.

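<?php

// Assumption: metadata.created is compared against a datetime string
$yesterday = date('Y-m-d H:i:s', time() - 24 * 60 * 60);

$qb = new midgardquerybuilder("midgard_article");
$qb->add_constraint("metadata.created", ">", $yesterday);
$qb->add_constraint("metadata.hidden", "=", FALSE);

/* Skip objects created during the last 24 hours that were already exported
   (metadata.exported is empty for objects that were never exported) */
$qb->add_constraint("metadata.exported", "=", "");

$retval = $qb->execute();

foreach ($retval as $object) {
    $object->export();
}
?>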

Delete

  • created, value can not be set
  • revised, current datetime value must be set
  • imported, value can not be set
  • exported, value can not be set
  • metadata.deleted, value must be set.

Object records must not be removed from the database when the delete method is invoked. Instead, the object's metadata deleted property should be explicitly updated with the correct deleted value. For performance reasons this value should be of an integer type. Midgard core should also use this metadata property as a mandatory Midgard Query Builder constraint, added internally by the Query Builder implementation.

Undelete

  • created, value can not be set
  • revised, current datetime value must be set
  • imported, value can not be set
  • exported, value can not be set
  • metadata.deleted, value must be (re)set to initial default state.
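The delete and undelete rules amount to a soft delete. The following sketch works on in-memory arrays rather than on Midgard objects; the function names, the record layout and the choice of 0 as the initial default state of the deleted flag are assumptions made for illustration.

```php
<?php
// Sketch with in-memory records; 'deleted' is an integer flag as the text
// suggests. Function names and the record layout are hypothetical.

function soft_delete(array $record, string $now): array
{
    $record['metadata']['deleted'] = 1;     // mark the record instead of removing it
    $record['metadata']['revised'] = $now;  // delete must refresh revised
    return $record;
}

function undelete(array $record, string $now): array
{
    $record['metadata']['deleted'] = 0;     // assumed initial default state
    $record['metadata']['revised'] = $now;  // undelete must refresh revised too
    return $record;
}

// Stand-in for the constraint the Query Builder would add internally.
function visible_records(array $records): array
{
    return array_values(array_filter(
        $records,
        function (array $r): bool {
            return $r['metadata']['deleted'] === 0;
        }
    ));
}

$records = [
    ['guid' => 'a1', 'metadata' => ['created' => '2006-06-01 08:00:00', 'revised' => '2006-06-01 08:00:00', 'deleted' => 0]],
    ['guid' => 'a2', 'metadata' => ['created' => '2006-06-02 08:00:00', 'revised' => '2006-06-02 08:00:00', 'deleted' => 0]],
];

$records[0] = soft_delete($records[0], '2006-06-17 12:00:00');
echo count(visible_records($records)), "\n"; // 1

$records[0] = undelete($records[0], '2006-06-17 13:00:00');
echo count(visible_records($records)), "\n"; // 2
```

In both operations created, imported and exported stay untouched, which matches the lists above.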

Purge

When this method is invoked, midgard core should delete the object's record(s) from the database and should update the repligard table. The following values should be set for the corresponding record in the repligard table:

  • object's class name
  • object's guid
  • purge action should be set to TRUE value

Repligard table

The repligard table must contain only the object's guid and the typename (classname) from which a new object instance with that particular guid can be created. Additionally, the repligard table should contain information on whether the object's record(s) were purged.
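How purge and the repligard table interact can be sketched as follows. Two arrays stand in for the object storage and the repligard table; the column names (guid, typename, purged) and the function name are assumptions of this sketch, not a proposed schema.

```php
<?php
// Sketch: $objects stands in for the object storage, $repligard for the
// repligard table. Column names and the function name are assumptions.

function purge(array &$objects, array &$repligard, string $guid): bool
{
    if (!isset($objects[$guid])) {
        return false; // nothing to purge
    }
    $typename = $objects[$guid]['typename'];

    // Unlike delete, purge really removes the record ...
    unset($objects[$guid]);

    // ... and leaves only guid, typename and the purge flag behind.
    $repligard[$guid] = ['guid' => $guid, 'typename' => $typename, 'purged' => true];
    return true;
}

$objects   = ['guid-1' => ['typename' => 'midgard_article', 'title' => 'Hello']];
$repligard = ['guid-1' => ['guid' => 'guid-1', 'typename' => 'midgard_article', 'purged' => false]];

purge($objects, $repligard, 'guid-1');

var_dump(isset($objects['guid-1']));      // bool(false)
var_dump($repligard['guid-1']['purged']); // bool(true)
```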

Query deleted objects

Midgard core should implement functionality that allows querying only deleted objects. This functionality could be implemented with a new Midgard Query Builder method or with a new Midgard type.

Staging/live in MidCOM

Missing topics: Data model, general service architecture, including the layers of the communication infrastructure.

Torben

Since the replication interfaces are exposed at the PHP level, the staging/live process can be handled in MidCOM space. This should make simple replication scenarios, such as an article being approved, much faster. The queue system should make the system more fault tolerant, and the replication logic should be easier to tweak because it lives entirely at the PHP level.

There will be a midcom.helper.staginglive purecode component to handle the replication process.

This will have to be midcom.services.replication.

Torben

Invoking replication

The replication component will register UPDATE and DELETE watchers for midcom_core_dbobject. Since approvals are stored in Midgard metadata fields using the update() method, approvals will reach this watcher as well.

Note that you cannot register watches for midcom_core_dbobject, as this class is not a base class for the other DBA classes. It is more of a "mix-in" due to limitations of the PHP OOP API. The correct definition would be to add watches to all defined DBA classes by not limiting the watch to any class.

Torben

To catch objects updated or deleted outside MidCOM DBA, the system will also provide a cron entry that will query through all MgdSchema types to see if there is something to replicate. This can be run relatively rarely, for example once per day.

This is not very precise. The criteria by which objects are marked for replication should be outlined.

Torben

Export process

The export watchers will execute an exporting method in the midcom.helper.staginglive interface class and pass it the affected object as an argument.

If approvals or scheduling are being used, the system will first check if the object itself is set to be visible. Otherwise visibility will be assumed.

If the object is visible, the same check will be done for its parent.

Note that this can prevent replication under certain circumstances: assume that a topic and one of its articles are both changed. The article gets approved, the topic does not. Replication of the article will then be delayed.

Note as well that there is another case where you need to be careful if you change the behavior to cover the above special case: if the topic in question was newly created, it won't be available on the target server.

Normally, the latter should be no problem if and only if GUIDs alone are used for linking and, as such, ID mappings are not required. The legacy data structures do not yet do this. I very strongly suggest that this be changed prior to implementing this, even if this means that existing code (mainly QB lookups) has to be adapted to it.

Torben

If the parent is visible, the object will be marked for exporting. If the parent is not visible, the export is aborted and the UI message "Object not exported because parent is not visible" is sent to the user.

What happens later, when the parent gets cleared? Is the delayed object queued somewhere? This needs much more detail.

Torben

Then the method will query the object's children and call the same method for them.
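The decision steps described so far (check the object, check its parent, mark it, recurse into the children) could be sketched roughly as below. This is only one possible reading: the nodes are a hypothetical in-memory tree keyed by GUID, the 'visible' flag stands for the outcome of the approval/scheduling check, and an object without a parent is assumed to pass the parent check.

```php
<?php
// Hypothetical in-memory tree: guid => ['visible' => bool, 'parent' => guid|null].
// 'visible' stands for the result of the approval/scheduling check.

function mark_for_export(array $nodes, string $guid, array &$marked): bool
{
    $node = $nodes[$guid];

    // Step 1: the object itself must be visible.
    if (!$node['visible']) {
        return false;
    }

    // Step 2: its parent must be visible (assumption: no parent means pass).
    $parent = $node['parent'];
    if ($parent !== null && !$nodes[$parent]['visible']) {
        return false; // "Object not exported because parent is not visible"
    }

    // Step 3: mark the object, then apply the same method to its children.
    $marked[$guid] = true;
    foreach ($nodes as $childGuid => $child) {
        if ($child['parent'] === $guid) {
            mark_for_export($nodes, $childGuid, $marked);
        }
    }
    return true;
}

$nodes = [
    't'   => ['visible' => true,  'parent' => null], // topic
    'a1'  => ['visible' => true,  'parent' => 't'],  // approved article
    'a2'  => ['visible' => false, 'parent' => 't'],  // unapproved article
    'a1c' => ['visible' => true,  'parent' => 'a1'], // child of a1
    'a2c' => ['visible' => true,  'parent' => 'a2'], // child of a2
];

$marked = [];
mark_for_export($nodes, 't', $marked);
print_r(array_keys($marked)); // t, a1, a1c
```

The child lookup here scans a single array. In a real schema the children would have to be queried per MgdSchema type, which the sketch leaves out entirely.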

I assume that this is the fix for the above problem I mentioned. The problem I see here is that you will often get large trees here, especially if you are at the top of the tree. Also you will have difficulties as you have to know all types which potentially have the object in question as a parent:

Especially topics are tricky here, as several of the components I wrote lately assign their data to topics, without actually using midgard_article. But groups or persons can easily reach a similar spread.

Torben

TODO: Pseudocode for the exporting decision

I'd like to see this before +1'ing anything.

Torben

Finally, the exported objects will be stored in the replication queue and an at entry will be registered to run the replication in the next minute.

Is this wise? Or would it be more appropriate to have a cron job doing it every X minutes to optimize / batch the changes?

Torben

UI message "3 articles and 1 topic exported for replication" is sent to the user.

Subscribers and replication queue

Subscribers (Midgard sites to replicate to) are stored in the database. The subscriber entry contains the replication method used for that subscriber, together with the address and authentication information required for the method.

When objects are exported, the export XML is stored into a midcom_helper_staginglive_queue_entry object for each subscriber.

This duplicates data; the data model should be revised here.

Torben

Replication is launched by the at entry registered in the export phase. The replication process goes through all subscribers and tries to send the items in their queues to them using the defined method.

There can be multiple replication methods supported by the system. At first we will only implement a simple HTTP PUT (or POST) method sent to a MidCOM exec handler of the midcom.helper.staginglive component, authenticated using HTTP Basic Auth. Other methods could include Jabber/XMPP, DBE or even email.

Wouldn't some XML-RPC or other standardized remoting mechanism be more appropriate than some proprietary solution?

Torben

Email could provide a high security replication solution where the exporting end would GPG encrypt and sign the message, then send it out as an email. The importing end would then read the email from the server, check its signature, decrypt it and import. This way the importing end could even be a mostly offline computer that would only connect to the internet to receive the emails once per day (this is how FSF Europe's member registry works to protect privacy).

Consider using regular X.509 based encryption / signing instead of PGP. It is more standardized (almost as easy to use) and integrated into existing PKI infrastructures. This should be available completely independent of the actual transport layer used, btw.

Torben

If the replication is successful, the queue entry is deleted. If not, the replication will be attempted again every hour using a cron entry.

Provide some fallback mechanism, similar to the SMTP system. There should be a fixed number of retries, maybe with increased waiting periods after certain escalation levels.

Also, it is recommended to flag entries as failed once a certain number of retries has been reached.

Failed connections to a subscriber should escalate as well, disabling the subscriber until the situation is resolved. Otherwise, there would be an easy DoS scenario when too many replications pile up for a given subscriber.

Torben
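The escalation suggested in the comment above might look like the following sketch. All numbers (delays, retry limit, subscriber failure limit) and all names are purely illustrative assumptions; nothing here is a proposed value.

```php
<?php
// Hypothetical retry policy with growing delays; values are illustrative only.
const RETRY_DELAYS = [60, 300, 3600, 21600]; // seconds before retry 1, 2, 3, 4

// Record one failed delivery attempt for a queue entry.
function record_failure(array $entry): array
{
    $entry['attempts']++;
    if ($entry['attempts'] > count(RETRY_DELAYS)) {
        // Retries exhausted: flag the entry instead of retrying forever.
        $entry['status']   = 'failed';
        $entry['retry_in'] = null;
    } else {
        $entry['status']   = 'queued';
        $entry['retry_in'] = RETRY_DELAYS[$entry['attempts'] - 1];
    }
    return $entry;
}

// Record one failed connection to a subscriber; disable it after $limit in a row.
function record_subscriber_failure(array $subscriber, int $limit = 3): array
{
    $subscriber['consecutive_failures']++;
    if ($subscriber['consecutive_failures'] >= $limit) {
        $subscriber['enabled'] = false; // stays off until someone resolves the problem
    }
    return $subscriber;
}

$entry = ['attempts' => 0, 'status' => 'queued', 'retry_in' => null];
for ($i = 0; $i < 5; $i++) {
    $entry = record_failure($entry);
    echo $entry['attempts'], ': ', $entry['status'], "\n";
}
// The first four failures keep the entry queued, the fifth flags it as failed.
```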

Subscription types

In the initial implementation every object is replicated to every subscriber. At a later stage we can implement different subscription type handlers that will enable limiting the subscription to particular MgdSchema types, object trees or even particular query constraints.

Examples of subscription types would be "content in tree", "my tasks" or "all forum posts".

This is rather vague, especially since the necessary basic infrastructure to provide something like this is never mentioned above. Goes into the chapter "missing service architecture".

Torben

Import process

The import process will simply call the Midgard import method for each object in the imported XML file.

No sanity checks, nothing? I think there has to be a high level of bulletproofing when letting such replications in.

Torben
