mRFC 0005: MidCOM Next Generation Specification Overview

  1. Problems of the actual MidCOM Generation
    1. Missing Features
    2. Shortcomings of existing features
    3. Purely technical trouble
    4. Organisational hazards
    5. Documentation
  2. Basic design approach
    1. General page processing sequence
    2. midcom_root.php
    3. Components
    4. Content storage layer
  3. 2DO

IMPORTANT NOTE: The work on this mRFC is frozen for now due to lack of time. I will get back to it later.

Please note that this text is still incomplete; please wait to comment on it until it has been completed and I have removed this note.

This document outlines the basic guidelines according to which the next generation of the Midgard Components framework will be built. Its main goal is not to provide a complete, finished implementation specification, but only the overview needed to arrive at that level of detail in the first place.

This mRFC has been submitted to the Midgard Community for discussion and approval under the Creative Commons Attribution-ShareAlike license.

Problems of the actual MidCOM Generation

MidCOM 1.x has been a great improvement for the Midgard Project (*pattingmyownshoulder*) when it comes to basic content management and code reusability, but it still fails to meet many modern requirements.

Missing Features

First, there is a whole range of services required in modern content management, mainly Internationalization (vs. Localization), Security Management, Version Control, built-in Sessioning and Workflows. Most of these cannot easily be implemented with the current system, as MidCOM does not provide a complete layer between the component, the website output and the Midgard Application Server.

In addition, it is still very difficult for the casual user to take full advantage of the framework. Parts of the system can only be configured by editing the source through an administration interface like Asgard; others even require advanced PHP magic that leaves plenty of room for errors.

Of course, the last problem can be addressed at least partly by the more elaborate user interfaces outlined in mRFC 0003: MidCOM AIS User Interface Guidelines. Still, the effort required to implement them is quite high, as each component is responsible for doing so itself. Building powerful libraries is difficult, as advanced concepts (like inheritance of configuration) were not planned for when MidCOM was first specified.

Then, there is no efficient way of personalization, as, again, the components are not aware that such a service even exists. This makes it quite difficult to build portal sites. Likewise, the dynamic load feature, while potentially very powerful, is difficult to use even for me, and I have full insight into the system.

This also leads to problems on sites where you want to easily administer web pages with more than one content area, which currently have to be set up by a skilled programmer rather than by a regular user.

I won't go into further depth here; there is enough room later in this mRFC to outline what is completely missing.

Shortcomings of existing features

There are several areas of the current code that have proven quite troublesome in the production systems I have seen so far:

First, there is the datamanager. While it is a huge improvement over the traditional form-building usually associated with Midgard, it has too many problems. To start with, the datamanager requires a Midgard object to operate on. This leads, for example, to the well-known problem of ghost records left behind when a new object is being created and the user, instead of clicking "cancel", simply navigates elsewhere with the browser. Changing this in the current implementation would mean an almost complete rewrite of all datatypes, as the authority to store the data is located there, not in the datamanager core (where it technically belongs). This strict process also forces you through several edit, save, edit again, save cycles if you want to upload several attachments to a single object. For the same reason, integrating services like workflow management and internationalization is difficult.
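To make the alternative concrete, here is a minimal sketch in Python (used only for illustration, MidCOM itself being PHP) in which datatypes merely convert form input while the datamanager core alone decides when to store. Nothing is persisted until save() is called, so an abandoned form leaves no ghost record, and several attachments can be staged in one session. All class names, the `store` callback and the schema format are hypothetical, not part of the MidCOM 1.x API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class TextType:
    """Hypothetical datatype: converts and validates input, never stores."""

    def convert(self, raw: str) -> str:
        return raw.strip()


class IntegerType:
    """Hypothetical datatype for integer fields."""

    def convert(self, raw: str) -> int:
        return int(raw)  # raises ValueError on invalid input


@dataclass
class DatamanagerSession:
    schema: Dict[str, Any]            # field name -> datatype instance
    store: Callable[[dict], str]      # hypothetical persistence callback, returns an id
    values: dict = field(default_factory=dict)
    attachments: List[str] = field(default_factory=list)

    def submit(self, form: Dict[str, str]) -> None:
        # Stage converted values in memory only; nothing is written yet.
        for name, datatype in self.schema.items():
            if name in form:
                self.values[name] = datatype.convert(form[name])

    def add_attachment(self, filename: str) -> None:
        # Several uploads can be staged without an edit/save round trip each.
        self.attachments.append(filename)

    def save(self) -> str:
        # Only the core commits: a single write creates the object.
        record = dict(self.values, attachments=list(self.attachments))
        return self.store(record)
```

Usage: create one session per edit form, call submit() on each request and save() only when the user confirms; a session that is never saved simply disappears.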

The next major problem is the new style engine. Its current structure does not integrate well with Midgard's existing style engine. That would be acceptable if there were major benefits, but there aren't. This could probably be integrated into the current MidCOM fairly easily, provided the prerequisites are in place. Still missing are easy access to the localization database and more advanced template processing commands (for example a more powerful &(...); command tool). To implement such features, a new parser should be integrated here, possibly together with text-to-HTML tools (like the old :F formatter, but more powerful) or HTML Tidy filters.

Another pain from the beginning has been the caching engine currently built into MidCOM. Setting aside configuration troubles caused by incompatible Berkeley DB libraries (thanks to a great PHP binding for them...), the current implementation is quite simple. It can only cache complete pages, and only if the application does not object. On any content change the complete cache has to be reset, and changes to the Midgard or MidCOM style parts go completely unnoticed by the cache, which then has to be invalidated manually. While the cache itself is very fast, with delivery times close to a flat-file store, it has proven both unstable and insufficient for larger sites. What is missing is caching on a per-element basis, with enough knowledge to invalidate only what has changed.
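A minimal sketch, assuming a hypothetical per-element API, of dependency-based invalidation: each cached fragment records the identifiers it was built from (object GUIDs, or style element names such as the made-up `style:news-item`), so a change invalidates only the fragments that used it instead of resetting the whole cache. Persistence, locking and the storage backend are deliberately left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


class ElementCache:
    """Caches rendered fragments and records which objects each one used."""

    def __init__(self) -> None:
        self._fragments: Dict[str, str] = {}
        # dependency id (GUID or style element) -> keys of fragments using it
        self._dependents: Dict[str, Set[str]] = defaultdict(set)

    def get_or_render(self, key: str, depends_on: Iterable[str],
                      render: Callable[[], str]) -> str:
        if key in self._fragments:
            return self._fragments[key]
        html = render()
        self._fragments[key] = html
        for dep in depends_on:
            self._dependents[dep].add(key)
        return html

    def invalidate(self, dep: str) -> Set[str]:
        """Drop only the fragments that depended on the changed object."""
        keys = self._dependents.pop(dep, set())
        for key in keys:
            self._fragments.pop(key, None)
        return keys
```

A style change then becomes a call like `cache.invalidate("style:news-item")`, which leaves unrelated fragments such as the navigation untouched.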

Finally, while the NAP system provides a quite flexible way of handling navigational tasks and basic metadata (creation and editing data) of arbitrary objects, there is no generic way of attaching other meta information (approval, keywords, etc.) to any content object. This is easy enough to implement on the component side, but the administrative part of it is quite a lot of work.
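One way such a generic service could look, sketched under the assumption that every content object is identified by its GUID: metadata is stored beside the object rather than in component-specific fields, so approval or keyword handling (and the administrative UI for it) could be written once. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class MetadataStore:
    """Attaches arbitrary key/value metadata to any object, keyed by GUID."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, object]] = {}

    def set(self, guid: str, key: str, value: object) -> None:
        self._data.setdefault(guid, {})[key] = value

    def get(self, guid: str, key: str, default: Optional[object] = None) -> object:
        return self._data.get(guid, {}).get(key, default)

    def find(self, key: str, value: object) -> List[str]:
        """Return the GUIDs of all objects whose `key` equals `value`."""
        return sorted(g for g, meta in self._data.items() if meta.get(key) == value)
```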

Purely technical trouble

Probably the worst thing about MidCOM is that PHP 4's object handling is just crap. Implementing new features sometimes takes so much work that you start to wonder why PHP calls this "Object Orientation" and not "some structures that might look like objects". Poor error handling capabilities (due to missing exception handling) are one problem, reference handling (which requires so much attention from the coder) is another. As MidCOM at least tries to be fully object-oriented (and it will stay that way), this is a major drawback.

OK, I'm not suggesting we abandon PHP (or the C core, at this point). There has been enough discussion about why we cannot do this easily or light-heartedly. Nor will I start yet another of those Perl-or-whatever-is-better-than-PHP holy wars. Anyone interested in this is advised to search the newsgroup of their favourite programming language and start bashing the others there.

What I do propose, though, is making PHP 5 (with its far more advanced object handling) a mandatory requirement for MidCOM 2. Again, this point is not really debatable from my side, as working around PHP 4's deficiencies takes far too much time, especially since exactly these problems can greatly increase the time needed to implement large-scale frameworks.

Apart from this episode of PHP bashing (sorry, but this had to be said ;-)), performance in general is a huge problem for MidCOM. Since many things have been implemented or reimplemented on the PHP level, MidCOM is slow. A typical MidCOM request can take about 20 times as long as an old (albeit simple) pure-Midgard request. While this is perfectly understandable given the extent of the framework, it is not desirable as long as the cache is this inefficient.

Several points need to be addressed here, the prime ones being code caching by the more powerful Zend engines (currently impossible, as everything comes from the database) and more efficient URL parsing without the huge overhead of instantiating many components. Note that this is primarily a problem of the website itself, not of the administrative site, where the tolerance level is far higher than on the website.

Organisational hazards

The final section in this listing covers aspects of a more organisational nature that have little to do with MidCOM itself.

One of the most annoying things for developers is that distributed code development is difficult due to the nature of Repligard. Mostly it requires a central development server where all changes are made. Merging changes, even with the help of the yamp scripts, is often painful. Having the code back in the filesystem (as Typo3 and other systems do) will greatly ease this problem, albeit at the cost of encapsulation: you can no longer simply hand the user a single XML file with everything in it.

What I have in mind here is integrating some kind of package management into the MidCOM administrative system that can distribute code in an apt-get-like manner. You would still have an XML file with the MidCOM packaging system for an installation, which would in turn download the actual sources it requires from the Net. Again, given the time this saves us developers, I feel it is well worth it.
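The core of an apt-get-like installer is dependency resolution. The sketch below (hypothetical package names and index format, not an existing MidCOM tool) computes an install order with a depth-first topological sort, so each package is installed after everything it requires, and it rejects cycles and unknown packages.

```python
from typing import Dict, List, Set


def install_order(package: str, depends: Dict[str, List[str]]) -> List[str]:
    """Return packages in an order where every dependency precedes its user.

    `depends` maps a package name to the names it requires (hypothetical index).
    Raises ValueError on unknown packages or dependency cycles.
    """
    order: List[str] = []
    done: Set[str] = set()
    visiting: Set[str] = set()  # packages on the current DFS path

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        if name not in depends:
            raise ValueError(f"unknown package {name}")
        visiting.add(name)
        for dep in depends[name]:
            visit(dep)
        visiting.remove(name)
        done.add(name)
        order.append(name)  # appended only after all its dependencies

    visit(package)
    return order
```

For an index where `newsticker` needs `datamanager` and `midcom-core`, and `datamanager` needs `midcom-core` and `htmlarea`, the result is `["midcom-core", "htmlarea", "datamanager", "newsticker"]`.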

This will also greatly reduce the work required to get things like the HTMLArea datatype working, as you can simply put these files into the filesystem and reference them. With some smart Apache configuration it should be easily possible to share MidCOM sources between different installations. Typo3's idea of locally and globally installable plugins is interesting here too.

Documentation

While the current MidCOM core does have quite extensive API documentation, it is hidden within the snippet tree and is only plain ASCII text with no real formatting (that could, for example, be parsed by a wiki). Moving the code from snippets to the filesystem should make things a little easier, as tools like PHPDoc can then be used to build the API documentation.

While this will solve several issues for us developers hacking around MidCOM, the documentation must also include examples of how to use the corresponding functions from outside the system. These examples should be the building blocks for more elaborate user documentation, closing the gap between the guru docs written by programmers and the end-user docs written by more "normal" guys (SCNR).

Basic design approach

Again, MidCOM will break with many traditions of the old Midgard, so don't be alarmed when you have read the next few paragraphs; instead, keep the following in mind:

The primary goal for MidCOM 2 is to provide an intermediate application layer that can later be ported into the next-generation Midgard Application Server (aka Midgard 2). It will bring new features and an up-to-date approach to CMS tasks. It will not try to stay compatible with almost everything, as this would a) take too much time and b) interfere with new features here and there. This especially includes MidCOM 1.x: a transition will only be possible by implementing some kind of conversion script, which should be doable provided that equivalent components are available for MidCOM 2.

MidCOM 2 will try to utilize the elements Midgard already provides to a far greater extent than it currently does, both because this will be significantly faster and because the new application structure maps neatly onto what we already have.

General page processing sequence

The first and major change in application structure will be the decoupling of component invocations from website pages. This change reintroduces MidgardPage objects into MidCOM. Each page the user sees on the website will be represented by a MidgardPage, making many things, including URL parsing to a given page and NAP processing, far easier than in the current implementation. Note that this will require a complete rewrite of the MidCOM administration system.

The new system will allow you to define more than one content area for a given style, supplementing the <(content)> element with additional <(content1)>, <(content2)>, etc. elements. These new content areas are represented by simple page elements called by Midgard. They in turn relay execution back into the framework, where MidCOM searches for all content objects defined for that particular location. Basically, a content object is a single article, a newsticker, an event calendar or a guest book; see below for more details.
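A rough Python model of how the content areas could be filled: the style contains <(content)>, <(content1)>, ... markers, each area holds an ordered list of content objects (a component name plus its configuration), and the framework renders them in place. The component registry and the data layout are assumptions for illustration; the real mechanism would go through Midgard page elements and its style engine.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical component registry entry: configuration -> rendered HTML.
Component = Callable[[dict], str]
# Matches <(content)>, <(content1)>, <(content2)>, ...
AREA_PATTERN = re.compile(r"<\((content\d*)\)>")


def render_page(style: str,
                areas: Dict[str, List[Tuple[str, dict]]],
                components: Dict[str, Component]) -> str:
    """Fill every content area marker with the output of its content objects."""

    def fill(match: "re.Match[str]") -> str:
        area = match.group(1)
        # A content object is (component name, configuration); render in order.
        return "".join(components[name](config)
                       for name, config in areas.get(area, []))

    return AREA_PATTERN.sub(fill, style)
```

An area without any content objects simply renders as an empty string.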

These content objects are in turn managed as page elements containing the data necessary to invoke a component.

All these elements (and the page content area) must not be touched by anything except MidCOM, as they will most probably contain serialized configuration data. (Serialization is used here to keep performance as high as possible.)

When control is finally passed back to a component to render a content object, the basic scheme already used in the current MidCOM is kept: components have their own default style, which can be overridden by user-defined elements. (Possible changes to the style engine are discussed later.)
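The override rule can be sketched as a layered lookup, assuming style elements are plain name-to-template mappings: the most specific layer (user-defined elements) is searched first, then the component's default style. The function name and the element names used with it are hypothetical.

```python
from typing import Mapping, Sequence


def resolve_style_element(name: str, layers: Sequence[Mapping[str, str]]) -> str:
    """Return the first definition of `name`, searching the most specific layer first.

    `layers` is ordered, e.g. [user-defined elements, component default style].
    """
    for layer in layers:
        if name in layer:
            return layer[name]
    raise KeyError(f"style element {name!r} is not defined")
```

For example, if the user style defines only `show-article` while the component defaults define both `show-article` and `show-footer`, the user's `show-article` wins and `show-footer` falls back to the default.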

midcom_root.php

As you might have guessed from this heading, the next MidCOM generation will bring along its own MidCOM root file. The reason for this is clear: currently all MidCOM invocation code is located in a MidgardPage. It is always the same and absolutely required for MidCOM operation. The drawback is that it is quite difficult to run a MidCOM website (for instance, to set up a MidCOM system) without a considerable set of Midgard objects (pages, topics, etc.) already in place. Also, of course, it is quite unnecessary to type the same stuff again and again in every MidCOM page.

The new root page will still include the old-style <(code-*)> and <(root)> elements for the sake of compatibility with non-MidCOM applications, but they should definitely be avoided on pure MidCOM sites in favor of the new infrastructure to come.

Basically, midcom_root.php will first load the framework (wow!) and then invoke the style for the page along with the content areas and their content objects. Doing this under MidCOM's control will not only reduce the possibility of errors (no more forgetting to call midcom::finish() to clean up) but also provide advantages like efficient caching of generated page code and the like.
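The guarantee described here maps naturally onto try/finally. In this Python sketch, `Framework` and its methods are hypothetical stand-ins, with finish() playing the role of midcom::finish(): because the root script owns the whole request, cleanup runs even if a component fails while rendering.

```python
from typing import Callable, List


class Framework:
    """Hypothetical stand-in for the MidCOM core; it only logs each step."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def load(self) -> None:
        self.log.append("load")

    def render_style(self, render_areas: Callable[[], str]) -> str:
        # Invoke the page style, which in turn renders the content areas.
        self.log.append("style")
        return render_areas()

    def finish(self) -> None:
        # Hypothetical cleanup step, standing in for midcom::finish().
        self.log.append("finish")


def handle_request(framework: Framework, render_areas: Callable[[], str]) -> str:
    framework.load()
    try:
        return framework.render_style(render_areas)
    finally:
        # Runs on success and on error, so cleanup can no longer be forgotten.
        framework.finish()
```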

This will, for example, also be the point where MidCOM could override the built-in Midgard style engine by preparsing each style element before handing it to mgd_eval. It will also allow the features of MidCOM's style engine to be moved into the core without any changes being required outside of midcom_root.php.
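A sketch of such a preparsing pass, assuming a made-up `&(l10n:...);` command for localized strings: MidCOM expands only the commands it knows and passes everything else, including ordinary `&(...);` variables, through untouched, so the result can still be handed on to the core style engine. Neither the command syntax nor the function is an existing Midgard feature.

```python
import re
from typing import Mapping

# Hypothetical extended command: &(l10n:some string id); -> localized text.
L10N_COMMAND = re.compile(r"&\(l10n:([^)]+)\);")


def preparse(element: str, strings: Mapping[str, str]) -> str:
    """Expand the extended commands MidCOM knows about and leave the rest alone.

    Anything not matching L10N_COMMAND (ordinary Midgard syntax) passes through
    unchanged, so the result can still be evaluated by the core style engine.
    """

    def expand(match: "re.Match[str]") -> str:
        key = match.group(1)
        return strings.get(key, key)  # fall back to the id itself if untranslated

    return L10N_COMMAND.sub(expand, element)
```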

Components

The component structure will change slightly, starting with the fact that there will no longer be any need for a Navigation Access Point (NAP) interface (navigation is now separated from the content, see above).

Components will no longer manage content on a per-topic ("directory") basis, but only a single object on a given page, which takes URL handling tasks away from many simpler components. It will still be possible to have "subpages", for example for newsticker detail pages (details will follow below).
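Since every website page now corresponds to a MidgardPage, resolving a URL can amount to finding the deepest matching page and handing any leftover path segments to its component, which is where "subpages" such as newsticker detail pages would come from. A minimal sketch, assuming pages are indexed by their path (the index format and page ids are hypothetical):

```python
from typing import Dict, List, Tuple


def resolve_url(url: str, pages: Dict[Tuple[str, ...], str]) -> Tuple[str, List[str]]:
    """Map a URL to the deepest matching page plus leftover path segments.

    `pages` maps a path tuple such as ("news",) to a page id; the root page
    uses the empty tuple. Leftover segments go to the page's component.
    """
    segments = [s for s in url.strip("/").split("/") if s]
    # Try the longest prefix first, then shorter ones down to the root.
    for depth in range(len(segments), -1, -1):
        prefix = tuple(segments[:depth])
        if prefix in pages:
            return pages[prefix], segments[depth:]
    raise LookupError(f"no page for {url!r}")
```

A simple component whose page never receives leftover segments needs no URL handling of its own.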

Administration will stay as it is currently, with an administration system similar to the one we have now.

Content storage layer

Building on the idea of the datamanager introduced by MidCOM 1.x, MidCOM 2 will turn this basic idea into an advanced service. Generally speaking, components no longer work directly on topics or articles; instead they work through a content object management layer. They provide a schema definition, as they do now, and request either a single object or a list of objects of the corresponding schema.

Each concrete instance of a component (a "content object") will automatically receive a list of the data objects available to that specific content object. First, this makes moving content objects from one page to another quite easy. Second, and perhaps most importantly, it allows the data storage to be moved to a more advanced form once Midgard 2 finally arrives.
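A minimal sketch of the storage layer described in this section, with all names hypothetical: components pass a schema and receive single objects or lists through the layer, and each content object carries the ids of its data objects, so moving it to another page just changes its `page` attribute while the data follows. Schema checking is reduced to simple type checks.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentObject:
    """A concrete component instance placed on a page (hypothetical model)."""
    component: str
    page: str
    data_ids: List[str] = field(default_factory=list)


class StorageLayer:
    """Components talk to this layer instead of reading topics/articles directly."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def create(self, content: ContentObject, schema: Dict[str, type], **values) -> str:
        for name, value in values.items():
            expected = schema.get(name)
            if expected is None or not isinstance(value, expected):
                raise TypeError(f"field {name!r} does not match the schema")
        record_id = f"rec{len(self._records) + 1}"
        self._records[record_id] = dict(values)
        content.data_ids.append(record_id)  # data is bound to the content object
        return record_id

    def get_object(self, content: ContentObject, record_id: str) -> dict:
        if record_id not in content.data_ids:
            raise PermissionError(f"{record_id} does not belong to this content object")
        return dict(self._records[record_id])

    def list_objects(self, content: ContentObject) -> List[dict]:
        return [dict(self._records[r]) for r in content.data_ids]
```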

2DO

  • Content Objects
