
mRFC 0006: MidCOM database-to-filesystem transition roadmap

  1. Introduction
  2. Current problems
  3. New package structure
    1. Permissions
  4. Roadmap
  5. Compatibility notes
  6. Implementation notes
  7. Closing notes
    1. Personal notes
    2. Ideas for the future that came to me while writing the mRFC
    3. Links for writing secure PHP code (courtesy of Tarjei Huse)

This document outlines the basic guidelines under which the current development branch of the Midgard Components framework will be moved into the filesystem.

This mRFC has been submitted to the Midgard Community for discussion and approval under the Creative Commons Attribution-ShareAlike license.

Introduction

With the increasing size of the MidCOM framework, I am losing more and more time due to the inefficiencies of developing against the Midgard database. While database storage is great for production sites that need replication, it is a major hindrance to development. As this makes the continued development of MidCOM rather troublesome, I have decided to implement one item of the MidCOM 2 plans from mRFC 0005 first, namely the transition of the MidCOM code out of the database into the filesystem, and to freeze all other MidCOM 2 efforts for now. This mRFC outlines the basic reasons, the new package structure and the transition roadmap.

Current problems

While Repligard is a great tool for replicating data, it is hell for development. We all know that already, so I won't go into the gory details here. As a quick summary, this is what you have to live with:

  • Editing is only possible in the web interface or in PHP Mole.
  • GUIDs make distributed development tricky; for example, the root snippetdirs (like de, com, net ...) have to be truly unique.
  • CVS operation is hell, as the changelogs are unusable most of the time. This makes merging CVS commits from one branch to another almost impossible.
  • Source distribution is only moderately easy. You cannot, for example, share the same MidCOM source across several databases.
  • There are possibly severe performance penalties due to the repeated SQL queries necessary to fetch everything we currently keep in snippets. This also makes life difficult for PHP optimizers, which usually cache PHP files, not eval'd code.
  • Integrating third-party applications like HTMLArea is just plain hell, as you almost always have to patch them so that they work from within a Midgard DB. Browser-side caching of .js or .css files is difficult as well.
  • Repligard does not handle attachments and parameters of snippets and snippetdirs cleanly. They all have to be merged explicitly into the Repligard export configuration, which is error-prone and time-consuming.
  • Documentation, currently contained in the snippet documentation fields, is not easily accessible.
  • Tracking down error messages is painful (you know, the "error in eval'd code in line xx in eval'd code line yy ..." stuff).
  • Debugging with PHP debuggers is hard, as they have problems similar to those of the PHP optimizers.

Summarizing, I think we lose far more with the database-driven approach in this case than we will ever gain from its benefits.

New package structure

The new MidCOM will basically mirror what we have now in the snippetdir tree. I will keep the current naming hierarchy, so the directory structure will be roughly like this:

/midcom
/midcom/midcom
/midcom/midcom/helper
/midcom/midcom/admin
/midcom/midcom/...
/midcom/de
/midcom/de/linkm
/midcom/de/linkm/taviewer
/midcom/de/linkm/newsticker
/midcom/de/linkm/...
/midcom/de/...
/midcom/...

Basically, each directory corresponds to a snippetdir, and snippet code will become PHP files named after the snippets. This keeps the new file paths as close as possible to the snippetdir paths we have right now. Note that everything will be collected in a single subdirectory, so that you can easily activate it with an Apache Alias directive.
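To make the mapping concrete, here is a minimal PHP sketch that turns an old snippet path into the matching file in the new tree. It assumes that each snippetdir becomes a directory and each snippet a .php file of the same name below a base directory (the /midcom directory above); the function name midcom_snippet_to_file and the base path /usr/share/midcom are hypothetical.

```php
<?php
/**
 * Hypothetical helper: map a snippet path from the old database tree
 * (e.g. "/de/linkm/taviewer/viewer") to its file in the new filesystem
 * tree. Assumes each snippetdir became a directory and each snippet a
 * "<name>.php" file below $base (the /midcom directory).
 */
function midcom_snippet_to_file(string $base, string $snippet_path): string
{
    // Strip surrounding slashes and split the path into its segments.
    $segments = explode('/', trim($snippet_path, '/'));
    foreach ($segments as $segment) {
        // Reject empty and relative segments so that a crafted path
        // like "/../../etc/passwd" cannot escape the MidCOM tree.
        if ($segment === '' || $segment === '.' || $segment === '..') {
            throw new InvalidArgumentException("Invalid snippet path: {$snippet_path}");
        }
    }
    return rtrim($base, '/') . '/' . implode('/', $segments) . '.php';
}
```

An old mgd_include_snippet() call could then be replaced by something like require_once midcom_snippet_to_file('/usr/share/midcom', '/de/linkm/taviewer/viewer'). Rejecting ".." segments is a deliberate choice, since such paths should never leave the MidCOM tree.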

The real changes concern the existing component structure. For a start, nesting components will no longer be allowed, so in the above example there must not be a component de.linkm.taviewer.events. In addition, to ease work, the _code and _midcom snippetdirs as they exist now will be removed, and all source files will move into the main component directory, where the MidCOM interface will be in a defined file (most probably midcom.php).
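Since component names are dotted paths, the no-nesting rule can be checked mechanically, for example by an installer. A minimal sketch; the function name midcom_find_nested_components is hypothetical:

```php
<?php
/**
 * Hypothetical check for the "no nested components" rule: find every
 * component whose name starts with another component's name plus a dot,
 * e.g. "de.linkm.taviewer.events" inside "de.linkm.taviewer".
 *
 * @param string[] $components dotted component names
 * @return array<string, string> nested component => an enclosing component
 */
function midcom_find_nested_components(array $components): array
{
    $nested = [];
    foreach ($components as $inner) {
        foreach ($components as $outer) {
            // Compare against "<outer>." so that a sibling such as
            // "de.linkm.taviewer2" is not mistaken for a nested component.
            if ($inner !== $outer && strpos($inner, $outer . '.') === 0) {
                $nested[$inner] = $outer;
            }
        }
    }
    return $nested;
}
```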

Basically a new component will look like this:

.../component.xml
.../midcom.php
.../admin.php
.../viewer.php
.../nav.php
.../myhelperclass.php
.../config/config.php
.../config/schema.php
.../documentation/index.html
.../documentation/otherfile.html
.../midcom/midcom_service_l10n_strings.db
.../style/style-init.php
.../style/mynav.png
.../style/show-article.php
.../locale/...

The file component.xml will be a short declarator that identifies the component and its properties. This will make it easy for the system to search for components. Call this a deployment descriptor.
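As a rough illustration of why a descriptor file makes components easy to find, the following sketch simply walks the tree and collects every directory containing a component.xml. It assumes the dotted component name can be derived from the directory path; midcom_discover_components is a hypothetical name, and the descriptor contents are not read at all.

```php
<?php
/**
 * Hypothetical component discovery: walk the MidCOM tree and treat every
 * directory holding a component.xml as a component (or library). The
 * dotted name is derived from the path, so de/linkm/taviewer becomes
 * "de.linkm.taviewer". The descriptor itself is not parsed here.
 *
 * @return string[] sorted dotted component names
 */
function midcom_discover_components(string $root): array
{
    $root = rtrim($root, '/');
    $found = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->getFilename() !== 'component.xml') {
            continue;
        }
        // Directory of the descriptor, relative to the tree root.
        $relative = substr($file->getPath(), strlen($root) + 1);
        if ($relative === '' || $relative === false) {
            continue; // a descriptor directly in the root names no component
        }
        $found[] = str_replace('/', '.', $relative);
    }
    sort($found);
    return $found;
}
```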

The paths of the code files should follow their class names, so that the class de_linkm_newsticker_viewer lives in the file /de/linkm/newsticker/viewer.php. Code can be placed in the component's root directory or in any subdirectory of the component that does not use one of the reserved directory names (config, documentation, midcom, style and locale in the layout above).
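This naming rule maps directly onto a PHP autoloader. The sketch below is one possible way to do it, assuming a base directory that holds the de/, net/, midcom/ ... trees and that no single path segment contains an underscore; the function names are hypothetical, while spl_autoload_register is the standard PHP hook for registering autoloaders.

```php
<?php
/**
 * Hypothetical mapping from a class name to its file, following the rule
 * that de_linkm_newsticker_viewer lives in /de/linkm/newsticker/viewer.php.
 * Assumes no single path segment contains an underscore.
 */
function midcom_class_to_path(string $class): string
{
    return '/' . str_replace('_', '/', $class) . '.php';
}

/**
 * Register an autoloader that loads classes from the MidCOM tree below
 * $base (the directory holding de/, net/, midcom/ and so on).
 */
function midcom_register_autoloader(string $base): void
{
    spl_autoload_register(function (string $class) use ($base): void {
        // Accept plain identifiers only, so the name can never turn into
        // an unexpected path (this also skips namespaced classes).
        if (!preg_match('/^[A-Za-z0-9_]+$/', $class)) {
            return;
        }
        $file = rtrim($base, '/') . midcom_class_to_path($class);
        // Stay silent for unknown classes so other autoloaders get a chance.
        if (is_file($file)) {
            require_once $file;
        }
    });
}
```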

Configuration data stays in the config subfolder and still holds the component defaults, for easy default operation.

Documentation is now two-fold. First, all classes must be documented in-line with a yet-to-be-decided documentation tool (like PHPdoc). This will ensure easily maintainable (and complete) API documentation. Anybody who does not honor this tool will be shot on sight ;-). Second, user-level documentation, like quick how-tos or information about what a given configuration file looks like, must go into the documentation subfolder, which will be made available through MidCOM.

A folder named midcom will remain, to be used by the framework's tools (like the l10n service in the above example) to store data that is relevant to the component but not managed by it. The names of the files in there have to match the component names, for namespacing purposes of course.

Style elements will be created as they are now, with the difference that you can now easily place images into these directories. The new style engine will be able to work with this (somehow, that is *g*). This is mainly geared towards things like HTMLArea, which need a bunch of images the user normally doesn't override.

Finally, the locale directory will hold the L10N data for the component, most probably managed by a more powerful tool like gettext (for performance reasons). The current l10n tree should be easy to convert into a gettext-based format.
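As a sketch of how gettext could fit this layout, assuming the text domain of a component is simply its dotted name and its catalogs sit in the standard gettext layout below the component's locale directory. bindtextdomain() and dgettext() are the regular functions of the PHP gettext extension; the helper names are hypothetical, and choosing the active language (via setlocale() with LC_MESSAGES) is left out.

```php
<?php
/**
 * Hypothetical gettext layout for a component: the text domain is the
 * dotted component name, and catalogs live below the component's locale/
 * directory in the standard layout <locale>/LC_MESSAGES/<domain>.mo.
 */
function midcom_l10n_catalog_path(string $component_dir, string $component, string $locale): string
{
    return rtrim($component_dir, '/') . "/locale/{$locale}/LC_MESSAGES/{$component}.mo";
}

/**
 * Translate $message in the component's own text domain. Falls back to
 * the untranslated string when the gettext extension is not installed.
 */
function midcom_l10n_translate(string $component_dir, string $component, string $message): string
{
    if (!function_exists('bindtextdomain')) {
        return $message;
    }
    // Point the component's domain at its own locale directory.
    bindtextdomain($component, rtrim($component_dir, '/') . '/locale');
    // dgettext() looks the message up in that domain only, so components
    // do not share one global catalog; unknown messages come back as-is.
    return dgettext($component, $message);
}
```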

Note that, as with the current situation, the same structure will be used both for real components and for libraries. MidCOM will distinguish them by the capabilities that can be queried through the component interfaces.
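To illustrate what such a capability query might look like, here is a sketch in which the interface class from midcom.php reports a list of capability names, and MidCOM treats anything that cannot act as a viewer as a library. Every name in it (the interface, the get_capabilities() method, the capability strings and both classes) is hypothetical; the class names merely follow the file-naming rule above.

```php
<?php
/**
 * Hypothetical interface implemented by the class in a component's
 * midcom.php. Only the capability query is sketched here.
 */
interface midcom_component_interface
{
    /** @return string[] capability names, e.g. ['viewer', 'admin', 'nap'] */
    public function get_capabilities(): array;
}

/** Hypothetical full component: it can render pages, so it declares 'viewer'. */
class de_linkm_taviewer_midcom implements midcom_component_interface
{
    public function get_capabilities(): array
    {
        return ['viewer', 'admin', 'nap'];
    }
}

/** Hypothetical pure code library (e.g. an ImageMagick wrapper): no 'viewer'. */
class net_example_imagemagick_midcom implements midcom_component_interface
{
    public function get_capabilities(): array
    {
        return [];
    }
}

/** MidCOM could then treat anything without 'viewer' as a library. */
function midcom_is_library(midcom_component_interface $iface): bool
{
    return !in_array('viewer', $iface->get_capabilities(), true);
}
```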

Permissions

The interesting part here is permissions. At the moment, there is no need for write permissions anywhere inside the actual library. So 711 for directories and 644 for files, with root/root or other appropriate ownership, should be sufficient.

An exception will most probably be the development servers, which will need write access for things like updating the L10N databases.
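For production installs, the scheme above could be applied by a small helper along these lines. It is a sketch assuming a plain Unix filesystem; the function name is hypothetical, and changing ownership to root/root is left to the installer, since it requires root privileges.

```php
<?php
/**
 * Hypothetical deployment helper: apply the read-only permission scheme
 * (0711 for directories, 0644 for files) to an installed MidCOM tree.
 * Setting root/root ownership needs root privileges and is left to the
 * installer.
 */
function midcom_apply_permissions(string $root): void
{
    chmod($root, 0711);
    $entries = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST
    );
    foreach ($entries as $path => $info) {
        // Leave symlinks alone; chmod() would follow them out of the tree.
        if ($info->isLink()) {
            continue;
        }
        chmod($path, $info->isDir() ? 0711 : 0644);
    }
}
```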

Roadmap

  1. The transition will start immediately after the release of MidCOM 1.4.0, at which point all MidCOM development will be frozen until the change is complete. The 1.4 stable branch will only be maintained for severe bugs that are not easy to work around.
  2. Start moving the source of the core into the filesystem, doing some minor cleanups along the way, and merge all documentation into the source files. This includes moving all attachments of things like HTMLArea into the filesystem. Exactly one component, preferably de.linkm.taviewer, will also be moved during this process to have one proof-of-concept component available. (Note: This might be used to rewrite or clean up large parts of this component, which is already showing its age.)
  3. During this transition, some of the existing subsystems will have to be upgraded, notably the l10n subsystem, which should now start to employ some other library, probably GNU gettext or the like.
  4. After the basic transition is done and the reference component implementation is available, immediately release a new MidCOM development version. This version should have a mostly (say 95%) stabilized API and is the version component authors should use to start rewriting their components. This release series will be numbered MidCOM 1.9.x, with new releases on a bi-weekly basis if possible, incorporating last-minute bugfixes.
  5. Most of the major components currently in use have to be converted for the final release. In my judgement this should include at least:
    • de.linkm.collector
    • de.linkm.events
    • de.linkm.newsticker
    • de.linkm.taviewer
    • de.linkm.sitemap
    • net.nemein.discussion
    • net.nemein.downloads
    • net.nemein.rss
  6. After having all major components available again, a release candidate will be created, which will be open for public testing and which should already be usable on minor production systems.
  7. About four to six weeks after the RC, if there are no critical bugs outstanding, the final release, MidCOM 2.0, will be made, after which the feature freeze will be lifted and normal development should resume.

I do not have an idea yet how long this will take; it will mostly depend on the number of changes necessary in the core.

Compatibility notes

  • There will be no technology whatsoever to enable MidCOM to run both from the filesystem and from a Midgard database. This is simply too much work and would reintroduce the disadvantages outlined above.
  • Existing websites will therefore need some minor adaptations to get the new MidCOM running, especially since you will have to replace the odd mgd_include_snippet directive. This should be manageable, though.
  • Components will have to undergo minor changes to adapt both to the new package structure and to a few changes in the MidCOM internals that this change will introduce.
  • Note that this might be the time to introduce a midcom_root.php file to replace the current midgard_root.php.
  • Depending on the structure of the deployment descriptor, it could be possible to move configuration defaults out into /etc on Linux systems, which is especially interesting for Linux package maintainers.

Implementation notes

  • A site-local working directory may be needed to hold things like the site cache. This directory must not be served to the client directly; it must only be readable by MidCOM, which will post-process its contents.
  • Depending on how some of the ongoing work (like the l10n stuff) turns out, it might be necessary to introduce additional dependencies into MidCOM (like Berkeley DB or some XML library).
  • The so-called pure code libraries will still be there and should be used to integrate third-party systems into MidCOM in a controlled way. For example, there could be an ImageMagick component.
  • The deployment descriptor should carry MidCOM version information from the start, to ease the transition in case the MidCOM component interface changes.
  • The new files in the package should be organized in a PEAR-compatible way as far as possible. In particular, there should be support for the PEAR installer as early as possible. Furthermore, new code should follow the PEAR coding standards from now on. Due to the code structure of MidCOM, it will be mostly impossible to get MidCOM itself into PEAR, so the only really important part here for now is the installer support.
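A minimal sketch of such a version check, assuming the descriptor is XML and states the minimum component-interface version it needs. The element name midcom-version and the function name are purely hypothetical, as the descriptor format is still open; simplexml_load_string() and version_compare() are standard PHP functions.

```php
<?php
/**
 * Hypothetical deployment-descriptor version check. The XML layout is an
 * illustration only: a <component> root with a <midcom-version> child
 * naming the minimum MidCOM component-interface version it needs.
 */
function midcom_descriptor_is_compatible(string $xml, string $interface_version): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $descriptor = simplexml_load_string($xml);
    libxml_clear_errors();
    if ($descriptor === false || !isset($descriptor->{'midcom-version'})) {
        // Unreadable or unversioned descriptors are rejected outright.
        return false;
    }
    $required = trim((string) $descriptor->{'midcom-version'});
    return version_compare($interface_version, $required, '>=');
}
```

Comparing with version_compare() rather than plain string comparison keeps versions like 1.10 and 1.9 in the right order.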

Closing notes

I personally think that this transition will solve a lot of the problems that currently make development with and of MidCOM difficult. It will also make back-porting features into old, stable versions more efficient, without the trouble currently associated with it. Many error-prone places in MidCOM will be eliminated, on both the developer's and the administrator's side.

As you have seen, I have frozen my design work for MidCOM 2.0 for now in favor of this project. I did this because I think that having this technology available will be of far greater benefit for future development, enabling us to merge new changes into the system more easily.

Personal notes

I strongly suggest voting +1 for this mRFC, speaking both as a MidCOM component author and as the MidCOM core lead developer. Let us make life a bit easier again.

Oh, and, frankly speaking, I don't want to hear about things like FileSync, which are not much better than Repligard, solving only about a third of the problems mentioned above while introducing new dependencies into Midgard-driven code. It simply doesn't help you where it counts.

Ideas for the future that came to me while writing the mRFC

Note that this is not part of the vote; these are just some thoughts I had that need to be investigated after the transition. Most of these things call for their own mRFC, though.

  • Enhance the deployment descriptor, deprecating midcom.php. This is taken from the Java world, where you are spared from writing the same interface code again and again. Instead, you declare, for example, that your NAP class is called this-and-that, and the J2EE server automatically creates an interface class by itself. A nice thing, as it enables some declarative programming. ;-)
  • There might also be a site-local data directory for component-specific data. Together with the previous point, this raises the question of where to allow write access for the web server, which is not trivial to solve.
  • Packages of a given component for a Linux distribution will now be easy to make, as you only have filesystem data to distribute.
  • Depending on the requirements on local filesystem permissions, it might be worth moving all this stuff into PEAR in the future.

Links for writing secure PHP code (courtesy of Tarjei Huse)

