
Caching Midgard requests

  1. Caching Midgard requests

    Thu August 28 2008 12:48:28 UTC
    Hi!

    I ran some simple performance tests on what is known as
    midgard_request_config in Midgard2.
    Basically, it's a replacement for $_MIDGARD and for the core's
    request_config (which is not propagated to the PHP level).
    On the PHP level it's a simple array which holds:

    * midgard_host object
    * current midgard_page object
    * all pages in the tree, from the root page down to the current one
    * midgard_style object
    * argc and argv[]

    I know these are fetched at the MidCOM level, so request_config gives
    them "for free".
    Additionally, the pages array is ready to use as a kind of breadcrumb:
    just iterate over the array and do what you need.
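    A minimal Python sketch of the structure listed above and of the breadcrumb iteration. The PHP-level array keys are not given here, so the `Page` and `RequestConfig` classes, their field names, and the `breadcrumb()` helper are hypothetical stand-ins, not the midgard-php API; the sketch also assumes the root page maps to "/".

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # Hypothetical stand-in for midgard_page, with only the fields used here.
    name: str
    title: str


@dataclass
class RequestConfig:
    # Hypothetical mirror of the PHP-level array: host, current page,
    # pages from the root to the current one, style, and argv.
    host: str
    page: Page
    pages: list
    style: str
    argv: list = field(default_factory=list)

    @property
    def argc(self) -> int:
        return len(self.argv)


def breadcrumb(config: RequestConfig) -> list:
    """Build (url, title) pairs by iterating over the root-to-current pages."""
    crumbs = []
    names = []
    for depth, page in enumerate(config.pages):
        if depth > 0:
            # The root page is the host's "/"; each child extends the path.
            names.append(page.name)
        url = "/" + "".join(name + "/" for name in names)
        crumbs.append((url, page.title))
    return crumbs
```

    For a root page "Home" with a child "news" and a grandchild "2008", this yields "/", "/news/" and "/news/2008/" with their titles.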

    In Midgard2 we need to follow these steps to create request_config:

    1. Parse the URL and tokenize it
    2. Fetch the host record with QB
    3. Fetch the root page (QB)
    4. Try to fetch all pages by their names (QB)
    5. When a page is not found, define argc and create the argv array
    6. Create the request config object
    7. Propagate it as a PHP object
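    The seven steps can be sketched in Python as below. The QB queries are replaced by hypothetical in-memory tables (`HOSTS`, `PAGES`) and helper functions (`fetch_page()`, `find_child()`), and step 7 (propagation to PHP) is omitted; this only illustrates the control flow, not Midgard's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRecord:
    id: int
    up: int    # parent page id, 0 for the root page
    name: str


# Hypothetical in-memory tables standing in for the QB queries of steps 2-4.
HOSTS = {"www.mysite.com": {"root": 10, "style": "default"}}
PAGES = [
    PageRecord(10, 0, ""),
    PageRecord(11, 10, "pageA"),
    PageRecord(12, 11, "pageB"),
]


def fetch_page(page_id):
    """Stand-in for a QB query fetching one page by id."""
    return next(p for p in PAGES if p.id == page_id)


def find_child(parent_id, name):
    """Stand-in for a QB query: the child of parent_id with the given name."""
    return next((p for p in PAGES if p.up == parent_id and p.name == name), None)


def build_request_config(url):
    # Step 1: parse the URL and tokenize it ("host/page/.../arg/...").
    host_name, *tokens = [t for t in url.split("/") if t]
    # Step 2: fetch the host record.
    host = HOSTS[host_name]
    # Step 3: fetch the root page.
    pages = [fetch_page(host["root"])]
    # Step 4: walk down the tree, fetching each page by its name.
    for token in tokens:
        child = find_child(pages[-1].id, token)
        if child is None:
            break
        pages.append(child)
    # Step 5: every token after the last matched page becomes argv.
    argv = tokens[len(pages) - 1:]
    # Step 6: create the request config (step 7, propagation to PHP, is omitted).
    return {"host": host, "page": pages[-1], "pages": pages,
            "style": host["style"], "argc": len(argv), "argv": argv}
```

    Every page level costs one query here, which is why steps 1 to 4 are the part worth caching.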

    So what we need to cache is everything from steps 1 to 4, i.e. all the
    SQL queries. The initial idea looks like this:

    1. Get the URL and look up *exactly the same* URL in the cache
    2. If it is not found, perform the uncached steps (1-4 above); if it is
    found, get the request config from the cache
    3. Clone it
    4. Propagate it as a PHP object
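    A minimal sketch of this URL-keyed cache, using Python's `copy.deepcopy` as the clone. The post does not say why the clone is needed; this sketch assumes the reason is that a request may modify its config (for example argv), and such changes must not leak back into the shared cache entry. `get_request_config()` and the `build` callback are hypothetical names.

```python
import copy

# Hypothetical per-URL cache: the key is the full URL, the value a request config.
_request_cache = {}


def get_request_config(url, build):
    """Return a request config for url, building it only on a cache miss.

    `build` stands in for the uncached steps 1-4 (the SQL/QB queries).
    """
    cached = _request_cache.get(url)
    if cached is None:
        cached = build(url)
        _request_cache[url] = cached
    # Hand out a deep copy so per-request changes never touch the cached entry.
    return copy.deepcopy(cached)
```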

    Let me know if you really need to know why we need to clone it.

    # TESTS:

    ## Memory

    I assumed that in most cases (on average) midgard_request_config
    holds about 4 objects:
    1 host, 1 style, 2 pages.

    With up to 50 MB of memory allocated, we can permanently hold about
    5,000 different URLs. Of course this could be made configurable for
    sites which have enough memory and need more speed, but at least you
    know the number.
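    Working backwards from these figures gives the per-entry memory budget they imply. This is plain arithmetic on the numbers above, assuming "MB" means 2**20 bytes; it is not a measurement of actual object sizes.

```python
MIB = 1024 * 1024  # assumption: "MB" read as 2**20 bytes

budget_bytes = 50 * MIB
cached_urls = 5000
objects_per_entry = 4  # 1 host + 1 style + 2 pages

bytes_per_entry = budget_bytes / cached_urls
bytes_per_object = bytes_per_entry / objects_per_entry
print(f"{bytes_per_entry:.0f} bytes per URL, {bytes_per_object:.0f} bytes per object")
```

    That comes to roughly 10 KiB per cached URL, or about 2.5 KiB per object.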

    ## Performance

    I ran it for 30,000 unique URLs following the pattern
    www.mysite.com/pageA/pageB/pageC/pageD/argv1/argv2/29999.
    I looked up the first and the last URL in the cache, and fetched four
    objects using QB.
    (Keep in mind that in the tests we use microseconds, not milliseconds.)

    www.mysite.com/pageA/pageB/pageC/pageD/argv1/argv2/29999
    Time 0.012 milliseconds

    www.mysite.com/pageA/pageB/pageC/pageD/argv1/argv2/1
    Time 0.865 milliseconds

    Get Objects
    Time 4.715 milliseconds (0.004715 sec)

    When I turn on the MySQL cache, fetching the objects takes about 3
    milliseconds, which is still high compared to the cache lookups. So
    even the slowest cache lookup gets the request about 3.5x faster.
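    The speedups follow directly from the measured times; this snippet just does the division, taking "about 3 ms" as 3.0 ms.

```python
# Timings from the test above, in milliseconds.
lookup_first = 0.012      # .../29999, the fastest cache lookup
lookup_last = 0.865       # .../1, the slowest cache lookup
fetch_uncached = 4.715    # four QB fetches, MySQL cache off
fetch_mysql_cache = 3.0   # four QB fetches, MySQL cache on (approximate)

speedup_vs_mysql_cache = fetch_mysql_cache / lookup_last
speedup_vs_uncached = fetch_uncached / lookup_last
print(f"{speedup_vs_mysql_cache:.1f}x vs. MySQL cache, "
      f"{speedup_vs_uncached:.1f}x vs. uncached")
```

    For the slowest cache lookup this gives about 3.5x against MySQL with its cache on and about 5.5x against uncached fetches.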

    # Issues

    The problem is what exactly to cache: the request or the page? In the
    first case we need to hold many cache entries for the same page when
    the page uses many different argv. In the latter case, we can tokenize
    the URL and keep doing cache lookups until we find the page's URL
    (without argv) in the cache, but this may be slower than the SQL queries
    if the page's cache entry is not at the beginning of the cache and the
    URL has many argv.
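    The page-level variant can be sketched as follows: strip trailing tokens one at a time until the remaining path is found in the cache, and treat the stripped tokens as argv. The `page_cache` contents and `lookup_page()` are hypothetical. This sketch uses a Python dict, so each probe is a hash lookup; it shows only that the number of probes grows with the number of argv, not the position-dependent cost of the structure used in the tests.

```python
# Hypothetical page-level cache: the key is the page's URL path (without argv).
page_cache = {
    "/pageA/pageB/pageC/pageD/": "config for pageD",
}


def lookup_page(path):
    """Strip trailing tokens until the page URL is found in the cache.

    Returns (cached_value, argv, probes); on a miss cached_value is None
    and argv holds all tokens.
    """
    tokens = [t for t in path.split("/") if t]
    probes = 0
    # Try the longest candidate first: the whole path, then one token fewer, ...
    for cut in range(len(tokens), -1, -1):
        key = "/" + "".join(t + "/" for t in tokens[:cut])
        probes += 1
        if key in page_cache:
            return page_cache[key], tokens[cut:], probes
    return None, tokens, probes
```

    For the test pattern with three trailing tokens, the page is found on the fourth probe.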

    What do you think?

    Piotras
    _______________________________________________
    dev mailing list
    dev@lists.midgard-project.org
    http://lists.midgard-project.org/mailman/listinfo/dev
  2. Re: [midgard-dev] Caching Midgard requests

    Fri August 29 2008 08:49:35 UTC
    Hi,

    As this all started from the benchmarks I ran after hearing Rasmus
    Lerdorf's talk (http://www.sitepoint.com/blogs/2008/08/29/rasmus-lerdorf-php-frameworks-think-again/),
    I guess I should answer :-)

    On Thu, Aug 28, 2008 at 3:48 PM, Piotr Pokora <piotrek.pokora@gmail.com> wrote:
    > I know these are fetched on MidCOM level, so request_config gives them
    > "for free".

    Yes. When MidCOM3 is run with Midgard 1.x, it fetches all this from the
    database, but the Midgard 2 dispatcher of MidCOM3 gets it from
    midgard_request_config.

    http://github.com/bergie/midcom/tree/master/midcom_core/services/dispatcher/midgard2.php

    > 1. Get url and do lookup to find *the same* in cache
    > 2. If not found perform steps uncached (1-4 as above), and if found, get
    > request config from cache
    > 3. Clone it
    > 4. Propagate as PHP object

    I imagine this will make big improvements.

    Before I can test this on my benchmarks, it would be interesting to
    hear the difference in "siege -c 5 -t 30s" results between cached and
    uncached midgard_request_config.

    As my benchmarks showed, Midgard2 + MidCOM3 performs stellarly in
    10-second sieges, but performance drops in 30-second runs. As I believe
    MySQL connection clogging is the main cause here, caching the request
    data should make a huge difference.

    > Problem is what to cache exactly. Request or page?

    What I would do is have two caches:

    * URL-to-page mapping cache
    * Page-to-midgard_request_config cache

    First you match a given URL to its page, then get the page's
    midgard_request_config from the cache, change the dynamic ARGs as needed
    (based on the difference between the URL and the page URL), and then
    pass it on to PHP...

    That way midgard_request_config wouldn't need to be stored for each
    URL, but only for each page. It is quite a big difference, as I
    believe a typical MidCOM site can easily have thousands of URLs
    (articles, their different variants, whatever), but only a few dozen
    pages.
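    A minimal sketch of the two-cache idea, with hypothetical names (`url_to_page`, `page_to_config`, `resolve()`, and a `build_page_config` callback standing in for the uncached steps). Configs are stored once per page and cloned per request, and argv is recomputed from the part of the URL after the page URL; on a URL-map miss the sketch simply runs the full uncached path.

```python
import copy

url_to_page = {}     # full URL -> page URL (path without argv)
page_to_config = {}  # page URL -> request config with argv left empty


def resolve(url, build_page_config):
    """Return a per-request config, storing configs once per page, not per URL.

    build_page_config(url) must return (page_url, config), where page_url
    is a prefix of url.
    """
    page_url = url_to_page.get(url)
    if page_url is None or page_url not in page_to_config:
        page_url, config = build_page_config(url)
        url_to_page[url] = page_url
        page_to_config[page_url] = config
    config = copy.deepcopy(page_to_config[page_url])
    # Dynamic args are the URL tokens after the page URL.
    config["argv"] = [t for t in url[len(page_url):].split("/") if t]
    config["argc"] = len(config["argv"])
    return config
```

    Many article URLs under one page then share a single config entry and cost only a short URL-to-page string each.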

    > Piotras

    /Bergie

    --
    Henri Bergius
    Motorcycle Adventures and Free Software
    http://bergie.iki.fi/

    Skype: henribergius
    Jabber: henri.bergius@gmail.com
    Jaiku: http://bergie.jaiku.com/
  3. Re: [midgard-dev] Caching Midgard requests

    Fri August 29 2008 09:22:45 UTC
    Henri Bergius writes:
    > Hi,

    Hi!

    > Before I can test this on my benchmarks, it would be interesting to
    > hear differences with "siege -c 5 -t 30s" on cached vs uncached
    > midgard_request_config.

    All my tests were made with a simple command-line program which just
    measured time.
    The MySQL server had no other external or internal connections, so in
    real life the times I posted for getting objects from the DB would be
    much longer.

    I started simple tests with self-designed structures, as those used in
    the earlier tests were implemented with the GLib API.
    The task is quite hard, as I decided we need to limit the slowest case
    to 2 milliseconds (2000 microseconds).
    Within this time I need to find the URL/page, re-sort the cache and
    recreate it. Simply put, we need to have full control
    over "queued" cache entries.
    It looks like this is possible for up to 20,000 cache entries.
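    The post does not describe the self-designed structure, so as one common way to get full control over queued entries, here is a bounded least-recently-used (LRU) cache built on Python's `collections.OrderedDict`. This is a swapped-in technique, not the structure Piotras tested: a hit moves the entry to the most-recently-used end (the "re-sort"), and inserts evict the oldest entries once the capacity is exceeded. `BoundedCache` is a hypothetical name.

```python
from collections import OrderedDict


class BoundedCache:
    """Fixed-capacity cache that keeps entries ordered by recent use (LRU)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        # "Re-sort": move the hit to the most-recently-used end.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        # Evict least recently used entries once over capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
```

    With `BoundedCache(20000)`, both lookups and re-sorting are constant-time operations, so the cost of a hit does not depend on where the entry sits in the queue.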

    Another thing is that the implementation should be tied to the
    midgard-php extension, not to the core, at least for now, as making a
    core API for this would require a special design for derived hooks.

    >> Problem is what to cache exactly. Request or page?
    >
    > What I would do is have two caches:
    >
    > * URL-to-page mapping cache
    > * Page-to-midgard_request_config cache

    That should also be doable. Of course, real cases will show where the
    cache limit is.

    Piotras