Open Source Content Management Framework

Setting up Squid reverse proxy

  1. Introduction
    1. Caveats
  2. Setup options
  3. Midgard setup changes
  4. Squid setup
    1. Acceleration options with Squid 2.5
    2. Acceleration options with Squid 2.6
    3. Access control options
  5. MidCOM setup
  6. Testing & Troubleshooting
  7. Performance testing
  8. Forcing cache refresh / precaching

Introduction

On high-volume sites significant performance improvements can be attained by setting up a reverse proxy in front of the actual Midgard, Squid is a popular choice.

Caveats

Obviously since we're aggressively caching stuff it will normally take at least the lifetime of an object before it's refreshed from main server and thus your changes to the content will appear with some delay. Also since we're not sending MidCOMs usual 'must-revalidate' headers, some clients (yes, that would be IE) may cache the pages locally for much longer...

Setup options

This document covers a single machine proxy and Midgard setup, Squid makes it possible to make clusters of proxies for even higher performance, see Squid documentation on how to do that.

For our case we have basically two options:

  1. One (public) IP for Squid and another (public or internal) for Midgard
    • This has the advantage of simplifying installation and management as long as the Midgard IP is reachable by the content maintainers.
    • It also simplifies Midgard/MidCOM headers debugging
  2. One (public) IP for Squid and 127.0.0.1 (localhost) for Midgard
    • In this case access to the content management becomes troublesome but for staging-live setups this works well.

The third option of having Midgard and Squid in same IP but different ports has a problem with the fact that a lot of places use MIDCOM_NAV_FULLURL in stead of MIDCOM_NAV_RELATIVEURL (and in some cases they don't really have much of a choice), where the nonstandard port is then added to the constructed URLs everywhere, bypassing the cache from first hit onwards...

Midgard setup changes

Set up apache to bind specifically to the chosen IP (here we use 127.0.0.1), basically comment out all Listen and NameVirtualHost directives you can find and then add new ones to /etc/midgard/apache/httpd.conf

Listen 127.0.0.1:80
NameVirtualHost 127.0.0.1:80

Then in each file in /etc/midgard/apache/vhosts change the <VirtualHost *:80> to <VirtualHost 127.0.0.1:80>

Squid setup

Install normally, make sure Squid can resolve the servers IPs to names (edit /etc/hosts if needed). Then set up as reverse proxy by changing the following settings in /etc/squid/squid.conf.

Acceleration options with Squid 2.5

http_port "public-ip-address":80 
httpd_accel_host "127.0.0.1"
httpd_accel_port 80
httpd_accel_uses_host_header on

Acceleration options with Squid 2.6

http_port "public-ip-address":80 vhost vport
cache_peer "public-ip-address"    parent    80  0  originserver default

Access control options

Then add ACLs for allowing your site to be proxied, here's one example but I recommend reading the Squid documentation on ACLs

acl valid_dst dstdomain .example.com
http_access allow valid_dst (Just before deny all!)

MidCOM setup

in code-init of your host(s) add the following settings:

$GLOBALS['midcom_config_local']['cache_module_content_headers_strategy'] = 'public';
$GLOBALS['midcom_config_local']['cache_module_content_default_lifetime'] = 60;

Adjust the default lifetime as you see fit, this is in seconds (in case the document has no expiry set via metadata then the expiry is set to current time plus the default lifetime, lower values significantly reduce the advantages of proxy)

You can also use 'private' in stead of 'public' in case you need that kind of cache-control, but be aware that currently whenever a user is logged in midcom enters no-cache mode due to uimessages service unconditionally using sessions.

Testing & Troubleshooting

Read Six Things First-Time Squid Administrators Should Know from ONLamp.com.

Remember to restart Squid and Apache after each configuration change, only test one change at a time, keep log of what you changed, where and why.

First test a few times with

lynx --dump --head http://www.yoursite.com/

look for headers indicating cache miss/hit, if you get only misses increase Squid logging and see in the store and squid logs exactly what happens.

If a page is released immediately from the cache there are a few options:

  1. Cache-Control headers are incorrect
    • Use lynx or telnet to examine the headers you get from Midgard
      • If you get Cache-Control: no-store, no-cache, must-revalidate header then MidCOM is in no-cache mode for some reason.
  2. Content-Lenght header is incorrect
    • For some reason in some cases there is extra whitespace sent in MidCOM output, for this reason we try to avoid sending Content-Lenght header when headers_strategy indicates caching.
    • See Squid debug level log, it should complain about server sending more data than announced.
  3. Expires is now or in the past (remember that the header is in GMT, so what might seem like past at first sight perhaps is not...)
    • Make sure the document does not have expiry set to past (though normally it shouldn't be then available on the site at all).
    • Make sure the default_lifetime is sensible.

Performance testing

Get Siege and create urls.txt with one url per line, this should contain enough urls from your site (dump a sitemap or something), then in one window:

tail -F /var/log/squid/access.log

And in another

siege -c 30 -i -t 5m -f urls.txt

And see what happens. Tune Squid as needed, below are some settings I have used:

cache_replacement_policy heap LFUDA
memory_replacement_policy heap GDSF
cache_dir ufs /usr/local/squid/cache 20000 16 256
cache_mem 500MB
maximum_object_size 12000 KB

Forcing cache refresh / precaching

You can use wget to precache your site and/or to force a cache refresh for the whole site with the following command:

wget -t 3 -T 3 --no-cache -q -m www.yoursite.com

For forcing refresh of single page in the cache, go to the page and 'shift-reload' (or whatever works on your browser for forcing it to reload the full page from server).

Designed by Nemein, hosted by Kafit