Setting up Squid reverse proxy
- Introduction
- Setup options
- Midgard setup changes
- Squid setup
- MidCOM setup
- Testing & Troubleshooting
- Performance testing
- Forcing cache refresh / precaching
Introduction
On high-volume sites, significant performance improvements can be attained by setting up a reverse proxy in front of the actual Midgard server. Squid is a popular choice.
Caveats
Because we are caching aggressively, it will normally take at least the lifetime of an object before it is refreshed from the main server, so changes to the content will appear with some delay. Also, since we are not sending MidCOM's usual 'must-revalidate' headers, some clients (yes, that would be IE) may cache the pages locally for much longer.
Setup options
This document covers a setup where the proxy and Midgard run on a single machine. Squid also makes it possible to build clusters of proxies for even higher performance; see the Squid documentation on how to do that.
For our case we have basically two options:
- One (public) IP for Squid and another (public or internal) for Midgard
  - This has the advantage of simplifying installation and management as long as the Midgard IP is reachable by the content maintainers.
  - It also simplifies Midgard/MidCOM headers debugging
- One (public) IP for Squid and 127.0.0.1 (localhost) for Midgard
  - In this case access to the content management becomes troublesome but for staging-live setups this works well.
The third option, having Midgard and Squid on the same IP but on different ports, is problematic: a lot of places use MIDCOM_NAV_FULLURL instead of MIDCOM_NAV_RELATIVEURL (and in some cases they don't really have much of a choice), so the nonstandard port is added to the constructed URLs everywhere, bypassing the cache from the first hit onwards.
Midgard setup changes
Set up Apache to bind specifically to the chosen IP (here we use 127.0.0.1): comment out all Listen and NameVirtualHost directives you can find, then add new ones to /etc/midgard/apache/httpd.conf:
Listen 127.0.0.1:80
NameVirtualHost 127.0.0.1:80
Then, in each file in /etc/midgard/apache/vhosts, change <VirtualHost *:80> to <VirtualHost 127.0.0.1:80>.
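If there are many vhost files, the change can be scripted. The following is a minimal Python sketch and not part of Midgard; the function names are made up for illustration, and it assumes every vhost opens with a plain <VirtualHost *:80> tag. Back up the directory before running anything like it.

```python
import re
from pathlib import Path

# Matches an opening tag such as "<VirtualHost *:80>" (case-insensitive, flexible spacing).
_VHOST_RE = re.compile(r"<VirtualHost\s+\*:80\s*>", re.IGNORECASE)


def rebind_vhost_text(text, ip="127.0.0.1"):
    """Return text with every <VirtualHost *:80> rewritten to <VirtualHost ip:80>."""
    return _VHOST_RE.sub(f"<VirtualHost {ip}:80>", text)


def rebind_vhost_dir(directory="/etc/midgard/apache/vhosts", ip="127.0.0.1"):
    """Rewrite every regular file in directory in place; return the changed paths."""
    changed = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        original = path.read_text()
        updated = rebind_vhost_text(original, ip)
        if updated != original:
            path.write_text(updated)
            changed.append(str(path))
    return changed
```

Calling rebind_vhost_dir() with no arguments would rewrite the files in the directory named above; vhosts already bound to a specific IP are left untouched.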
Squid setup
Install Squid normally and make sure it can resolve the servers' IPs to names (edit /etc/hosts if needed). Then set it up as a reverse proxy by changing the following settings in /etc/squid/squid.conf.
Acceleration options with Squid 2.5
http_port "public-ip-address":80
httpd_accel_host "127.0.0.1"
httpd_accel_port 80
httpd_accel_uses_host_header on
Acceleration options with Squid 2.6
http_port "public-ip-address":80 vhost vport
cache_peer "127.0.0.1" parent 80 0 originserver default
Access control options
Then add ACLs that allow your site to be proxied. Here is one example, but I recommend reading the Squid documentation on ACLs:
acl valid_dst dstdomain .example.com
# This line must come just before "http_access deny all"
http_access allow valid_dst
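To make clear which host names the dstdomain pattern above lets through, here is a rough Python approximation of the matching rule. It is an illustration only, not Squid's implementation, and the function name is hypothetical. It relies on the documented behaviour that a leading dot matches the domain itself and all of its subdomains.

```python
def dstdomain_matches(pattern, host):
    """Approximate Squid dstdomain matching for a single pattern."""
    pattern = pattern.lower()
    host = host.lower().rstrip(".")
    if pattern.startswith("."):
        # ".example.com" matches example.com itself and any subdomain.
        return host == pattern[1:] or host.endswith(pattern)
    # Without a leading dot only the exact host name matches.
    return host == pattern
```

With the pattern .example.com, both example.com and www.example.com are allowed, while an unrelated name such as badexample.com is not.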
MidCOM setup
In the code-init of your host(s), add the following settings:
$GLOBALS['midcom_config_local']['cache_module_content_headers_strategy'] = 'public';
$GLOBALS['midcom_config_local']['cache_module_content_default_lifetime'] = 60;
Adjust the default lifetime, given in seconds, as you see fit. If the document has no expiry set via metadata, the expiry is set to the current time plus the default lifetime. Lower values significantly reduce the advantages of the proxy.
You can also use 'private' instead of 'public' if you need that kind of cache control, but be aware that currently MidCOM enters no-cache mode whenever a user is logged in, because the uimessages service uses sessions unconditionally.
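The expiry rule described above can be sketched in a few lines of Python. This only illustrates the rule as stated here and is not MidCOM's actual code; the function names and the use of Unix timestamps are assumptions.

```python
from email.utils import formatdate


def effective_expiry(now, metadata_expiry=None, default_lifetime=60):
    """Return the expiry as a Unix timestamp.

    now and metadata_expiry are Unix timestamps in seconds; metadata_expiry
    is None when the document has no expiry set via metadata.
    """
    if metadata_expiry is not None:
        return metadata_expiry
    return now + default_lifetime


def expires_header(timestamp):
    """Format a Unix timestamp as an HTTP date, which is always given in GMT."""
    return formatdate(timestamp, usegmt=True)
```

With the default lifetime of 60, a document without metadata expiry requested at time 1000 expires at 1060, so the proxy can serve it for at most a minute before asking Midgard again.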
Testing & Troubleshooting
Read Six Things First-Time Squid Administrators Should Know from ONLamp.com.
Remember to restart Squid and Apache after each configuration change. Test only one change at a time, and keep a log of what you changed, where and why.
First test a few times with
lynx --dump --head http://www.yoursite.com/
Look for headers indicating a cache hit or miss. If you get only misses, increase Squid logging and check the store and Squid logs to see exactly what happens.
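The same check can be scripted. The sketch below is a hedged Python example that assumes Squid adds its usual X-Cache response header (for example "HIT from proxyhost"); the function names are illustrative, and the URL you pass in is your own site.

```python
import urllib.request


def cache_status(headers):
    """Return "HIT", "MISS" or None from a mapping of response headers."""
    for name, value in headers.items():
        if name.lower() == "x-cache":
            words = value.split()
            return words[0].upper() if words else None
    return None


def fetch_head(url):
    """Send a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

Running cache_status(fetch_head(url)) twice in a row against the proxy should typically report a miss first and a hit on the second request, provided the page is cacheable.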
If a page is released from the cache immediately, there are a few possible causes:
- Cache-Control headers are incorrect
  - Use lynx or telnet to examine the headers you get from Midgard
  - If you get a Cache-Control: no-store, no-cache, must-revalidate header, then MidCOM is in no-cache mode for some reason.
- Content-Length header is incorrect
  - For some reason, in some cases extra whitespace is sent in MidCOM output; for this reason we try to avoid sending the Content-Length header when headers_strategy indicates caching.
  - See the Squid debug-level log; it should complain about the server sending more data than announced.
- Expires is now or in the past (remember that the header is in GMT, so what might seem like the past at first sight perhaps is not)
  - Make sure the document does not have its expiry set in the past (though normally it then shouldn't be available on the site at all).
  - Make sure the default_lifetime is sensible.
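The three checks above can be combined into a small helper. This is a sketch under stated assumptions, not a complete cacheability checker: the function name and the wording of the messages are made up, and it only looks at the three headers discussed here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def diagnose(headers, body_length=None, now=None):
    """Return a list of problems found in the response headers.

    headers maps header names to values; body_length is the number of bytes
    actually received, if known; now defaults to the current UTC time.
    """
    h = {k.lower(): v for k, v in headers.items()}
    now = now or datetime.now(timezone.utc)
    problems = []

    cache_control = h.get("cache-control", "").lower()
    directives = {part.strip().split("=")[0] for part in cache_control.split(",")}
    for directive in ("no-store", "no-cache"):
        if directive in directives:
            problems.append(f"Cache-Control contains {directive}")

    if "expires" in h:
        try:
            expires = parsedate_to_datetime(h["expires"])
            if expires.tzinfo is None:
                # Treat dates without zone information as GMT.
                expires = expires.replace(tzinfo=timezone.utc)
            if expires <= now:
                problems.append("Expires is now or in the past")
        except (TypeError, ValueError):
            problems.append("Expires is not a valid HTTP date")

    if body_length is not None and "content-length" in h:
        if int(h["content-length"]) != body_length:
            problems.append("Content-Length does not match the body")

    return problems
```

An empty list means none of the three problems was detected; the comparison is done in UTC, which sidesteps the GMT confusion mentioned above.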
Performance testing
Get Siege and create urls.txt with one URL per line. It should contain enough URLs from your site (dump a sitemap or something). Then in one window:
tail -F /var/log/squid/access.log
And in another
siege -c 30 -i -t 5m -f urls.txt
And see what happens. Tune Squid as needed; below are some settings I have used:
cache_replacement_policy heap LFUDA
memory_replacement_policy heap GDSF
cache_dir ufs /usr/local/squid/cache 20000 16 256
cache_mem 500 MB
maximum_object_size 12000 KB
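To put a number on what scrolls past in the access log during a Siege run, the hit ratio can be computed afterwards. The Python sketch below assumes Squid's default native log format, where the fourth whitespace-separated field is a result code such as TCP_HIT/200 or TCP_MISS/200; the function name is illustrative.

```python
from collections import Counter


def hit_ratio(lines):
    """Return (hits, total, ratio) for Squid native-format access.log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        # The fourth field looks like "TCP_HIT/200" or "TCP_MISS/200".
        result = fields[3].split("/")[0]
        counts["hit" if "HIT" in result else "other"] += 1
    total = counts["hit"] + counts["other"]
    ratio = counts["hit"] / total if total else 0.0
    return counts["hit"], total, ratio
```

A typical use would be to open /var/log/squid/access.log and pass the file object to hit_ratio(). Comparing the ratio before and after a tuning change shows whether the change helped.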
Forcing cache refresh / precaching
You can use wget to precache your site and/or to force a cache refresh for the whole site with the following command:
wget -t 3 -T 3 --no-cache -q -m www.yoursite.com
To force a refresh of a single page in the cache, go to the page and 'shift-reload' (or whatever works in your browser for forcing it to reload the full page from the server).
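A forced reload can also be sent from a script, which is handy right after publishing a change. The Python sketch below sends the no-cache request headers that a forced browser reload typically carries. It assumes Squid is left at its default of honouring such client requests, and the function names are illustrative.

```python
import urllib.request


def build_refresh_request(url):
    """Build a GET request carrying the headers a forced browser reload sends."""
    return urllib.request.Request(
        url,
        headers={
            "Cache-Control": "no-cache",  # HTTP/1.1 clients
            "Pragma": "no-cache",         # HTTP/1.0 clients and older proxies
        },
    )


def refresh(url, timeout=10):
    """Fetch url through the proxy with no-cache headers; return the HTTP status."""
    with urllib.request.urlopen(build_refresh_request(url), timeout=timeout) as response:
        response.read()  # read the whole body so the full object is transferred
        return response.status
```

Calling refresh() with the public URL of the changed page should make the proxy fetch a fresh copy from Midgard, which later visitors then receive from the cache.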
