==Ultimate cheapskate Load-balancing and failover with "Pen" and "FreeVRRP"==
By Nathan Butcher 2006/12/6
Pen can efficiently load-balance connections between two servers. This is preferable to DNS round-robin
because connections are spread evenly across both servers, whereas DNS round-robin generally tends to
favour one server over the other. Pen can also be combined with FreeVRRP to provide load-balancing
with failover.
Pen and FreeVRRP allow us to create a "Cheapskate Cluster" and save ourselves a lot of money on expensive
load-balancing/failover solutions and hardware (ahem, F5 Networks). Our web servers also won't require two NICs,
and everything can be managed across a minimum of two single-homed hosts (which is what I'll be doing).
Also be aware that pen is no good at load-balancing mail servers (the hosts behind your MX records). Mail servers usually
need to check the IP address of each incoming connection (to prevent open relaying and spam), and once you put pen in
the mix, all incoming mail will appear to originate from the load balancer itself. Stick to DNS round-robin for mail instead.
The outline of the cheapskate cluster is described on the [http://siag.nu/pen/vrrpd-linux2.shtml The Ultimate Cheapskate Cluster]
page, but the actual configuration is as follows. It turns out that we don't need the aliased IP addresses
for pen suggested in that document, because our FreeVRRP virtual IP address solves that problem:-
'''IP address layout'''
We need a range of at least 3 IP addresses for this configuration. Only 1 of them (the FreeVRRP virtual IP)
needs to be remotely accessible for service; the others should be blocked at our DMZ firewall (except port 22,
for SSH and maintenance). For this example, the IP addresses have been assigned as follows:-
* '''192.168.0.1:8180''' - Server #1 Apache instance
* '''192.168.0.1:80''' - Server #1 Pen instance
* '''192.168.0.2:8180''' - Server #2 Apache instance
* '''192.168.0.2:80''' - Server #2 Pen instance
* '''192.168.0.3:80''' - FreeVRRP virtual IP for failover and public access, which Pen will also listen on
* '''192.168.0.10:10000''' - Our log server
'''Apache/Tomcat settings'''
The Apache configuration is very simple. All we have to do is make it listen on the server's dedicated IP at
a specified port.
On FreeBSD the configuration file is ''/usr/local/etc/apache/httpd.conf''. Of course, pen and Apache
cannot share the same port number, and since pen will most likely be listening on port 80, you should choose
another port for Apache. For this example we'll use 8180, but you can choose any unused port.
''Server#1''
Listen 192.168.0.1:8180
''Server#2''
Listen 192.168.0.2:8180
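Before settling on 8180, it is worth confirming that the port really is unused on each server. A minimal sketch, assuming FreeBSD's `sockstat -4 -l` output layout, where the local address is the sixth column; the `port_in_use` helper is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: succeed if something is already listening on TCP port $1.
# Reads FreeBSD "sockstat -4 -l" output on stdin; column 6 is the local address.
port_in_use() {
    awk -v p="$1" 'NR > 1 && $6 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# Example: sockstat -4 -l | port_in_use 8180 && echo "8180 is already taken"
```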
Our system actually runs Tomcat on port 8180, so the configuration for the server is in ''/usr/local/tomcat5.5/conf''.
It is simple enough to edit ''/usr/local/tomcat5.5/conf/server.xml'', uncomment the HTTP (and HTTPS) Connector
sections, and set the ports to listen on in those sections.
'''FreeVRRP'''
With FreeVRRP we give our servers failover capability. By creating a virtual IP here, we can ensure continuous
service from our web servers, even if one goes down. The IP address we set here is the one that will be open
to the internet.
FreeVRRP reads its configuration from '''/usr/local/etc/freevrrpd.conf'''. There is a sample file in
that directory with lots of useful information on configuring FreeVRRP. For our purposes, we
configure FreeVRRP as follows:-
''Server #1''
[VRID]
serverid=1
interface=fxp0
priority=255
addr=192.168.0.3/24
useVMAC=yes
spanningtreelatency=40
''Server #2''
[VRID]
serverid=1
interface=fxp0
priority=250
addr=192.168.0.3/24
useVMAC=yes
spanningtreelatency=40
Note that the only difference between the two servers is their priority. Server #1 has a higher priority than
Server #2, and will therefore be selected to hold the virtual IP.
Pen will listen on the IP address that FreeVRRP creates, and load-balance the connections that arrive there.
If both of your servers are connected through a switch running the spanning-tree protocol or portfast, the
"spanningtreelatency=40" line is required in the configuration on both servers. Without that line, the switch will
get confused and you'll lose both servers from the network. 40 seconds covers completion of
spanning-tree, and is a safe value to use here.
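Since the two files differ only in their priority, they could be generated from a single template. A sketch using a hypothetical `vrrp_conf` helper that prints the block shown above for a given priority:

```sh
#!/bin/sh
# Hypothetical helper: print the freevrrpd.conf [VRID] block used above.
# Only the priority differs between servers: 255 on Server #1, 250 on Server #2.
vrrp_conf() {
    cat <<EOF
[VRID]
serverid=1
interface=fxp0
priority=$1
addr=192.168.0.3/24
useVMAC=yes
spanningtreelatency=40
EOF
}

# Example (on Server #2): vrrp_conf 250 > /usr/local/etc/freevrrpd.conf
```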
'''Pen'''
The configuration is identical on both servers. On FreeBSD, pen can be enabled at boot time by making the
following ''/etc/rc.conf'' edits on both servers:-
''/etc/rc.conf''
pen_enable="YES"
pen_flags="-d -C localhost:8888 -u pen -j /home/pen -l /pen.log -p /pen.pid -h -T 43200 -t 8 -x 2000 -H -w /home/pen/stats.html -c 20000 80 192.168.0.1:8180 192.168.0.2:8180"
Here's what the pen flags accomplish:-
'''-d'''
This enables debugging. The debug messages can be found in ''/var/log/debug.log''. You can also add more -d options to the config
line to show more detailed debugging.
'''-C localhost:8888'''
This sets the control port for pen (port 8888 on localhost). This port communicates information about the status of pen.
'''-u pen'''
This is the user pen will run under. It's best to not let it run as root.
'''-j /home/pen'''
This makes pen run inside a chroot jail. It's easiest to use the pen user's home directory as the jail.
'''-l /pen.log'''
This sets where pen logs to. You can use either a filename or an IP:port where penlogd is listening.
In this example it is a file inside the chroot jail, hence the shortened path.
On our own system it is set to 192.168.0.10:10000, where penlogd is waiting for the logs.
'''-p /pen.pid'''
This sets the location of a pid file. Absolutely necessary if you manage the pen log files with newsyslog.
(it is in the chroot jail in this config, hence the shortened path)
'''-h'''
This enables hashing, making incoming connections "sticky": once a client connects for the first time,
it sticks to a particular server for all subsequent connections. This avoids spreading one client's session
across both servers, which would leave one server not knowing what the other is doing and cause dropped connections.
A client will always go to its server (as designated by the hash) unless that server is down, at which
point the connection will be redirected.
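To make the idea concrete, here is a toy illustration of hash-based stickiness with failover. It is a sketch only: pen's real hash function is not described here, so `cksum` of the client IP stands in for it, and `pick_backend` is a hypothetical name:

```sh
#!/bin/sh
# Toy illustration of hash-based stickiness, NOT pen's real hash function:
# cksum of the client IP picks one of two back ends, and the other back end
# is used only if the chosen one is marked down.
BACKENDS="192.168.0.1:8180 192.168.0.2:8180"

pick_backend() {    # usage: pick_backend CLIENT_IP [DOWN_BACKEND]
    client=$1
    down=${2:-}
    n=$(printf '%s' "$client" | cksum | awk '{ print $1 % 2 }')
    set -- $BACKENDS            # $1 and $2 are now the two back ends
    if [ "$n" -eq 0 ]; then
        chosen=$1 other=$2
    else
        chosen=$2 other=$1
    fi
    # Redirect only when the client's own server is down
    if [ "$chosen" = "$down" ]; then
        chosen=$other
    fi
    echo "$chosen"
}

# Example: pick_backend 203.0.113.9                     (same answer every time)
#          pick_backend 203.0.113.9 192.168.0.1:8180    (skips a dead server)
```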
'''-T 43200'''
This sets the hashing timeout (in seconds) for each client. From what I understand, if a client makes no
new connections for the duration of the limit, pen forgets its sticky server designation for that client and sends
the client's connections to a (possibly different) server. Currently the timeout is set to 12 hours per client.
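The timeout rule as understood above can be written as a small check. This is a sketch of that reading, not pen's code; it assumes "expired" means at least -T seconds without a new connection, and `sticky_expired` is hypothetical:

```sh
#!/bin/sh
# Hypothetical check of the -T rule as described above: a client's sticky
# server is forgotten once it has made no new connection for TIMEOUT seconds.
# -T 43200 is 43200 / 3600 = 12 hours.
sticky_expired() {    # usage: sticky_expired LAST_SEEN NOW [TIMEOUT]  (epoch seconds)
    timeout=${3:-43200}
    [ $(( $2 - $1 )) -ge "$timeout" ]
}

# Example: sticky_expired 1165363200 "$(date +%s)" && echo "stickiness forgotten"
```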
'''-t 8'''
This sets pen's connection timeout (in seconds). The default is 5 seconds, but I decided to be a bit generous.
'''-x 2000'''
Maximum number of simultaneous connections. By default, pen is compiled to use 1024 file descriptors, which limits this value
to a maximum of 507 connections. Pen had to be recompiled with WITH_FDSETSIZE=16384 to go beyond that default limit.
Fortunately our hardware is generous (4GB RAM), so it can handle more than this. With the new value, the maximum setting is
now 8191 simultaneous connections, but it is set to 2000 for the moment.
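Both figures above are consistent with each proxied connection holding two descriptors (one to the client, one to the back end), which puts the ceiling a little under half the descriptor limit. A rough sanity check along those lines; `x_fits` is hypothetical, and the exact number of descriptors pen keeps back for itself is not stated here:

```sh
#!/bin/sh
# Rough sanity check for -x: each proxied connection uses two descriptors
# (client side and back-end side), so the connection limit must stay below
# half the descriptor limit pen was compiled with. Pen presumably needs a few
# descriptors of its own, hence ceilings of 507 and 8191 rather than
# exactly 512 and 8192.
x_fits() {    # usage: x_fits MAX_CONNECTIONS FD_SETSIZE
    [ "$1" -lt $(( $2 / 2 )) ]
}

# Example: x_fits 2000 16384 && echo "-x 2000 is within the recompiled limit"
```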
'''-c 20000'''
Maximum number of clients. I assume that this value limits how many client IPs pen tracks (via hashing).
At the time of writing we have about 15,000 users, so this value is really overkill unless they all log on at
the same time. This value is sure to eat a chunk of memory.
'''-H'''
This adds an X-Forwarded-For header to HTTP requests. As long as we can get our web server to read this header and rewrite its
logs accordingly, it saves us from having to use penlogd. (We need penlogd anyway to produce HTTPS client IP logs.)
'''-W'''
This enables server weights, i.e. giving one server more preference for connections than another. It is not in our current pen_flags, but if SSL is ever going to be encapsulated in pen, this option may become necessary.
'''-w /home/pen/stats.html'''
This creates an up-to-the-minute HTML status report. The report generation needs to be triggered by a cron job
(e.g. 2,12,22,32,42,52 * * * * kill -USR1 `cat /home/pen/pen.pid`).
We don't actually use it on our system, probably because we have too many maximum clients for this feature to work (and we are using
Webalizer anyway).
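The crontab line above complains noisily if the pid file is missing (for example while pen is restarting). A slightly more defensive sketch, using a hypothetical `signal_pidfile` helper that could be called from a small script run by cron:

```sh
#!/bin/sh
# Hypothetical helper: send SIGNAL to the process whose PID is stored in PIDFILE,
# returning an error (instead of running "kill" with no PID) if the file is missing.
signal_pidfile() {    # usage: signal_pidfile SIGNAL PIDFILE
    if [ ! -r "$2" ]; then
        echo "signal_pidfile: cannot read $2" >&2
        return 1
    fi
    kill -s "$1" "$(cat "$2")"
}

# Example, from a script run by cron every ten minutes:
#   signal_pidfile USR1 /home/pen/pen.pid
```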
'''80'''
This is the port that pen will listen on. By default pen listens on all of the host's IP addresses unless you specify an address here.
Both the server's own IP address and the FreeVRRP virtual IP are therefore fair game for pen to glom onto.
'''192.168.0.1:8180'''
The first server we will load-balance. Apache is waiting here for a connection.
'''192.168.0.2:8180'''
The second server we will loadbalance. Apache waits here too.
Pen should now accept incoming connections on port 80 and balance them between the Apache ports on our two servers.
'''Security issues'''
Since pen now sits on port 80 in front of Apache, remote attacks will have to
face off against pen and not Apache. This is a nice security "buffer" for Apache (assuming that pen has no buffer
overflow exploits!). To limit the damage if it does, pen can be run in a chroot environment as an unprivileged user.
This is not too hard to do, so there's no real excuse for not doing it. First, a user for pen and a home
directory for chrooting must be created:-
# pw useradd pen -m -s /sbin/nologin
Then pen_flags must be configured to run as that user and to use that directory as the chroot jail,
by adding ''' -u pen -j /home/pen ''' respectively.
You do not need to copy any libraries or other files into the chroot directory.
'''Simple Logging'''
Logs can usually be sent anywhere on the system (such as /var/log/pen) without problems.
If you are logging under chroot, it is easiest to log into the pen user's home directory, which saves
messing around with file and directory permissions. In this case '''-l /pen.log''' is all you need
to add to pen_flags. (Note that you do not need to enter a full path for the log file, since pen is in a chroot environment!)
Also note that pen logs can grow substantially with constant connections, so make sure that your log directory has plenty of space.
If you are logging, set up newsyslog to rotate the pen log file, as pen does not rotate it automatically. If you
don't, it will grow bigger than Everest. The following newsyslog entry should manage the logs, assuming
that you are running pen in a chroot jail. Note that pen needs ownership of the log file or it will die when the log is rotated.
/home/pen/pen.log pen:pen 664 3 100 * J /home/pen/pen.pid
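Because pen dies if it loses ownership of its log file on rotation, a quick ownership check after the first rotation can save some head-scratching. A minimal sketch with a hypothetical `owned_by` helper, reading the owner from the third column of `ls -ld`:

```sh
#!/bin/sh
# Hypothetical helper: succeed if FILE is owned by USER (third column of ls -ld).
owned_by() {    # usage: owned_by FILE USER
    [ "$(ls -ld "$1" | awk '{ print $3 }')" = "$2" ]
}

# Example: owned_by /home/pen/pen.log pen || echo "pen.log is not owned by pen" >&2
```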
'''Pen, SSL, and running HTTP and HTTPS from the same web server'''
There are two ways to do this, and the one you choose will probably depend on how badly you need
decent weblogs for analysis. First I'll deal with the easy way, and in the next section I'll deal with web logging issues.
The easy way involves running both HTTP and HTTPS ports on our web server.
You need to run two load-balancing instances when you want to load-balance HTTP and HTTPS at the same time. This is simple enough to do from
the command line. However, to do this cleanly within FreeBSD's rc.conf system, the following needs to be done:-
* The start script at ''/usr/local/etc/rc.d/pen'' needs to be copied into the same directory under a new name (e.g. pen2)
* All references to pen in the new pen2 start script need to be changed to pen2
* The pen binary at ''/usr/local/bin/pen'' needs to be copied into the same directory and renamed pen2
* Extra lines need to be added to rc.conf (''pen2_enable'' and ''pen2_flags'')
* The second pen instance should use a different control port (e.g. 8889), and write to a different log file and pid file
* And of course it should listen on a different port, and contact the servers on different ports from the first one.
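The first three steps can be scripted. A sketch assuming the default paths above; a blind `s/pen/pen2/g` would also mangle words such as "open", so this version only rewrites "pen" when it is not preceded by a letter. Words that merely start with "pen" would still be renamed, so review the result with `diff`. `clone_pen` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: copy a pen rc script, renaming "pen" to "pen2".
# "pen" is only replaced when not preceded by a letter, so "open" survives,
# while pen_enable, pen_flags and /usr/local/bin/pen all become pen2.
clone_pen() {    # usage: clone_pen SRC DST
    sed -E 's/(^|[^[:alpha:]])pen/\1pen2/g' "$1" > "$2"
}

# Example (as root), then review with: diff /usr/local/etc/rc.d/pen /usr/local/etc/rc.d/pen2
#   clone_pen /usr/local/etc/rc.d/pen /usr/local/etc/rc.d/pen2
#   chmod +x /usr/local/etc/rc.d/pen2
#   cp /usr/local/bin/pen /usr/local/bin/pen2
```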
The HTTPS pen instance then listens to the outside on port 443 and forwards its requests to SSL-capable
web server ports across our system. Otherwise it is no different from load-balancing HTTP. The only requirement is that
the SSL certificate our web servers use must be issued for the hostname that points at our VRRP virtual IP address. Since
connections to the client will appear to originate from the virtual IP address, SSL authentication and encryption
go off without a hitch, even though the SSL certificate actually resides on more than one web server behind
the load balancer.
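Because ''/etc/rc.conf'' is read by ''/bin/sh'', the extra lines are ordinary shell assignments. A sketch of the second instance's settings, reusing the flags of the HTTP instance. The back-end HTTPS port 8443 is an assumption, and -H and -w are left out because pen cannot add headers to encrypted traffic and the two instances should not overwrite each other's status page:

```sh
# /etc/rc.conf additions for the HTTPS instance (rc.conf is plain sh syntax).
# Assumption: the web servers accept SSL on port 8443; use your real port.
# Differences from the HTTP instance: control port 8889, its own log and
# pid file, listening on 443, no -H and no -w.
pen2_enable="YES"
pen2_flags="-d -C localhost:8889 -u pen -j /home/pen -l /pen2.log -p /pen2.pid -h -T 43200 -t 8 -x 2000 -c 20000 443 192.168.0.1:8443 192.168.0.2:8443"
```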
This solution works perfectly, but it won't produce good merged HTTPS weblogs for analysis...
'''Web logging issues for HTTP and HTTPS'''
Logging issues arise when our web server wants to see which IPs have been using its service. The trouble is that,
since pen is doing the load-balancing, all connections arrive at our web server appearing to originate from pen.
The result is logs that are not very useful for feeding through a web log analyzer.
There are two ways to solve this problem. One is to use ''penlogd''. Penlogd runs as a daemon and collects the
logs sent to it by pen instances, merging them with the logs sent to it by our web server. The end result is that
penlogd produces a log file with the original client IPs shown in the common web log format. This solution works
great for HTTP.
Alternatively, pen can write an X-Forwarded-For header (the '''-H''' option). With this option set, pen adds this header
(containing the client's IP) to the HTTP request, where our web server can find it and rewrite its logs
to show the client's IP. This method makes penlogd's log merging functions somewhat
obsolete, but that doesn't matter: penlogd can continue to collect the logs thrown at it without having to merge anything.
From our point of view this is still useful, because we can use penlogd to collect all our web logs and aggregate them
in one place.
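A sketch of the rewrite step, assuming a hypothetical log format in which the web server appends the X-Forwarded-For value (a single field with no spaces, "-" when absent) to the end of each Common Log Format line. It puts the first address from that header in place of the proxy's IP in the first field; `xff_rewrite` is hypothetical:

```sh
#!/bin/sh
# Hypothetical log rewrite: input lines are Common Log Format with the
# X-Forwarded-For value appended as one extra, space-free last field
# ("-" if absent). The first address in that field replaces the proxy's
# IP in field 1, and the extra field is removed.
xff_rewrite() {
    awk '{
        xff = $NF                  # appended X-Forwarded-For value
        sub(/ [^ ]+$/, "")         # strip it from the line
        split(xff, hops, ",")      # "client,proxy,..." -> client comes first
        if (hops[1] != "" && hops[1] != "-") $1 = hops[1]
        print
    }'
}

# Example: xff_rewrite < access_log > access_log.clients
```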
Unfortunately, logging HTTPS in the same way is a huge problem. With SSL connections, data is encrypted
between the client and our web server. Since pen just sits in the middle pushing packets, it can't understand the
encrypted data passing through. As a result, pen cannot send meaningful information to penlogd
to match and merge with the decrypted web server logs. The final logs cannot be merged
at all, and we still see the load balancer's IP address (or nothing at all) in the logs instead of the client's IP
address. To make things worse, pen cannot insert an X-Forwarded-For header into an encrypted HTTPS stream, so we cannot
get our web server to rewrite its logs that way either.
The only solution to the SSL logging issue is to enable pen's experimental SSL modification. It works well
enough to manage the server's SSL certificate and private key, and to encrypt/decrypt client connections
to and from a standard HTTP web server with modest content. Unfortunately, however, the feature is '''EXPERIMENTAL'''.
It is still very broken and buggy and not ready for production use, especially when dealing with large amounts of
scripting and images. On several occasions I've caused it to crash pen. Until it is fixed (if ever), we cannot use
it on our production systems.
If we play with the experimental SSL encapsulation anyway, pen lifts the burden of SSL
encryption from our web servers, so they no longer share the SSL encryption/decryption
load. Our running instance of pen takes on this responsibility all by itself on
one server. However, pen's server weight functionality can mitigate this slight imbalance of server resources
somewhat.
There is another problem with SSL encapsulation: running pen in a chroot while still doing SSL
becomes a little complicated. SSL needs access to a pseudo-random number generator (PRNG),
which on FreeBSD is ''/dev/urandom'', and that is not in our chroot jail at all. The solution
is to mount a devfs in the jail. Assuming our jail is /home/pen, we first create a directory called /home/pen/dev.
By adding the following line to ''/etc/fstab'', FreeBSD will then mount devfs there at boot.
devfs /home/pen/dev devfs ro 0 0
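To mount it straight away and confirm that the jail can see the random device, something like the following should work; `jail_has_urandom` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical check: can a chroot jail rooted at JAIL see a readable
# /dev/urandom character device?
jail_has_urandom() {    # usage: jail_has_urandom JAIL
    [ -c "$1/dev/urandom" ] && [ -r "$1/dev/urandom" ]
}

# To mount devfs now rather than waiting for a reboot (FreeBSD, as root):
#   mkdir -p /home/pen/dev
#   mount -t devfs devfs /home/pen/dev
#   jail_has_urandom /home/pen || echo "no /dev/urandom inside the jail" >&2
```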
With SSL encapsulation available, we must then have pen add the X-Forwarded-For header to the decrypted requests for these logs to be merged by penlogd.
'''Configuring penlogd'''
We have set up the file server at 192.168.0.10 to run ''penlogd''.
Starting penlogd on FreeBSD
wasn't too hard, but I had to duplicate and modify the pen start script in ''/usr/local/etc/rc.d'' so that
penlogd would start at boot time. I renamed all references to "pen" in that file to "penlogd". Then it
was simply a matter of dropping the following lines into ''/etc/rc.conf''. The configuration options are similar
to pen's, except for the listening port at the end of penlogd_flags.
penlogd_enable="YES"
penlogd_flags="-d -u pen -j /home/pen -l /system_access -p /penlogd.pid 10000"
The resulting log file (''/home/pen/system_access'') is rotated by newsyslog (via ''newsyslog.conf'').
We have to ensure that newsyslog keeps pen as the owner of the file, or penlogd will die upon rotation.
/home/pen/system_access pen:pen 664 6 * $M1D0 J /home/pen/penlogd.pid
Pen can be set up to send its logs to penlogd with a command line option such as '''-l 192.168.0.10:10000'''.
This is necessary if we have HTTP connections and we're not using the X-Forwarded-For header. It is
also absolutely necessary if we want to merge SSL-encapsulated pen logs with their corresponding web logs.
If we're only load-balancing HTTP and we make use of the X-Forwarded-For header, we don't really need penlogd to
merge logs, and we have the option of getting pen to log elsewhere... or not at all.
One major issue involves actually sending the web logs to penlogd. The web server must pipe its logs through
the ''penlog'' program, which then sends them on to penlogd. To set up Apache to do this, a line like this needs to be added
to its configuration:-
CustomLog "|/usr/local/bin/penlog 192.168.0.10 10000" common
However, since our system uses Tomcat (which has no reliable piped-log functionality like Apache's), this becomes tricky.
To get around this issue, I have created a fifo to which Tomcat can send all its logs.
# mkfifo /usr/local/tomcat5.5/logs/fifo
I have also configured Tomcat to send all its logs into the fifo, and made sure that they are not automatically rotated.
Note that the log pattern has been adjusted so that Tomcat prints out the X-Forwarded-For header that pen gives it.
It is also possible to have Tomcat log to a separate file in addition to sending logs through our fifo. This is done simply
by adding another AccessLogValve with different settings.
Now for the fun part. I have a perl script running in the background and receiving anything that goes into
that fifo, and then piping it out to penlog. This does the job well.
First, the start script (which lives in ''/usr/local/etc/rc.d''), so that FreeBSD starts it up automatically at boot time:-
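#!/bin/sh
# Repilog start script
# PID(s) of a running repilog, excluding grep itself and this start script
pid=`ps -ax | grep repilog | grep -v grep | grep -v repilog.sh | awk '{print $1}'`
case "$1" in
start)
if [ -n "$pid" ]
then
echo "REPILOG is already running (pid $pid)"
elif [ -x /usr/local/sbin/repilog ]
then
/usr/local/sbin/repilog &
echo "REPILOG started!"
fi
;;
stop)
if [ -n "$pid" ]
then
# $pid is deliberately unquoted so that several PIDs are all killed
kill $pid && echo "REPILOG killed!"
fi
;;
*)
echo ""
echo "Usage: `basename $0` { start | stop }"
echo ""
exit 64
;;
esac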
The actual script looks like this. Note that there are two configuration lines that need editing if you want to
use it to pipe a fifo into anything else. The script currently sits in ''/usr/local/sbin'' on our system:-
#!/usr/bin/perl
# REPILOG - REliable PIped LOGs Simulator
# By Nathan Butcher 2006/11/21
#
# A shim to allow a logging program to spit out logs to a fifo
# where they can be piped to another waiting program
# Please run me in the background, otherwise I will take over your console!
# or better yet, use the start script!
use strict;
use warnings;
### configurables
my $fifo="/usr/local/tomcat5.5/logs/fifo"; # Location of fifo
my $command='/usr/local/bin/penlog 192.168.0.10 10000'; # Command to pipe the logs into
###
### check to see if fifo exists or not
unless ( -p $fifo) {
die "Repilog has no fifo to use. Please make it.\n";
}
### Subroutine to handle signals and close cleanly
sub handler {
close (LOG);
close (FIFO);
exit 0;
}
### Signal handling. KILL cannot be caught, so TERM (what "kill" sends
### by default, e.g. from the start script) is handled instead.
$SIG{INT} = \&handler;
$SIG{TERM} = \&handler;
$SIG{HUP} = \&handler;
### Main (endless) loop
while (1) {
# Opening the fifo blocks until a writer (Tomcat) opens it
open(FIFO, '<', $fifo) || die "Repilog can't access the fifo!\n";
# Start one penlog for this writer session, rather than one per log line
open(LOG, '|-', $command) || die "Repilog cannot access command!\n";
# Unbuffer the pipe so each line reaches penlog immediately
select((select(LOG), $| = 1)[0]);
# Copy every line written into the fifo through to the command
while (my $line = <FIFO>) {
print LOG $line;
}
# The writer closed the fifo: shut down this penlog and wait for the next writer
close(LOG);
close(FIFO);
warn "\nRepilog pipe closed. Restarting\n";
}
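For comparison, the same relay can be sketched as a plain shell loop. This is an alternative, not the author's script; it assumes penlog reads log lines on standard input until end-of-file (as it does when Apache pipes to it), and `relay_once` is a hypothetical name:

```sh
#!/bin/sh
# Alternative sketch (not the repilog script above): relay one writer
# session from a fifo into a command. Creates the fifo if it is missing.
relay_once() {    # usage: relay_once FIFO COMMAND [ARGS...]
    fifo=$1
    shift
    [ -p "$fifo" ] || mkfifo "$fifo" || return 1
    # Opening the fifo blocks until a writer appears; the command then
    # reads until the writer closes its end (end-of-file).
    "$@" < "$fifo"
}

# Run forever, restarting after each writer session, like repilog's main loop:
#   while :; do
#       relay_once /usr/local/tomcat5.5/logs/fifo /usr/local/bin/penlog 192.168.0.10 10000
#   done
```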
This solution is a bit kludgy, but does the job.
The grand unified web server log file currently lives on the system file server at ''/home/pen/system_access''. We intend to run Webalizer on it for analysis. BEFORE doing so, however, it is best to run the program '''mergelog''' on the merged file, to ensure that the web records come out in the correct chronological order. Otherwise, we can expect our weblog analysis to be somewhat borked. Mergelog can be installed from ports in FreeBSD, from /usr/ports/www/mergelog.
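mergelog is the tool recommended here. If it is not at hand, the same chronological ordering can be approximated with sort(1); this is a swapped-in technique, not mergelog, and it assumes every line is in Common Log Format with a single timezone and no spaces in the ident/user fields. The `-M` (month-name) key is supported by FreeBSD and GNU sort, though not required by POSIX; `sort_clf` is hypothetical:

```sh
#!/bin/sh
# NOT mergelog: order Common Log Format lines by their timestamp, e.g.
#   [02/Dec/2006:10:00:00 +0900]  (field 4 when split on single spaces)
# Keys, in priority order: year (chars 9-12), month name (5-7), day (2-3),
# hour (14-15), minute (17-18), second (20-21). Assumes one timezone for
# all lines and no spaces in the ident/user fields.
sort_clf() {    # usage: sort_clf [FILE...]
    LC_ALL=C sort -t ' ' \
        -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n \
        -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n "$@"
}

# Example: sort_clf /home/pen/system_access > /home/pen/system_access.sorted
```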
At this point, everything is configured. Reboot both servers to test that everything starts up correctly.
Failover can then be tested by playing hokey-pokey with the ethernet cables in your servers. You put one cable
in, then pull the same cable out, then pull the other cable out, then put it back in, and shake them all about. You get
the picture. If you are testing Apache, you can modify index.html to display the host name of the server, and
then keep reloading the virtual IP address in a web browser to watch the servers round-robin (if you
have configured that) and fail over when you pull cables out.
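A small polling loop saves a lot of manual reloading. A sketch assuming index.html shows the host name, as suggested above; `poll_vip` is hypothetical, and FETCH defaults to FreeBSD's fetch(1) (set FETCH="curl -s" elsewhere):

```sh
#!/bin/sh
# Hypothetical test helper: request URL COUNT times, INTERVAL seconds apart,
# printing each response body (or FAILED). FETCH defaults to FreeBSD fetch(1).
poll_vip() {    # usage: poll_vip URL COUNT [INTERVAL]
    i=0
    while [ "$i" -lt "$2" ]; do
        ${FETCH:-fetch -q -o -} "$1" || echo "FAILED"
        i=$((i + 1))
        sleep "${3:-1}"
    done
}

# Example: poll_vip http://192.168.0.3/ 120    # then start pulling cables
```

With -h set, one client always lands on the same server, so the output only changes when that server fails.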
==Appendix A : Simple FreeVRRP configuration==
Here is the configuration for FreeVRRP to fail over successfully across two hosts (which may or may not have
services running on them). It is the simplest and most basic configuration.
For this example, we assume the network is 192.168.0.0/24 and the interface is fxp0.
Note that the VRID covering a server's own IP must have priority=255 on that server, making it the MASTER for that address. That's really the only difference between the two server configurations.
Gratuitous ARPs must also be enabled, otherwise the servers will go nuts accusing each other of
stealing each other's IPs. (Remember that here we are creating virtual IPs on top of existing IPs.)
'''/usr/local/etc/freevrrpd.conf'''
'''server1 (192.168.0.1)'''
[VRID]
serverid=1
interface=fxp0
priority=255
addr=192.168.0.1/32
useVMAC=yes
setgratuitousarp=yes
[VRID]
serverid=2
interface=fxp0
priority=250
addr=192.168.0.2/32
useVMAC=yes
setgratuitousarp=yes
'''server2 (192.168.0.2)'''
[VRID]
serverid=1
interface=fxp0
priority=250
addr=192.168.0.1/32
useVMAC=yes
setgratuitousarp=yes
[VRID]
serverid=2
interface=fxp0
priority=255
addr=192.168.0.2/32
useVMAC=yes
setgratuitousarp=yes
With VRRP set up, what we need to do now is configure Apache to accept connections
on both servers' IP addresses, in case one of them goes down.
'''/usr/local/etc/apache/httpd.conf'''
Listen 192.168.0.1:80
Listen 192.168.0.2:80
When one of the servers goes down, FreeVRRP on the remaining server will notice and take over the
dead server's IP address. To the outside world, both IP addresses will still be "pingable", but all the traffic
will now be routed to the remaining server until the dead server comes back up. The Apache configuration ensures
that connections to either IP address will be accepted. The end result is that even with a dead server,
Apache keeps answering on both of the IP addresses where it is expected to be.
==Appendix B : DNS round-robin overkill!==
Thinking of doing DNS round-robin over both these servers now? That's overkill, but not impossible either.
There is one drawback, however: you'll lose your clients' "stickiness" with their favourite server.
If client stickiness is important to you, stick to a single virtual IP address to take all connections and do
the load-balancing. Otherwise, read on!
First, the '''-h''' in the pen_flags rc.conf line can be removed and replaced with '''-r'''. This forces round-robin
load-balancing regardless of which client initiates contact.
Then you can do the following configuration to the freevrrpd.conf on both servers (swapping priorities
for virtual IPs between servers):-
''Server #1''
[VRID]
serverid=1
interface=fxp0
priority=255
addr=192.168.0.5/24
useVMAC=yes
[VRID]
serverid=2
interface=fxp0
priority=250
addr=192.168.0.6/24
useVMAC=yes
''Server #2''
[VRID]
serverid=1
interface=fxp0
priority=250
addr=192.168.0.5/24
useVMAC=yes
[VRID]
serverid=2
interface=fxp0
priority=255
addr=192.168.0.6/24
useVMAC=yes
Set up multiple A records in DNS and things should work!
That's more load-balancing and failover than I care for, however.