Automatic backup uplink fail-over
This page contains Debian-specific information. It may work on derivatives such as Ubuntu, but you will most likely need to adapt it for other distributions or systems.
In this connected world, going offline can be very, very bad, especially at branch offices.
We have alternative connections at our branch offices, but those shiny Cisco routers that do automatic switching are pretty expensive... ISPs also offer backup links, but such a backup link goes through the same network as the primary connection, which defeats the purpose of having one.
So, since all our offices have a Linux-based router/server between them and the network, I wrote a dirty, hackish script that switches ISPs when the primary connection fails and then updates the DNS servers accordingly.
Note:
- This is not about balancing load over two connections (doing so properly needs a patched kernel and a different setup).
- This is not about receiving connections on both uplinks (you could, but DNAT requires some extra work such as connection tracking).
- This is about switching to the backup uplink when the primary one fails.
First, we will create a configuration file in /etc/default/update-ip:
HOST=office1 # We use this to build our DNS name
KEY=waaaaattxxxxxxxayyyeee== # MD5 key for updating DNS; you may skip this if you don't use DNS updates.
IF1=eth0wan # Interface of the primary uplink
IP1=1.2.3.4 # Local IP for the primary uplink
GW1=1.2.3.1 # IP of the primary link gateway
IF2=eth0wan2 # Interface of the secondary uplink
IP2=2.2.3.4 # Local IP of the secondary uplink
GW2=2.2.3.1 # IP of the secondary link gateway
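A typo in this file only shows up once the script runs, so a quick sanity check can help. The sketch below is not part of the original script: missing_settings is a hypothetical helper that lists any required variable left unset or empty. KEY is skipped on the assumption that DNS updates are optional.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original script.
# Prints the names of required settings that are unset or empty.
missing_settings() {
    missing=""
    for name in HOST IF1 IP1 GW1 IF2 IP2 GW2; do
        # Indirect lookup: read the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space before printing
    echo "${missing# }"
}
```

To use it, source /etc/default/update-ip first and then call missing_settings; an empty line means every required setting has a value.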
This file is sourced by our checking script for its configuration. Now for the checking script itself (e.g. /usr/local/sbin/checkgw.sh; don't forget to chmod a+rx it):
#!/bin/sh
# vim: foldmethod=marker
LOGTAG="checkgw"
ME=`hostname -f`
MYPATH=/usr/local/sbin/checkgw.sh
# This is just to get our IP for DNS updates. Note that this is a stupid CGI I use;
# you should really not rely on it for your DNS updates ;)
IPURL=http://www.marcfargas.com/~/cgi-bin/myip.cgi
logger -t "$LOGTAG" "CheckGW Starting"
# Read configuration
. /etc/default/update-ip
gwalive() {
    # {{{ Check that the internet can be reached via the uplink in $1.
    # $1 is handed to "ping -I", which accepts either an interface name
    # or a source address.
    HOSTS="94.23.24.6 72.14.221.104 74.125.77.104 216.239.59.104"
    # Max hosts that can fail.
    FAILS=2
    IFACE=$1
    FAILED=0
    for H in $HOSTS; do
        echo "Checking... $H"
        # One ping with a one-second deadline, sent from the given uplink
        if ! ping -w 1 -I "$IFACE" -c 1 "$H" > /dev/null 2>&1; then
            FAILED=$((FAILED + 1))
        fi
    done
    if [ "$FAILED" -gt "$FAILS" ]; then
        logger -t "$LOGTAG" "Interface $IFACE cannot reach the internet."
        return 1
    fi
    logger -t "$LOGTAG" "Interface $IFACE DOES reach the internet."
    return 0
} #}}}
update_ip() {
    # {{{ Update the DNS servers with our current IP.
    MYIP=`wget -q -nd -O - "$IPURL" | grep -E -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"`
    if [ "$MYIP" = "" ]
    then
        logger -t "$LOGTAG" "Could not get an IP Address"
        return 1
    fi
    # Nothing to do if the address has not changed since the last update
    if [ -f "/tmp/lastip" ]
    then
        if [ "`cat /tmp/lastip`" = "$MYIP" ]
        then
            return 0
        fi
    fi
    # Build the nsupdate batch; the closing EOF must start at column 0
    cat << EOF > /tmp/update.txt
server YOUR-DNS-SERVER-GOES-HERE
update delete $HOST.YOUR-DNS-ZONE.com. A
update add $HOST.YOUR-DNS-ZONE.com. 180 A $MYIP
show
send
EOF
    nsupdate -y "$ME.:$KEY" /tmp/update.txt
    echo "$MYIP" > /tmp/lastip
    logger -t "$LOGTAG" "Updated ip for $HOST being now $MYIP"
} # }}}
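activate() {
    # Make the gateway in $2 (reached through interface $1) the default route.
    logger -t "$LOGTAG" "Enabling interface $1 via $2"
    # "change" handles an existing default route, "add" a missing one;
    # whichever does not apply simply fails.
    /sbin/ip route change default via "$2"
    /sbin/ip route add default via "$2" > /dev/null 2>&1
    /sbin/ip route flush cache
    # Marker file recording which uplink is currently active
    touch "/tmp/checkgw.$1"
    ## /usr/sbin/invoke-rc.d openvpn restart ## Make sure you restart services like this one.
    ## update_ip ## Uncomment this to enable DNS updates (make sure you set it up).
}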
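check() {
    # {{{ Check our connection.
    # Ping from the primary uplink's own address so that the probes
    # leave through that uplink.
    if gwalive "$IP1"; then
        echo "OK" > "/tmp/checkgw-status.$IF1"
        # Primary is up: switch (back) to it unless it is already active
        if [ ! -f "/tmp/checkgw.$IF1" ]; then
            rm -f "/tmp/checkgw.$IF2" 2>/dev/null
            activate "$IF1" "$GW1"
        fi
    else
        echo "BAD" > "/tmp/checkgw-status.$IF1"
        # Primary is down: switch to the backup unless it is already active
        if [ ! -f "/tmp/checkgw.$IF2" ]; then
            rm -f "/tmp/checkgw.$IF1" 2>/dev/null
            activate "$IF2" "$GW2"
        fi
    fi
} # }}}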
check2() {
    # {{{ Check alternate connection.
    # Only records the backup uplink's status; it never switches routes.
    if gwalive "$IP2"; then
        echo "OK" > "/tmp/checkgw-status.$IF2"
    else
        echo "BAD" > "/tmp/checkgw-status.$IF2"
    fi
} # }}}
case "$1" in
    update_ip)
        update_ip
        exit 0;;
    check)
        check
        check2
        exit 0;;
esac
echo "UNKNOWN ACTION $1"
exit 1
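The rule gwalive applies is "the uplink is down when more than FAILS probes fail". A minimal sketch of that rule follows, written so it can be tried without touching the network. The names uplink_down, probe_ok and probe_bad are hypothetical and do not appear in the script above; the probe command is passed in as an argument so a stand-in can replace ping.

```shell
#!/bin/sh
# Sketch of the failure-threshold rule used by gwalive.
# uplink_down, probe_ok and probe_bad are hypothetical names.
uplink_down() {
    max_fails=$1   # how many probes may fail before the uplink counts as down
    probe=$2       # command run once per host; exit status 0 means reachable
    shift 2
    failed=0
    for host in "$@"; do
        "$probe" "$host" > /dev/null 2>&1 || failed=$((failed + 1))
    done
    # True (exit 0) only when failures exceed the allowed maximum
    [ "$failed" -gt "$max_fails" ]
}

# Stand-in probes for a dry run: one always succeeds, one always fails.
probe_ok()  { return 0; }
probe_bad() { return 1; }

if uplink_down 2 probe_bad a b c d; then echo "down"; else echo "up"; fi
if uplink_down 2 probe_ok a b c d; then echo "down"; else echo "up"; fi
```

With a real probe, the function passed in would wrap the same ping call the script uses, for example a small function that runs ping -w 1 -I 1.2.3.4 -c 1 against its argument.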
There's one last step. Since we are playing around with routing, we must make sure that our test pings leave the machine through the interface we intend. Routing tables are the key element here. We will create two routing tables, one for each uplink, in /etc/iproute2/rt_tables:
255 local
254 main
253 default
100 isp1 # Add for ISP 1
101 isp2 # Add for ISP 2
0 unspec
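If you would rather script this step than edit the file by hand, here is a minimal sketch, assuming the table IDs 100 and 101 are free on your system. ensure_table is a hypothetical helper; it appends an entry only when a matching line is not already present, so it is safe to run more than once. The example runs against a scratch file rather than the real /etc/iproute2/rt_tables.

```shell
#!/bin/sh
# Hypothetical helper: append "<id> <name>" to a table file unless a line
# with that id and name is already there.
ensure_table() {
    file=$1; id=$2; name=$3
    # Match "<id><whitespace><name>" at line start, followed by space or EOL
    if ! grep -Eq "^${id}[[:space:]]+${name}([[:space:]]|\$)" "$file"; then
        printf '%s\t%s\n' "$id" "$name" >> "$file"
    fi
}

# Dry run against a scratch file instead of /etc/iproute2/rt_tables
scratch=$(mktemp)
printf '255\tlocal\n254\tmain\n' > "$scratch"
ensure_table "$scratch" 100 isp1
ensure_table "$scratch" 101 isp2
ensure_table "$scratch" 100 isp1   # second call changes nothing
cat "$scratch"
rm -f "$scratch"
```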
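And now we add some routing rules. We'll put them in /etc/network/interfaces. The $GW1, $IF1, $IP1, etc. below are placeholders for the values in the configuration file: make sure to write the actual IP addresses and interface names there, because this file does not do variable substitution: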
iface eth0wan
[...]
up ip route add default via $GW1 dev $IF1 table isp1
up ip rule add from $IP1 table isp1
down ip route del default via $GW1 dev $IF1 table isp1
down ip rule del from $IP1 table isp1
iface eth0wan2
up ip route add default via $GW2 dev $IF2 table isp2
up ip rule add from $IP2 table isp2
down ip route del default via $GW2 dev $IF2 table isp2
down ip rule del from $IP2 table isp2
What those commands do is:
- Add a default route to each table via its respective gateway.
- Add a rule saying that traffic sent from each uplink's IP address should be looked up in the corresponding table, effectively ignoring the default route in the main table.
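Because /etc/network/interfaces wants literal values, it can be handy to generate the four lines for each uplink rather than type them. The following is a sketch only; print_uplink_rules is a hypothetical helper, and the addresses are the example values from the configuration file above.

```shell
#!/bin/sh
# Hypothetical generator: prints the up/down lines for one uplink with the
# literal values filled in, ready to paste under its "iface" stanza.
print_uplink_rules() {
    ip=$1; gw=$2; dev=$3; table=$4
    printf '    up ip route add default via %s dev %s table %s\n' "$gw" "$dev" "$table"
    printf '    up ip rule add from %s table %s\n' "$ip" "$table"
    printf '    down ip route del default via %s dev %s table %s\n' "$gw" "$dev" "$table"
    printf '    down ip rule del from %s table %s\n' "$ip" "$table"
}

# Example values taken from the sample configuration file
print_uplink_rules 1.2.3.4 1.2.3.1 eth0wan isp1
print_uplink_rules 2.2.3.4 2.2.3.1 eth0wan2 isp2
```

Once the interfaces are up, ip rule show and ip route show table isp1 should list the rule and the default route that these lines create.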
You should check that this works by running the script with a "check" argument. NOTE: MAKE SURE you do this either from the same network the machine lives in, or with some other way of getting back to the machine when it kicks you out ;)
You can then switch off the primary uplink and run the script again to see whether it switches to the backup, and bring the primary back up to check that it switches back.
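While testing, the files the script leaves in /tmp tell you what it decided: checkgw-status.<interface> holds OK or BAD, and an empty checkgw.<interface> file marks the uplink that was last activated. A small sketch for reading them in one go; show_status is a hypothetical helper, and the directory argument exists only so it can be pointed somewhere other than /tmp.

```shell
#!/bin/sh
# Hypothetical helper: summarize the files checkgw.sh leaves behind.
# checkgw-status.<iface> holds OK or BAD; checkgw.<iface> marks the
# uplink that was last activated.
show_status() {
    dir=${1:-/tmp}
    for f in "$dir"/checkgw-status.*; do
        [ -f "$f" ] || continue          # the glob matched nothing
        iface=${f##*/checkgw-status.}    # keep only the interface name
        state=$(cat "$f")
        if [ -f "$dir/checkgw.$iface" ]; then
            role="active"
        else
            role="standby"
        fi
        echo "$iface $state $role"
    done
}

show_status /tmp
```

Each output line has the form "interface state role", so after pulling the primary cable you would expect the primary to read BAD standby and the backup OK active.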
If it works as expected, you can now add a crontab entry to check the connection every minute (crontab -e):
* * * * * /usr/local/sbin/checkgw.sh check > /dev/null
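Cron starts a new run every minute whether or not the previous one has finished. The checks themselves are short, but if a run ever stalls (for example while restarting services from activate), two copies could end up changing the default route at the same time. Below is a minimal sketch of a guard, assuming a mkdir-based lock; run_locked and the lock path are hypothetical, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command only if no other copy holds the lock.
# mkdir either creates the directory or fails, so only one caller wins.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        return 1                 # another run is still in progress
    fi
    status=0
    "$@" || status=$?            # remember the command's exit status
    rmdir "$lockdir"
    return "$status"
}
```

A wrapper script called from cron could then run run_locked /tmp/checkgw.lock.d /usr/local/sbin/checkgw.sh check. Note that a run killed before rmdir leaves a stale lock directory behind, which has to be removed by hand.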
And you're done. Hope this proves useful to somebody! Comments welcome ;)