Automatic backup uplink fail-over

This page contains Debian-specific information. It may work on derivatives such as Ubuntu, but you will most likely need to adapt it for your distribution/system.

In this connected world, going off-line can be very, very bad, especially at branch offices.

We have alternative connections at our branch offices, but those shiny Cisco routers that do automatic switching are pretty expensive... ISPs also offer backup links, but then the backup link goes through the same network as the primary connection, which defeats the purpose of having one.

So, as all our offices have a Linux-based router/server between them and the network, I wrote a dirty, hackish script that switches ISPs when the primary connection fails and then updates the DNS servers accordingly.

Note:
  • This is not about balancing over two connections (properly doing so needs patching the kernel and a different setup).
  • This is not about receiving connections from both uplinks (you could, but DNAT requires some extra stuff like conn-tracking).
  • This is about switching to backup uplink when the primary one fails.

First, we will create a configuration file in /etc/default/update-ip:

HOST=office1 # We use this to build our DNS name
KEY=waaaaattxxxxxxxayyyeee== # MD5 key for updating DNS; you may skip this.
IF1=eth0wan  # Interface of the primary uplink
IP1=1.2.3.4  # Local IP for the primary uplink
GW1=1.2.3.1  # IP of the primary link gateway
IF2=eth0wan2 # Interface of the secondary uplink
IP2=2.2.3.4  # Local IP of the secondary uplink
GW2=2.2.3.1  # IP of the secondary link gateway
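
A typo in this file tends to show up only when the fail-over actually triggers, so it may be worth checking that every setting is present before relying on it. A minimal sketch; the require_vars helper is hypothetical and not part of the script further down, and the values are the sample ones from above:

```sh
#!/bin/sh
# Hypothetical helper: report which of the named settings are empty or unset.
require_vars() {
    missing=""
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing settings:$missing" >&2
        return 1
    fi
    return 0
}

# In real use these would come from ". /etc/default/update-ip".
IF1=eth0wan IP1=1.2.3.4 GW1=1.2.3.1
IF2=eth0wan2 IP2=2.2.3.4 GW2=2.2.3.1

require_vars IF1 IP1 GW1 IF2 IP2 GW2 && echo "configuration looks complete"
```

HOST and KEY are left out of the check on purpose, since they only matter when DNS updates are enabled.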

This file is sourced by our checking script for its configuration. Now for the checking script itself (e.g. /usr/local/sbin/checkgw.sh; don't forget to make it readable and executable with chmod a+rx):

#!/bin/sh
# vim: foldmethod=marker
LOGTAG="checkgw"
ME=`hostname -f`
MYPATH=/usr/local/sbin/checkgw.sh
# This is just to get our IP for DNS updates. Note this is a stupid CGI I use; you
# should really not rely on it for your DNS updates ;)
IPURL=http://www.marcfargas.com/~/cgi-bin/myip.cgi

logger -t "$LOGTAG" "CheckGW Starting"
# Read configuration
. /etc/default/update-ip

gwalive() {
    # {{{ Check that the internet can be reached from $1 (an interface name or a local IP; check() passes the IP)
    HOSTS="94.23.24.6 72.14.221.104 74.125.77.104 216.239.59.104"
    # Max hosts that can fail.
    FAILS=2
    IFACE=$1
    FAILED=0

    for H in $HOSTS; do
      echo "Checking... $H"
      ping -w 1 -I "$IFACE" -c 1 "$H" > /dev/null 2>&1
      RES=$?
      if [ $RES -ne 0 ]; then
        FAILED=`expr $FAILED + 1`
      fi
    done
    if [ $FAILED -gt $FAILS ]; then
      logger -t "$LOGTAG" "Interface $IFACE cannot reach the internet."
      return 1
    fi
    logger -t "$LOGTAG" "Interface $IFACE DOES reach the internet."
    return 0
} #}}}

update_ip() {
    # {{{ Update the DNS servers with our current IP.
    MYIP=`wget -q -nd -O - $IPURL | grep -Eo "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"`
    if [ "$MYIP" = "" ]
    then
        logger -t "$LOGTAG" "Could not get an IP Address"
        return 1
    fi
    if [ -f "/tmp/lastip" ]
    then
      if [ "`cat /tmp/lastip`" = "$MYIP" ]
      then
        return 0
      fi
    fi
    cat << EOF > /tmp/update.txt
server YOUR-DNS-SERVER-GOES-HERE
update delete $HOST.YOUR-DNS-ZONE.com. A
update add $HOST.YOUR-DNS-ZONE.com. 180 A $MYIP
show
send
EOF
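    # Only remember the IP when the update went through, so a failed
    # update is retried on the next run.
    if ! nsupdate -y "$ME.:$KEY" /tmp/update.txt; then
        logger -t "$LOGTAG" "DNS update for $HOST failed"
        return 1
    fi
    echo "$MYIP" > /tmp/lastip
    logger -t "$LOGTAG" "Updated IP for $HOST, now $MYIP"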
} # }}}

activate() {
    logger -t "$LOGTAG" "Enabling interface $1 via $2"
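    # Point the default route at the new gateway. If no default route exists
    # yet, "change" fails and the "add" below creates it.
    /sbin/ip route change default via $2
    /sbin/ip route add default via $2 > /dev/null 2>&1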
    /sbin/ip route flush cache
    touch /tmp/checkgw.$1
    ## /usr/sbin/invoke-rc.d openvpn restart << Make sure you restart stuff like that.
    ## update_ip  ## Uncomment this to enable DNS updates (make sure you set it up).
}

check() {
    # {{{ Check our connection.
    gwalive $IP1
    RES=$?
    if [ $RES -eq 0 ]; then
      echo "OK" > /tmp/checkgw-status.$IF1
      if [ ! -f /tmp/checkgw.$IF1 ]; then
        rm -f /tmp/checkgw.$IF2 2>/dev/null
        activate $IF1 $GW1
      fi
    else
      echo "BAD" > /tmp/checkgw-status.$IF1
      if [ ! -f /tmp/checkgw.$IF2 ]; then
        rm -f /tmp/checkgw.$IF1 2>/dev/null
        activate $IF2 $GW2
      fi
    fi
} # }}}

check2() {
    # {{{ Check alternate connection.
    gwalive $IP2
    RES=$?
    if [ $RES -eq 0 ]; then
      echo "OK" > /tmp/checkgw-status.$IF2
    else
      echo "BAD" > /tmp/checkgw-status.$IF2
    fi
} # }}}

case $1 in
    update_ip)
        update_ip
        exit 0;;
    check)
        check
        check2
        exit 0;;
esac
echo "UNKNOWN ACTION $1"
exit 1
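
The decision inside gwalive boils down to counting failed probes against a tolerance. The following sketch isolates that logic so it can be tried without touching the network. The names count_failures, link_alive and fake_probe are hypothetical, and the probe is passed in as a command where the real script calls ping:

```sh
#!/bin/sh
# Count how many hosts the given probe command fails to reach.
# $1 = probe command, remaining arguments = hosts to test.
count_failures() {
    probe=$1
    shift
    failed=0
    for host in "$@"; do
        if ! "$probe" "$host" > /dev/null 2>&1; then
            failed=$((failed + 1))
        fi
    done
    echo "$failed"
}

# Succeed while no more than $1 probes fail (the same rule as FAILS in gwalive).
# $1 = tolerated failures, $2 = probe command, remaining arguments = hosts.
link_alive() {
    tolerated=$1
    shift
    [ "$(count_failures "$@")" -le "$tolerated" ]
}

# Stand-in for ping: every host whose name starts with "down" is unreachable.
fake_probe() {
    case $1 in
        down*) return 1 ;;
        *) return 0 ;;
    esac
}

if link_alive 2 fake_probe up1 up2 down1 down2; then
    echo "two of four failed: uplink still considered alive"
fi
if ! link_alive 2 fake_probe up1 down1 down2 down3; then
    echo "three of four failed: uplink considered dead"
fi
```

Tolerating a couple of failures keeps a single unreachable test host from triggering a switch.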

There's one last step. As we play around with routing, we must make sure that our test pings leave the machine through the interface we intend. Routing tables are the key element for that. We will create two routing tables, one per uplink, in /etc/iproute2/rt_tables:

255  local
254  main
253  default
100  isp1     # Add for ISP 1
101  isp2     # Add for ISP 2
0    unspec
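
If the router is provisioned by a script, the two entries can be appended only when they are missing. A sketch under that assumption; add_rt_table is a hypothetical helper, and the demonstration works on a temporary file rather than the real /etc/iproute2/rt_tables:

```sh
#!/bin/sh
# Append "ID NAME" to a routing table file unless NAME is already defined.
# Only the name is checked; a clash on the numeric ID is not detected.
# $1 = file, $2 = numeric table ID, $3 = table name.
add_rt_table() {
    file=$1
    id=$2
    name=$3
    if grep -Eq "^[[:space:]]*[0-9]+[[:space:]]+$name([[:space:]]|#|\$)" "$file"; then
        return 0
    fi
    printf '%s\t%s\n' "$id" "$name" >> "$file"
}

tables=$(mktemp)
printf '255\tlocal\n254\tmain\n253\tdefault\n0\tunspec\n' > "$tables"
add_rt_table "$tables" 100 isp1
add_rt_table "$tables" 101 isp2
add_rt_table "$tables" 100 isp1   # repeated call changes nothing
cat "$tables"
rm -f "$tables"
```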

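And now we add some routing rules. We'll put them in /etc/network/interfaces. The $GW1, $IF1, $IP1, ... names below are only placeholders matching /etc/default/update-ip; make sure to write the real IP addresses and interface names there, as you can't do variable substitution in that file: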

iface eth0wan
  [...]
  up ip route add default via $GW1 dev $IF1 table isp1
  up ip rule add from $IP1 table isp1
  down ip route del default via $GW1 dev $IF1 table isp1
  down ip rule del from $IP1 table isp1

iface eth0wan2
  up ip route add default via $GW2 dev $IF2 table isp2
  up ip rule add from $IP2 table isp2
  down ip route del default via $GW2 dev $IF2 table isp2
  down ip rule del from $IP2 table isp2
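
Because the stanzas need literal values, one option is to print the hook lines from the settings in /etc/default/update-ip and paste them in. A rough sketch; gen_hooks is a hypothetical helper, and the arguments are the sample values from the configuration file above:

```sh
#!/bin/sh
# Print the up/down hook lines for one uplink with the values filled in.
# $1 = interface, $2 = local IP, $3 = gateway, $4 = routing table name.
gen_hooks() {
    ifname=$1
    ip=$2
    gw=$3
    table=$4
    cat <<EOF
  up ip route add default via $gw dev $ifname table $table
  up ip rule add from $ip table $table
  down ip route del default via $gw dev $ifname table $table
  down ip rule del from $ip table $table
EOF
}

echo "# hooks for eth0wan"
gen_hooks eth0wan 1.2.3.4 1.2.3.1 isp1
echo "# hooks for eth0wan2"
gen_hooks eth0wan2 2.2.3.4 2.2.3.1 isp2
```

The printed lines go under the matching iface stanza, after whatever address configuration is already there.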

What those commands do is:

  • Add a default route to each table via its respective gateway.
  • Add a rule telling that traffic sent from each uplink's local IP address should look in the corresponding table, effectively ignoring the default route in the main table.
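
Before trusting the script, it can help to confirm that the rules and per-table routes are really in place. The sketch below only prints the iproute2 commands by default; running it with IP_CMD=ip executes them. Both the IP_CMD override and the show_uplink name are assumptions made for this example, and the addresses are the sample ones used on this page:

```sh
#!/bin/sh
# Dry run by default: the commands are echoed rather than executed.
IP_CMD=${IP_CMD:-echo ip}

# $1 = local IP of the uplink, $2 = routing table, $3 = a test destination.
show_uplink() {
    $IP_CMD rule show                  # should list a "from $1 lookup $2" rule
    $IP_CMD route show table "$2"      # should contain the default route
    $IP_CMD route get "$3" from "$1"   # the route the kernel would pick
}

show_uplink 1.2.3.4 isp1 94.23.24.6
show_uplink 2.2.3.4 isp2 94.23.24.6
```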

You should check this works by running the script with a "check" argument. NOTE: MAKE SURE you do this either from the same network as the computer lives in or that you have some way to get back to the machine when it kicks you out ;)

You may switch off the primary uplink and run the script again to see whether it switches to the backup, then bring the primary back up and check that it switches back.
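
The script leaves its verdicts in /tmp: checkgw-status.<interface> holds OK or BAD, and an empty checkgw.<interface> file marks the uplink that was activated last. A small sketch for reading them after each run; show_state is a hypothetical helper, and the demonstration uses a scratch directory instead of /tmp:

```sh
#!/bin/sh
# Print one line per uplink: its last check result and whether it is active.
# $1 = directory holding the state files (the script itself uses /tmp).
show_state() {
    dir=${1:-/tmp}
    for f in "$dir"/checkgw-status.*; do
        [ -f "$f" ] || continue
        ifname=${f##*/checkgw-status.}
        if [ -f "$dir/checkgw.$ifname" ]; then
            role="active"
        else
            role="standby"
        fi
        echo "$ifname: $(cat "$f") ($role)"
    done
}

# Fake the state left behind after a fail-over to the backup uplink.
demo=$(mktemp -d)
echo "BAD" > "$demo/checkgw-status.eth0wan"
echo "OK" > "$demo/checkgw-status.eth0wan2"
touch "$demo/checkgw.eth0wan2"
show_state "$demo"
rm -rf "$demo"
```

On the router itself, calling show_state with no argument reads the real files in /tmp.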

If it works as expected, you may now add a crontab entry to check the connection every minute (crontab -e):

* * * * * /usr/local/sbin/checkgw.sh check > /dev/null

And you're done. Hope this proves useful to somebody! Comments welcome ;)
