AnyIP: Bind a whole subnet to your Linux machine

IPv6 Prefix Delegation

In my previous post I wrote about how you can use Docker with IPv6 and Prefix Delegation.

An IPv6 subnet routed to a Linux machine can be used for more than just Docker. That’s where the AnyIP feature of the kernel comes in.

Linux Kernel AnyIP

The AnyIP feature of the Linux kernel allows you to bind a complete IPv4 or IPv6 subnet to your system.

Instead of adding all addresses manually to the kernel you can tell it to bind a complete subnet.

Configuring

IPv4

ip -4 route add local 192.168.0.0/24 dev lo

In this case the Linux kernel will now respond to ARP requests for any IPv4 address in the 192.168.0.0/24 subnet.
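
To double-check that the route was accepted you can simply list it (a quick sanity check; run as root, like the route command itself):

ip -4 route show | grep "local 192.168.0.0"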

IPv6

ip -6 route add local 2001:db8:100::/64 dev lo

In this case the kernel will respond to Neighbor Solicitations for any IPv6 address in the 2001:db8:100::/64 subnet.
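
A simple way to watch this in action is to run tcpdump on the uplink interface while pinging a few of those addresses from another machine (eth0 is an assumption here):

tcpdump -n -i eth0 icmp6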

Example usage

Let’s assume that you have the IPv6 prefix 2001:db8:100::/60 routed to your Linux machine through IPv6 prefix delegation.

From that /60 subnet we take the first /64 subnet and attach it to lo.

ip -6 route add local 2001:db8:100::/64 dev lo

You can now ping any of the addresses in that subnet:

  • 2001:db8:100::1
  • 2001:db8:100::100
  • 2001:db8:100::200
  • 2001:db8:100::dead:b33f

If you start a webserver that listens on port 80 on the wildcard address, you can use any of the IPv6 addresses in that subnet and the webserver will respond.
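
A minimal sketch with Python's built-in webserver (assuming Python 3.8 or newer, where --bind accepts an IPv6 address), bound to the wildcard address so it accepts connections on every address the kernel considers local:

python3 -m http.server 80 --bind ::

From another machine you can then fetch it via any address in the /64, for example:

curl -g 'http://[2001:db8:100::dead:b33f]/'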

Use cases

It could be that you want to do mass shared hosting on a system where you want to assign each hostname/domain name its own IPv6 address. Instead of attaching single IPs to an interface you can simply attach a complete subnet and point traffic to any of the IPs in that subnet.

Demo

On PCextreme’s Aurora Compute I deployed an Instance with Prefix Delegation enabled.

After running ‘dhclient’ I got the subnet 2a00:f10:500:40::/60 assigned to my Instance.

It was then just one line to attach a /64 subnet:

ip -6 route add local 2a00:f10:500:40::/64 dev lo

Random address generator

I wrote a small piece of Python code to generate a random IPv6 address:

#!/usr/bin/env python3
"""
Generate a random IPv6 address for a specified subnet
"""

from random import seed, getrandbits
from ipaddress import IPv6Network, IPv6Address

subnet = '2a00:f10:500:40::/64'

seed()
network = IPv6Network(subnet)
address = IPv6Address(network.network_address + getrandbits(network.max_prefixlen - network.prefixlen))

print(address)

Using a small loop in Bash I could now ping random addresses in that subnet:

while true; do ping6 -c 2 $(./random-ipv6.py); done

Some example output:

--- 2a00:f10:500:40:d142:1092:ea84:74b4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 10.252/11.680/13.108/1.428 ms
PING 2a00:f10:500:40:4e50:f264:6ea9:d184(2a00:f10:500:40:4e50:f264:6ea9:d184) 56 data bytes
64 bytes from 2a00:f10:500:40:4e50:f264:6ea9:d184: icmp_seq=1 ttl=56 time=10.0 ms
64 bytes from 2a00:f10:500:40:4e50:f264:6ea9:d184: icmp_seq=2 ttl=56 time=10.0 ms

--- 2a00:f10:500:40:4e50:f264:6ea9:d184 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 10.085/10.087/10.089/0.002 ms
PING 2a00:f10:500:40:d831:1f89:b06d:fe12(2a00:f10:500:40:d831:1f89:b06d:fe12) 56 data bytes
64 bytes from 2a00:f10:500:40:d831:1f89:b06d:fe12: icmp_seq=1 ttl=56 time=9.77 ms
64 bytes from 2a00:f10:500:40:d831:1f89:b06d:fe12: icmp_seq=2 ttl=56 time=10.1 ms

--- 2a00:f10:500:40:d831:1f89:b06d:fe12 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1005ms
rtt min/avg/max/mdev = 9.777/9.958/10.140/0.207 ms
PING 2a00:f10:500:40:2c45:26ee:5b93:fa2(2a00:f10:500:40:2c45:26ee:5b93:fa2) 56 data bytes
64 bytes from 2a00:f10:500:40:2c45:26ee:5b93:fa2: icmp_seq=1 ttl=56 time=10.2 ms
64 bytes from 2a00:f10:500:40:2c45:26ee:5b93:fa2: icmp_seq=2 ttl=56 time=10.0 ms

Maximum number of Docker containers on a single host

While playing with Docker I wanted to know how many containers I could spawn on a single system.

A quick for-loop told me that the maximum is 1023 containers on a single host:

Error response from daemon: Cannot start container 09c8f46b59ccc311e8d0352789db6debd0fa1df98186c5cda98583d762d48601: adding interface vetha5d205e to bridge docker0 failed: exchange full
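
The loop itself was nothing fancy; something along these lines will run into the limit (the busybox image and the sleep command are just placeholders):

for i in $(seq 1 1100); do docker run -d busybox sleep 86400; done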

The limitation here is the Linux bridge, which can’t have more than 1023 interfaces attached. Specifically, BR_PORT_BITS in net/bridge/br_private.h cannot be extended because of spanning tree requirements.
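
The relevant definition is easy to find in a kernel source tree:

grep -n 'BR_PORT_BITS' net/bridge/br_private.h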

wido@wido-desktop:~$ docker ps|wc -l
1024
wido@wido-desktop:~$

Although that says 1024, there is a header line, so we have to subtract one. That brings it to 1023.
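
Counting only the container IDs avoids the header line:

docker ps -q | wc -l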

wido@wido-desktop:~$ docker version
Client:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   f4bf5c7
 Built:        Mon Oct 12 05:37:18 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   f4bf5c7
 Built:        Mon Oct 12 05:37:18 UTC 2015
 OS/Arch:      linux/amd64
wido@wido-desktop:~$

Failover with Nexenta, NFS and the RSF-1 plugin

The title might seem a bit cryptic, but this post is about the highly available Nexenta cluster with the RSF-1 plugin that we are deploying.

While we are waiting for the moment when we can start using Ceph, we are implementing new storage for our hosting clusters. Our current Linux machines with LVM and XFS are not up to the task anymore.

After some testing and discussion we chose Nexenta. What Nexenta is and how awesome ZFS is can be found elsewhere on the net; I’m not going to discuss that here.

I wanted to publish our findings about the HA plugin and NFS.

In short, we have two headends connected to two SAS JBODs. The RSF-1 plugin makes sure the ZPOOL is imported on one headend at a time. If one headend fails, the plugin automatically fails the pool over to the other headend.

The plugin provides one HA IP which is shared between the headends; you probably get the point.

We’ve been doing some testing and noticed that when we mount NFS (v3) over TCP, the failover takes a staggering 6 minutes! Well, the failover itself doesn’t take 6 minutes, but that’s how long it takes for the TCP connections to recover.

When mounting over UDP, service resumes within 50 seconds, so that’s a big difference!
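
A crude loop like this on a client is enough to see when I/O on the mount stalls and resumes (the mount point is just an example):

while true; do date; ls /mnt/nfs > /dev/null && echo OK || echo FAIL; sleep 1; done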

Some testing showed that this is due to the following kernel settings:

net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15

This page explains what those two values actually control.

We’ve been experimenting with those values, and lowering tcp_retries1 to 1 gave us the same recovery times as with UDP, but sometimes the recovery would still take 6 minutes.
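
Changing a value at runtime is a one-liner (persist it in /etc/sysctl.conf if you want to keep it), for example:

sysctl -w net.ipv4.tcp_retries1=1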

For now I advise using NFS over UDP (which gives better performance anyway), but if you need to use TCP for some reason, try tuning these values.
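
A mount along these lines forces NFSv3 over UDP (the server address and paths are just examples):

mount -t nfs -o vers=3,proto=udp 192.0.2.10:/export/data /mnt/data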