bgp – Widodh

Routing IPv6 through Wireguard with MikroTik and Debian

Recently, I became a RIPE member, which resulted in an IPv6 /29 subnet being allocated to me. One of my main goals was to route a /48 from this /29 to my home, allowing me to use my own IPv6 addresses on my local network.

However, my ISP (Ziggo Zakelijk) does not allow me to announce my own IPv6 address space. Instead, they have statically assigned me a /48 subnet (2001:41f0:6f67::/48) as part of my business account.

Regardless, I wanted the flexibility to switch ISPs in the future without the hassle of renumbering my home network. And, of course, it’s also a fun technical challenge to get this working! Geeky stuff!

Wireguard from AS212540

In a datacenter in Amsterdam, I have a Dell R430 server running Proxmox, where I also run the FRRouting (FRR) daemon to announce AS212540.

ipv6 route 2a14:9b80::/32 Null0
ip router-id 2.57.57.4
!
router bgp 212540

 address-family ipv6 unicast
  redistribute kernel
  redistribute connected
  redistribute static
  neighbor upstream-v6 activate
  neighbor upstream-v6 soft-reconfiguration inbound
  neighbor upstream-v6 route-map upstream-in in
  neighbor upstream-v6 route-map upstream-out out

My idea was to use WireGuard from this server to route a /48 to my Mikrotik CCR1036-8G-2S+ running at home. I would then be able to use parts of my own IPv6 space at home.

Route my own IPv6 /48 to my house
Use Wireguard from my Proxmox server in Amsterdam
Use IPv6 as the underlay under Wireguard
Use as little configuration as possible

Turned out that WireGuard as super easy to get up and running. In the end this was all the configuration I needed in /etc/wireguard/wg0.conf

[Interface]
Address = 2a14:9b80:0:1::1/64
ListenPort = 51820
PrivateKey = THISISMYPRIVATEKEY

[Peer]
PublicKey = PUBLICKEYOFMIKROTIK
AllowedIPs = 2a14:9b80:101::/48
Endpoint = [2001:41f0:6f67::2]:51820

There are plenty of tutorials online on how to set up WireGuard and generate the necessary keys. Instead, I want to show how I built an IPv6-only environment. No NAT, just pure routing!

If you look closely you can spot a few things

2001:41f0:6f67::2 is the WAN IP of my MikroTik router at home
2a14:9b80:101::/48 is the subnet I’m routing towards my house via Wireguard

The BGP announcement on my Proxmox server already routes the /32 to the server. Setting up a WireGuard tunnel was all that was needed to ensure the routes propagated in the local routing table.

root@proxmox:~# ip -6 route show
blackhole 2001:678:3a4:100::/56 dev lo proto static metric 20 pref medium
2a14:9b80::/64 dev vmbr2 proto kernel metric 256 pref medium
2a14:9b80:0:1::/64 dev wg0 proto kernel metric 256 pref medium
2a14:9b80:101::/48 dev wg0 metric 1024 pref medium
blackhole 2a14:9b80::/32 dev lo proto static metric 20 pref medium
default via 2001:678:3a4:1::3 dev vmbr0 proto kernel metric 1024 onlink pref medium
root@proxmox:~#

For the completeness here is the output of the wg command:

interface: wg0
  public key: IhYOkpqE0cIBclaR7zGLml/7BriPIMoMdjmM5dbkkGs=
  private key: (hidden)
  listening port: 51820

peer: ZZkW3L0OES1bqQKdDpe3GQ88G4I3ABZVasuEVyvS5iM=
  endpoint: [2001:41f0:6f67::2]:51820
  allowed ips: 2a14:9b80:101::/48
  latest handshake: 1 second ago
  transfer: 642.48 KiB received, 1.17 MiB sent

Source based routing

Since my home network already has native IPv6 through my ISP, I needed to make sure that only specific traffic was routed outbound via the WireGuard tunnel.

Typically, routing is based on the destination address, but in cases like this, it’s necessary to route based on the source address and sometimes the destination address as well.

In MikroTik, this is known as Policy Routing, which allows for this level of control. It took me a few hours to figure out, and I also discovered that I needed RouterOS 7.18, as earlier versions did not support this setup properly.

Eventually this is the configuration I ended up with, from which I will only show the relevant parts:

/interface wireguard
add listen-port=51820 mtu=1420 name=wg-hrl23
/interface wireguard peers
add allowed-address=::/0 endpoint-address=2001:678:3a4:1::100 endpoint-port=51820 interface=wg-hrl23 name=peer1 persistent-keepalive=30s public-key="IhYOkpqE0cIBclaR7zGLml/7BriPIMoMdjmM5dbkkGs="

/routing table
add disabled=no fib name=wireguard

/ipv6 route
add comment="Ziggo Zakelijk" disabled=no dst-address=::/0 gateway=2001:41f0:6f67::1
add blackhole dst-address=2a14:9b80:101::/48
add disabled=no gateway=wg-hrl23 routing-table=wireguard

/ipv6 address
add address=2001:41f0:6f67::2 comment="Ziggo Zakelijk" interface=ether1
add address=2001:41f0:6f67:1::1 interface=bridgeLocal
add address=2a14:9b80:101::1 interface=wg-hrl23
add address=2a14:9b80:101:1::1 interface=ether4

/ipv6 firewall filter
add action=accept chain=input comment="Wireguard with proxmox01" dst-port=51820 in-interface-list=WAN protocol=udp src-address=2001:678:3a4:1::100/128

/routing rule
add action=lookup dst-address=2001:41f0:6f67::/48 src-address=2a14:9b80:101::/48 table=main
add action=lookup-only-in-table disabled=no src-address=2a14:9b80:101::/48 table=wireguard

The last two lines are the most important!

Traffic between my regular LAN and the subnet routed to my home via Wireguard should just be regular routed traffic
Otherwise the traffic should be diverted to the routing table “wireguard” which routes the traffic outbound via Wireguard

Diagram

To clarify a couple of things I have created a Diagram to hopefully explain how it all comes together.

I hope this explains and inspires you to do the same!

Using L3 (BGP) routing for your Ceph storage

Many Ceph storage environments out there are deployed using a L2 underlay.

This means that the Ceph servers (MON, OSD, etc) are connected using LACP/Bonding to a pair of switches. On their ‘bond0’ device (example) they are assigned an IPv4/IPv6 address and this is used for connectivity between the Ceph nodes and the Ceph clients.

Although this works fine, I try to avoid L2 as much as possible in datacenter deployments. L2 scales up to a certain point, but it has it’s limitations. Modern Top-of-Rack (ToR) switches can easily route traffic and wire-speed. This used to be a limitation of switches in the past. When designing environments I prefer using a L3 approach.

This blogpost is there to show you the rough concept. It’s NOT a copy and paste tutorial. You will need to adapt it to your situation.

Network setup and BGP configuration

Using Juniper QFX5100 switches and Frrouting on the Ceph nodes I’ve established BGP sessions between the ToR and Ceph nodes according to the diagram below.

Each nodes has two independent BGP sessions with the Top-of-Rack in it’s rack. Via these BGP sessions they advertise their local IPv6 /128 address. Via the same sessions they receive a default ::/0 IPv6 route.

ceph01# sh bgp summary 

IPv6 Unicast Summary (VRF default):
BGP router identifier 1.2.3.4, local AS number 65101 vrf-id 0
BGP table version 10875
RIB entries 511, using 96 KiB of memory
Peers 2, using 1448 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor        V    AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
enp196s0f0np0   4 65002    487385    353917        0    0    0 3d18h17m            1        1 N/A
enp196s0f1np1   4 65002    558998    411452        0    0    0 01:38:55            1        1 N/A

Total number of neighbors 2
ceph01#

Here we see two BGP sessions active over both NICs of the Ceph node. We can also see that a default IPv6 route is received via BGP.

ceph01# sh ipv6 route ::/0
Routing entry for ::/0
  Known via "bgp", distance 20, metric 0
  Last update 01:42:00 ago
    fe80::e29:efff:fed7:4719, via enp196s0f0np0, weight 1
    fe80::7686:e2ff:fe7c:a19e, via enp196s0f1np1, weight 1

ceph01#

The Frrouting configuration ( /etc/frr/frr.conf ) is fairly simple:

frr defaults traditional
hostname ceph01
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
interface enp196s0f0np0
 no ipv6 nd suppress-ra
exit
!
interface enp196s0f1np1
 no ipv6 nd suppress-ra
exit
!
interface lo
 ipv6 address 2001:db8:100:1::/128
exit
!
router bgp 65101
 bgp router-id 1.2.3.4
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 neighbor upstream peer-group
 neighbor upstream remote-as external
 neighbor enp196s0f0np0 interface peer-group upstream
 neighbor enp196s0f1np1 interface peer-group upstream
 !
 address-family ipv6 unicast
  redistribute connected
  neighbor upstream activate
 exit-address-family
exit
!
end

On the Juniper switches a configuration was defined for the BGP Unnumbered (RFC5549) configuration as well. This blogpost explains very well on how BGP Unnumbered works on JunOS, I am not going to repeat it. I will highlight a couple of pieces of configuration.

root@tor01# show interfaces xe-0/0/1
description ceph01;
unit 0 {
    mtu 9216;
    family inet6;
}

root@tor01# show protocols router-advertisement 
interface xe-0/0/1.0;

root@tor01# show | compare 
[edit]
+  policy-options {
+      as-list bgp_unnumbered_as_list members 65101-65199;
+  }
[edit protocols]
+   bgp {
+       group ceph {
+           family inet6 {
+               unicast;
+           }
+           multipath;
+           export default-v6;
+           import ceph-loopback;
+           dynamic-neighbor bgp_unnumbered {
+               peer-auto-discovery {
+                   family inet6 {
+                       ipv6-nd;
+                   }
+                   interface xe-0/0/1.0;
+                   interface xe-0/0/2.0;
+                   interface xe-0/0/3.0;
+               }
+           }
+           peer-as-list bgp_unnumbered_as_list;
+       }
+   }
[edit policy]
+ policy-statement default-v6 {
+    from {
+        route-filter ::/0 exact;
+   }
+   then accept;
+}
+ policy-statement ceph-loopback {
+    from {
+        route-filter 2001:db8:100::/64 upto /128;
+   }
+   then accept;
+}

This will set up the BGP sessions via the interfaces xe-0/0/1 until xe-0/0/3 using IPv6 Autodiscovery.

The Ceph nodes should now be able to ping the other nodes:

PING 2001:db8:100::2(2001:db8:100::2) 56 data bytes
64 bytes from 2001:db8:100::2: icmp_seq=1 ttl=63 time=0.058 ms
64 bytes from 2001:db8:100::2: icmp_seq=2 ttl=63 time=0.063 ms
64 bytes from 2001:db8:100::2: icmp_seq=3 ttl=63 time=0.071 ms

--- 2001:db8:100::2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2037ms
rtt min/avg/max/mdev = 0.058/0.064/0.071/0.005 ms

Ceph configuration

From Ceph’s perspective there is not much to do. We just need to specify the IPv6 subnet Ceph is allowed to use and bind to.

[global]
	 mon_host = 2001:db8:100::1, 2001:db8:100::2, 2001:db8:100::3
	 ms_bind_ipv4 = false
	 ms_bind_ipv6 = true
	 public_network = 2001:db8:100::/64

This is all the configuration needed for Ceph 🙂

wdh@ceph01:~$ sudo ceph health
HEALTH_OK
wdh@infra-04-01-17:~$ sudo ceph mon dump
election_strategy: 1
0: [v2:[2001:db8:100::1]:3300/0,v1:[2001:db8:100::1]:6789/0] mon.ceph01
1: [v2:[2001:db8:100::2]:3300/0,v1:[2001:db8:100::2]:6789/0] mon.ceph02
2: [v2:[2001:db8:100::3]:3300/0,v1:[2001:db8:100::3]:6789/0] mon.ceph02
dumped monmap epoch 6
wdh@infra-04-01-17:~$

Proxmox with BGP+EVPN+VXLAN

Proxmox by default does not support BGP+EVPN+VXLAN but there is a small piece of documentation on the Wiki of Proxmox.

Using the routing daemon Frrouting a Proxmox cluster can also be configured to use BGP with EVPN+VXLAN for it’s routing allowing for very flexible networks.

I won’t go into all the details of Frr, BGP, EVPN and VXLAN as the internet already more then enough resources about this. I’ll get right to it.

Frrouting

After installing Proxmox (7.1) on the node I temporarily connected the node via a ad-hoc connection to the internet to be able to install Frr.

curl -s https://deb.frrouting.org/frr/keys.asc | sudo apt-key add -
echo deb https://deb.frrouting.org/frr bullseye frr-7 | sudo tee -a /etc/apt/sources.list.d/frr.list
apt install frr frr-pythontools

After installing Frr I configured the /etc/frr/frr.conf config file:

frr version 7.5.1
frr defaults traditional
hostname infra-72-45-36
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
interface enp101s0f0np0
  no ipv6 nd suppress-ra
!
interface enp101s0f1np1
  no ipv6 nd suppress-ra
!
interface lo
  ip address 10.255.254.5/32
  ipv6 address 2a05:1500:xxx:xx::5/128
!
router bgp 4200400036
  bgp router-id 10.255.254.5
  no bgp ebgp-requires-policy
  no bgp default ipv4-unicast
  no bgp network import-check
  neighbor core peer-group
  neighbor core remote-as external
  neighbor core ebgp-multihop 255
  neighbor enp101s0f0np0 interface peer-group core
  neighbor enp101s0f1np1 interface peer-group core
  !
  address-family ipv4 unicast
    redistribute connected
    neighbor core activate
    exit-address-family
  !
  address-family ipv6 unicast
    redistribute connected
    neighbor core activate
    exit-address-family
  !
  address-family l2vpn evpn
    neighbor core activate
    advertise-all-vni
    exit-address-family
  !
line vty
!

In this case the host will be connecting to two Cumulus Linux routers using BGP Unnumbered.

The interfaces enp101s0f0np0 and enp101s0f1np1 are the uplinks of this node the the two routers.

/etc/network/interfaces

Now we need to make sure the /etc/network/interface file is populated with the proper information.

auto lo
iface lo inet loopback

auto enp101s0f1np1
iface enp101s0f1np1 inet manual
    mtu 9216

auto enp101s0f0np0
iface enp101s0f0np0 inet manual
    mtu 9216

This makes sure the interfaces (Mellanox ConnectX-5 2x25Gb SFP28) interfaces are online and running with an MTUof 9216.

The MTU of 9216 is needed so I can transport traffic with an MTU of 9000 within my VXLAN packets. VXLAN has an overhead of 50 bytes. So to transport an Ethernet packet of 1500 bytes you need to make sure you have at least an MTU of 1550 on your VXLAN underlay network.

VXLAN bridges

I now created a bunch of devices in the interfaces file:

auto vxlan201
iface vxlan201 inet static
    mtu 1500
    pre-up ip link add vxlan201 type vxlan id 201 dstport 4789 local 10.255.254.5 nolearning
    up ip link set vxlan201 up
    down ip link set vxlan201 down
    post-down ip link del vxlan201

auto vmbr201
iface vmbr201 inet manual
    bridge_ports vxlan201
    bridge-stp off
    bridge-fd 0

auto vxlan202
iface vxlan202 inet static
    mtu 1500
    pre-up ip link add vxlan202 type vxlan id 202 dstport 4789 local 10.255.254.5 nolearning
    up ip link set vxlan202 up
    down ip link set vxlan202 down
    post-down ip link del vxlan202

auto vmbr202
iface vmbr202 inet manual
    address 192.168.202.36/24
    bridge_ports vxlan202
    bridge-stp off
    bridge-fd 0

I created vmbr201 and vmbr202 which correspond to VNI 201 and 202 on the network. These can now be used with Proxmox to connect VMs to.

The IP-Address (10.255.254.5) set at the local argument of the ip command is the IP-address connected to the loopback interface and advertised using BGP.

This will be the address used for the VTEP in EVPN/VXLAN communication.

In reality however I have much more bridges

vmbr200
vmbr201
vmbr202
vmbr601
vmbr602
vmbr603
vmbr604

Proxmox cluster network

To be able to create a cluster with Proxmox you need a Layer2 network between the hosts where corosync can be used for cluster communication.

In this case I’m using vmbr202 which has IP-address 192.168.202.36/24 on this node. Other nodes in the cluster have a IPv4 address in the same network and allows them to communicate with the others.