Calculating DS record from DNSKEY with Python 3

While working on DNSSEC for PCextreme’s Aurora DNS I had to convert a DNSKEY to a DS-record which could be set in the parent zone for proper delegation.

The foundation for Aurora DNS is PowerDNS together with Python 3.

The API for Aurora DNS has to return the DS-records so that a end-user can use these in the parent zone. I had the DNSKEY, but I didn’t have the DS-record so I had to calculate it using Python 3.

I eventually ended up with this Python code which you can find on my Github Gists page.

"""
Generate a DNSSEC DS record based on the incoming DNSKEY record

The DNSKEY can be found using for example 'dig':

$ dig DNSKEY secure.widodh.nl

The output can then be parsed with the following code to generate a DS record
for in the parent DNS zone

Author: Wido den Hollander 

Many thanks to this blogpost: https://www.v13.gr/blog/?p=239
"""

import struct
import base64
import hashlib


DOMAIN = 'secure.widodh.nl'
DNSKEY = '257 3 8 AwEAAckZ+lfb0j6aHBW5AanV5A0V0IfF99vAKFZd6+fJfEChpZtjnItWDnJLPa3/LAFec/tUhLZ4jgmzaoEuX3EQQgI1V4kp9SYf8HMlFPP014eO+AnjkYFGLE2uqHPx/Tu7/pO3EyKwTXi5fMadROKuo/mfat5AEIhGjteGGO93DhnOa6kcqj5RHYJBh5OZ/GoZfbeYHK6Muur1T16hHiI12rYGoqJ6ZW5+njYprG6qwp6TZXxJyE7wF1JdD+Zhbjhf0Md4zMEysP22wBLghBaX6eDIBh/7jU7dw1Ob+I42YWk+X4NSiU3sRYPaq1R13JEK4zVqQtL++UVtgRPEbfj5RQ8='


def _calc_keyid(flags, protocol, algorithm, dnskey):
    st = struct.pack('!HBB', int(flags), int(protocol), int(algorithm))
    st += base64.b64decode(dnskey)

    cnt = 0
    for idx in range(len(st)):
        s = struct.unpack('B', st[idx:idx+1])[0]
        if (idx % 2) == 0:
            cnt += s << 8
        else:
            cnt += s

    return ((cnt & 0xFFFF) + (cnt >> 16)) & 0xFFFF


def _calc_ds(domain, flags, protocol, algorithm, dnskey):
    if domain.endswith('.') is False:
        domain += '.'

    signature = bytes()
    for i in domain.split('.'):
        signature += struct.pack('B', len(i)) + i.encode()

    signature += struct.pack('!HBB', int(flags), int(protocol), int(algorithm))
    signature += base64.b64decode(dnskey)

    return {
        'sha1':    hashlib.sha1(signature).hexdigest().upper(),
        'sha256':  hashlib.sha256(signature).hexdigest().upper(),
    }


def dnskey_to_ds(domain, dnskey):
    dnskeylist = dnskey.split(' ', 3)

    flags = dnskeylist[0]
    protocol = dnskeylist[1]
    algorithm = dnskeylist[2]
    key = dnskeylist[3].replace(' ', '')

    keyid = _calc_keyid(flags, protocol, algorithm, key)
    ds = _calc_ds(domain, flags, protocol, algorithm, key)

    ret = list()
    ret.append(str(keyid) + ' ' + str(algorithm) + ' ' + str(1) + ' '
               + ds['sha1'].lower())
    ret.append(str(keyid) + ' ' + str(algorithm) + ' ' + str(2) + ' '
               + ds['sha256'].lower())
    return ret


print(dnskey_to_ds(DOMAIN, DNSKEY))

Slow requests with Ceph: ‘waiting for rw locks’

Slow requests in Ceph

When a I/O operating inside Ceph is taking more than X seconds, which is 30 by default, it will be logged as a slow request.

This is to show you as a admin that something is wrong inside the cluster and you have to take action.

Origin of slow requests

Slow requests can happen for multiple reasons. It can be slow disks, network connections or high load on machines.

If a OSD has slow requests you can log on to the machine and see what Ops are blocking:

ceph daemon osd.X dump_ops_in_flight

waiting for rw locks

Yesterday I got my hands on a Ceph cluster which had a very high number, over 2k, of slow requests.

On all OSDs they showed ‘waiting for rw locks’

This is hard to diagnose and it was. Usually this is where OSDs are busy connecting to other OSDs or performing any other network actions.

Usually when you see ‘waiting for rw locks’ there is something wrong with the network.

The network

In this case the Ceph cluster is connecting over Layer 2 and that network didn’t change. A few hours earlier there was a change to the Layer 3 network, but since Ceph was running over Layer 2 we didn’t connect the two dots.

After some more searching we noticed that the hosts couldn’t perform DNS lookups properly.

DNS

Ceph doesn’t use DNS internally, but it could still be that it was a problem.

After some searching we found that DNS wasn’t the problem, but there were two default routes on the system where one was down.

Layer 3

This Ceph cluster is communicating over Layer 3 and the problem was caused by the fact that the cluster had a hard time talking back to various clients.

This caused various network buffers to fill up and that caused communication problems between OSDs.

So always make sure you double-check the network since that is usually the root-cause.