Combined Charging System (CCS) and IPv6

When we talk about IP protocols and IPv6 in this particular case we think about routing over the internet. But what many do not know is that IPv6 is already playing a major role in all kinds of systems.

In this post I’m not going to explain IPv6 in detail, for many things I’ll link to external pages.

Ad-Hoc networking using Link-Local

IPv6 has a great feature where each host on a network will generate a Link Local Address (LL) which is mandatory according to the protocol.

The great thing about LL is that it allows hosts to establish communication without the need of DHCP or anything else. Just plug in a host on a Layer 2 Ethernet network and they can start communicating right away.

fe80::5054:ff:fe98:aaee is an example of a IPv6 LL address and using this address it can search for (via multicast) and communicate with other hosts in the network.

Examples of systems using IPv6 LL are Matter and Combined Charging System (CCS).

Combined Charging System (CCS)

CCS is a standard for (fast charging) electric vehicles. Some years ago I stumbled upon multiple documents showing that CCS would be using a combination of PLC, IPv6, TCP/IP, UDP, HTTP, TLS and many other very common protocols for communication.

Using PowerLine Communication (PLC) the vehicle and charger will establish a Layer 2 Ethernet network over which they will communicate using IPv6 LL.

I was not able to find the exact details of the IPv6 part of the CCS standard, but it is very interesting to see that IPv6 is being used for something like charging electric vehicles.

Kudos to the people behind CCS and for using IPv6 for this!

In my opinion it’s a no-brainer to use IPv6 for such functionality as the LL addressing allow you to create networks very quickly and in a reliable manner.

You can also use a very well documented protocol and the many tools for IPv6.

Now I would like to run Wireshark on such a link!

If you have more information about this, please contact me!

Babycam using a Unifi G3 camera

It’s been almost a year that I’ve been living in my new house and proud father of a son.

Almost every parent wants to watch their little on through a camera to see them sleeping in their bed.

There are many, many, many baby monitors on the market but in my experience they are all crap. Our house is build with a lot of concrete and none of them was able to penetrate the walls in our house and thus have a reliable video connection.

I also tried some IP/WiFi based baby monitors from for example Foscam, but these are all cheap Chinese cameras and didn’t work either. Crashing iPad apps, sending random (UDP) packets to China, half-baked UIs, etc, etc. They were all very low quality.

After a lot of frustrating I bought a Unifi Video G3 (UVC-G3-BULLET) camera and mounted it on the bed of my son.

Unifi Video G3 camera mounted on baby bed

I am using multiple Unifi video cameras around my house using Unifi Video, so for me it was just a matter of adding an additional camera to my Unifi system.

Video camera in Unifi Video

RTMP + HLS with Nginx

I also experimented with building a video stream using RTMP, ffmpeg and HLS with Nginx.

The Unifi cameras can export a RTSP stream which you can set up to live stream with Nginx. I tried this and it works for me, but with a delay of ~20s I thought it was not usable on my use-case.

It does however allow for a very easy way of building your own webcam!

Linux mdadm software RAID vs Broadcom MegaRAID 9560

In the past years I’ve said goodbye to hardware RAID controllers and mainly relied on software solutions like mdadm, LVM, Ceph and ZFS(-on-Linux) for keeping data safe.

At PCextreme we use hypervisors with local NVMe storage running in Linux’s mdadm software RAID-10. This works great! But I wasn’t satisfied with the performance for a few reasons:

  • It is expensive on the CPUs (Dual AMD Epyc 48-core)
  • It’s not super fast

We mainly use the Samsumg PM983 (1.92TB) devices and I started to look around if there is a hardware solution which could offload the RAID computation to a dedicated SoC so it wouldn’t eat up our CPU cycles.

After searching I found the Broadcom SAS3916 chip which is on the MegaRAID 9516-16i controller from Broadcom. This chipset supports NVMe devices in various RAID modes.

I wanted to benchmark Linux’s software RAID against the Broadcom controller to see if it would be faster and save us the expensive CPU cycles.

With mdadm we also looked into RAID-5/6 to have more usable space. We however found out that this eats up so many CPU cycles that it wasn’t feasible to use in production for our purposes.

Benchmarking setup

  • Ubuntu Linux 18.04 with kernel 5.3
  • SuperMicro 1114S-WN10RT
  • AMD Epyc 7302P 16-core CPU
  • 128GB Memory
  • 4x Samsung PM983 1.92TB
  • Broadcom MegaRAID 9516-i

Benchmarking will be done using fio and the main elements we are looking for:

  • QD=1, 4, 8 and 16 4k random write performance
  • CPU utilization during benchmarks

MegaRAID configuration

The RAID-10 array was created using storcli

storcli64 /c0 add vd type=raid10 drives=252:4,6,8,10 pdperarray=2

mdadadm setup

RAID-10 was set-up using this command:

mdadm --create --level=10 --raid-devices=4 /dev/md0 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

fio

The fio jobs we ran to benchmark:

[global]
ioengine=libaio
direct=1
invalidate=1
bs=4k
runtime=300
filename=/var/lib/libvirt/images/fio
size=64g
rw=randwrite
numjobs=1

[rw_mdadm_1]
iodepth=1

[rw_mdadm_4]
iodepth=4

[rw_mdadm_8]
iodepth=8

[rw_mdadm_16]
iodepth=16

[rw_mdadm_32]
iodepth=32

Results

Short answer? The MegaRAID controller is much faster in RAID-10 mode and also saves up to 30% of CPU cycles on the Linux machine.

You can also download the JSON output from fio here:

In addition you can download my Open Office spreadsheet I used to generate the graphs and process the data.

Graphs

Some graphs with the results of the IOps and CPU utilization.

At QD 16~32 we seem to have found the upper limit of what the 4 NVMe devices are capable of. Going from QD 16 to 32 does not make a significant difference.
At QD32 we can still see that the MegaRAID controller uses a lot less CPU cycles then Linux’s software RAID

1Gbit fiber connection between MikroTik and Unifi switch

While replacing my router at home by a MikroTik CCR1036-8G-2S+ router I also wanted to uplink my Ubiquity switch using a Multimode (OM3) connection.

Is fiber really needed? Not really, but it saved me an additional RJ45 port on my switch which allows me to connect more.

Link up, down, up, down

The link between my MikroTik router and Unifi switch kept going up and down. In the logs on my MikroTik and Unifi switch I saw:

<14> May 26 19:19:07 SwitchPatchkast DOT1S[dot1s_task]: dot1s_sm.c(314) 454679 %% Port (26) inst(0) role changing from ROLE_DESIGNATED to ROLE_DISABLED
<14> May 26 19:19:07 SwitchPatchkast DOT1S[dot1s_task]: dot1s_sm.c(314) 454677 %% Port (26) inst(0) role changing from ROLE_DISABLED to ROLE_DESIGNATED
<13> May 26 19:19:07 SwitchPatchkast TRAPMGR[trapTask]: traputil.c(743) 454676 %% Link Up: 0/26
<14> May 26 19:18:58 SwitchPatchkast DOT1S[dot1s_task]: dot1s_sm.c(314) 454668 %% Port (26) inst(0) role changing from ROLE_DESIGNATED to ROLE_DISABLED
<13> May 26 19:18:58 SwitchPatchkast TRAPMGR[trapTask]: traputil.c(743) 454667 %% Link Down: 0/26
19:25:12 interface,info sfp-sfpplus1 link up (speed 1G, full duplex)
19:25:20 interface,info sfp-sfpplus1 link down
19:25:21 interface,info sfp-sfpplus1 link up (speed 1G, full duplex)
19:25:29 interface,info sfp-sfpplus1 link down
19:25:30 interface,info sfp-sfpplus1 link up (speed 1G, full duplex)
19:28:48 interface,info sfp-sfpplus1 link down

I am using optics from FlexOptix which I programmed to MikroTik and Ubiquity using their programmer. (We have those at work).

Auto negotiation

After trying many things it turned out that turning off Auto Negotation on both the switch and the router resolved the issue.

On the switch I turned it off via the UI of the Unifi Controller and forced it to 1000 FDX.

On the router I turned it off using the MikroTik CLI:

[admin@router] > /interface ethernet export
/interface ethernet
set [ find default-name=sfp-sfpplus1 ] advertise=1000M-full auto-negotiation=no speed=1Gbps
set [ find default-name=sfp-sfpplus2 ] advertise=1000M-full speed=1Gbps
[admin@router] >

sfp-sfpplus1 is the interface I am using for the connection to my Unifi switch.

Link is now online! Below is a picture of my 19″ rack at home.

Building Ceph .deb packages

Once in a while I have to build Ubuntu packages (.deb) for Ceph to test some of the Pull Requests I write.

I always forget on how to build these packages and probably other people do so as well.

I build Ceph on my laptop using a Docker container with a few commands.

docker run -v /home/wido/repos/ceph:/usr/src/ceph -ti ubuntu:18.04 bash

I have the Ceph repository checked out locally in /home/wido/repos/ceph which I then map into the Docker container.

Once in the container it’s fairly simple:

apt update
apt -y install dpkg-dev
cd /usr/src/ceph
dpkg-buildpackage -us -uc -b -j 4

This will now yield a list of packages you need to install. Once you’ve installed them:

./do_cmake.sh
dpkg-buildpackage -us -uc -b -j 4

Now be ready to wait for a long time as this can take 2 hours as it does on my laptop.

After the build succeeds you’ll find the .deb files in /usr/src

Exploring the CAN bus of my Tesla Model S

The CAN bus of a Tesla vehicle can show some interesting information about the state of different components in the vehicle.

Using a CAN bus cable, Bluetooth adapter and a App on your mobile phone you can gain much more insight on your Tesla vehicle.

I own two Tesla vehicles:

  • S85 from September 2013 (pre face-lift)
  • S100D from September 2018

Somewhere around 2015 Tesla switched to a different connector for the CAN bus so I needed two different cables. I bought my cables in Germany at EMDS.

EMDS also sells a cable for Model 3. I haven’t used this one as I don’t own a Model 3.

The CAN bus connector in a Model S can be found under the MCU’s main screen in the vehicle. You need to pull down the ‘chubby’ and there you will find the connector:

Cable connected to my Model S85

I am using the TM-Spy app on iOS for reading the values on my iPhone.

Screenshot of TM-Spy on iOS

For Android there is Scan My Tesla which also seems to be a very good app. I don’t have an Android device, so I was not able to test it.

I was mainly looking for these values:

  • Usable Full
  • DC Charge Total
  • AC Charge Total

After 253.543km of driving my battery has 75.6kWh of remaining capacity where this was ~81kWh when it was new. (The 85kWh battery was actually a 81kWh battery….)

Tesla also throttles a vehicle’s SuperCharging capabilities after more than X amount (I don’t know the exact value) of DC charging. My car seems to be affected as I SuperCharged a lot.

Charge Total is not a total sum of AC+DC, but from what I’ve read early firmwares did not count AC and DC charging in different values.

Interesting information though! I encourage everybody to use this information to gather more information about their vehicle’s state.

Happy exploring!

Replacing the eMMC in my Tesla Model S

Tesla’s vehicles are awesome. I own a S85 from 2013 and a S100D from 2018. I’ve driven 260.000km and 70.000km with these two vehicles and I love it.

There is however a design flaw in the early Tesla models which can become a very expensive reparation if not performed in time.

Version of of the MCU (Media Control Unit) which was installed up until early 2018 in Tesla S/X runs and is a ticking time bomb.

The problem is the Flash Memory (eMMC) which holds the Operating System of the computer. This wears out over time due to writing data to it.

Writing happens when you use the car. It caches Spotify, Google Maps and many more things. Even when the car is parked the MCU stays running and writes to the eMMC chip.

Eventually this chip will wear out. Before it does it becomes very slow and this results in the MCU becoming super sluggish, unresponsive, the screen reboots at random moments, bluetooth issues, etc, etc.

Inside of the Tesla MCU1
The eMMC chip

A lot has been written about this, so I won’t write to much. Short: Tesla will charge you around 2000 EUR/USD for a new MCU.

I choose to replace this eMMC chip myself and this was a lot cheaper! Total cost was <500 EUR.

Around the world there are a couple of companies doing these replacements. I replaced my eMMC chip together with Loek from Laadkabelwinke.nl (Netherlands): https://www.laadkabelwinkel.nl/tesla-mcu1-emmc-vervangen-reparatie

On Tesla Motors Club there are various topics about these replacements, one for example: https://teslamotorsclub.com/tmc/threads/preventive-emmc-replacement-on-mcu1.152489/

I highly recommend everybody with a Model S/X to replace this eMMC chip before it fails.

The chip will wear out and this causes all kinds of problems. I can’t stress this enough. Replace the chip before it’s too late!

In Europe I would recommend to go to Laadkabelwinkel.nl and have them replace the chip.

Enjoy your Tesla!

Creating a Management Routing Instance (VRF) on Juniper QFX5100

For a Ceph cluster I have two Juniper QFX5100 switches running as a Virtual Chassis.

This Virtual Chassis is currently only performing L2 forwarding, but I want to move this to a L3 setup where the QFX switches use Dynamic Routing (BGP) and thus become the gateway(s) for the Ceph servers.

This should work, but one of the things I was missing is a dedicated Management Port which uses a different routing table/instance.

Starting with JunOS 17.3R1 you can create a Management Routing Instance as described on the website of Juniper.

set system management-instance

This now creates the Routing Instance called mgmt_junos.

I try to run as much as possible IPv6-only or at least prefer IPv6 over IPv4.

I ran into the problem that configuring an IPv6 address on my em0 interface just wouldn’t work. It kept saying that the IPv6 address was Duplicate.

This is probably something which happens because both QFX switches are connected to the same Out of Band switch and causes it to receive it’s DAD over a different link. I had to disable DAD on interface em0 to make it work.

In addition I configured all DNS lookups to be performed using this routing instance.

The end result for my configuration (snippets):

system {
management-instance;
name-server {
2a00:f10:ff04:153::53 routing-instance mgmt_junos;
2a00:f10:ff04:253::53 routing-instance mgmt_junos;
93.180.70.22 routing-instance mgmt_junos;
93.180.70.30 routing-instance mgmt_junos;
}
}
interfaces {
unit 0 {
family inet {
address 172.17.5.10/24;
}
family inet6 {
address 2a00:f10:XXX:XXX::100/64
dad-disable;
}
}
}
routing-instances {
mgmt_junos {
routing-options {
rib mgmt_junos.inet6.0 {
static {
route ::/0 next-hop 2a00:f10:XXX:XXX::1;
}
}
static {
route 0.0.0.0/0 next-hop 172.17.5.1;
}
}
}
}

This now allows me to SSH to my Juniper QFX Virtual Chassis over interface em0 which uses a different routing instance/table.

Should I make a mistake in the default routing instance, for example a BGP misconfiguration, I can still SSH to my switch(es).

Or if there is a routing error (BGP issue) I can also still reach the switches.

Allowing SSH login for user without a password

To start with: This is something you should NOT use in most cases. It’s only intended to be used in very specific situations.

In my situation I want to allow some remote systems to create a reverse SSH tunnel without a password nor a key. It’s for hobby purposes and through firewalling I make sure that only those systems are allowed to connect to my ‘SSH proxy’.

I started by creating a group and a few users with that as their primary group:

groupadd reversessh
useradd -G reversessh user1
useradd -G reversessh user2
useradd -G reversessh user3
passwd -d user1
passwd -d user2
passwd -d user3

I then modified my /etc/ssh/sshd_config that it only allows specific groups and allows users with an empty password:

PermitEmptyPasswords yes
AllowGroups root reversessh

I also needed to modify PAM to make sure it allows this login. Therefor you need to modify /etc/pam.d/common-auth that it contains:

auth    [success=1 default=ignore]      pam_unix.so nullok

After I restarted SSH to users user1 until user3 were able to log on without a password nor a key.

Is this very secure? No! But it does serve a purpose in some use-cases.

Setting noout flag per Ceph OSD

Prior to Ceph Luminous you could only set the noout flag cluster-wide which means that none of your OSDs will be marked as out.

On large(r) cluster this isn’t always what you want as you might be performing maintenance on a part of the cluster, but you sill want other OSDs which go down to be marked as out.

You can however also set the noout flag per OSD basis using this command:

ceph osd add-noout 0

This means that osd.0 will not be marked as out. You can verify this by looking at the OSDMap:

root@alpha:~# ceph osd dump|grep osd.0
osd.0 up in weight 1 up_from 5 up_thru 0 down_at 0 last_clean_interval [0,0) [v2:[2001:db8::100]:6800/1618,v1:[2001:db8::100]:6801/1618,v2:0.0.0.0:6802/1618,v1:0.0.0.0:6803/1618] [v2:[2001:db8::100]:6804/1618,v1:[2001:db8::100]:6805/1618,v2:0.0.0.0:6806/1618,v1:0.0.0.0:6807/1618] exists,noout,up c0c3e5c3-918b-4f3e-a48c-ea8e7c014a3b
root@alpha:~#

Here you can see the noout flag has been set for osd.0.

Removing the flag for this OSD is the reverse process:

ceph osd rm-noout 0

Other flags you can set per osd:

  • nodown
  • noup
  • noin
  • noout

Use this to your advance!