AF_Packet: Couldn't set fanout mode

There appears to be a kernel problem that prevents AF_Packet fanout mode from working, which seems to be causing a large number of ACK/SYN errors and other problems.

https://marc.info/?l=oisf-users&m=141283437610157

The current ROCK NSM build seems to be at this kernel level, which appears to have the bug:

BOOT_IMAGE=/vmlinuz-3.10.0-1062.1.2.el7.x86_64 root=UUID=76513ac8-96d4-4193-8c17-34bf6eee38a1 ro rhgb quiet LANG=en_US.UTF-8

I checked this on a current sensor with the https://github.com/JustinAzoff/can-i-use-afpacket-fanout tool. It reports issues with flows coming back on the wrong threads.
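
Roughly how I built and ran the tool (paths assume a default GOPATH-style Go setup, so adjust as needed):

# Fetch and build the test tool with the Go toolchain
go get github.com/JustinAzoff/can-i-use-afpacket-fanout
# Point it at the capture interface and stop after 100 observed flows
sudo ~/go/bin/can-i-use-afpacket-fanout -interface p2p2 -maxflows 100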

Is there any way we can get a fixed kernel or a workaround?

I am having trouble recreating the issue you are seeing. I am running a new build of ROCK 2.4.0 on CentOS with kernel version 3.10.0-1062.1.2.el7.x86_64.

Which version of ROCK are you running? (cat /etc/rocknsm/rock-version)

To follow up on @kwbyron: the 3.10.x kernel in CentOS has supported AF_Packet fanout since around 7.1; otherwise we wouldn't use it as our standard capture mechanism. Can you share the output of JustinAzoff's tool and your current uname -a output? And lastly, how did you install (what version)?

I re-ran the test script on another sensor that is configured nearly identically (one 10 Gb NIC instead of two), and it also reports failures. (I removed all our IPs from the log.) Here is the kernel version on this sensor as well:

uname -a
Linux hl-mtn-ids01.coxhealth.com 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I'm going to test the previously failing sensor with a different OS version/kernel and see whether the same hardware still fails the test script. That should either isolate the hardware as the cause or confirm it's failing due to something other than AF_Packet fanout.

So, more background on this: I was attempting to enable cluster_qm mode, and Suricata would fail to start with an error that fanout could not be set. If I changed the number of threads to 1, it would run. I believe setting it to one thread bypasses the fanout hash mode; a rough sketch of the af-packet settings involved is below.
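
For context, the generated suricata.yaml ends up with an af-packet section roughly like this (illustrative values only, not my exact config):

af-packet:
  - interface: p2p2
    cluster-id: 99            # must be unique per AF_PACKET consumer on the host
    cluster-type: cluster_qm  # fans out based on the NIC RSS queue instead of a kernel hash
    threads: 8                # fails with "Couldn't set fanout mode"; threads: 1 starts fine
    defrag: yes
    use-mmap: yes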

TIA!

sudo ~/go/bin/can-i-use-afpacket-fanout -interface p2p2 -maxflows 100
x.x.x.x 13:04:47 Starting worker id 0 on interface p2p2
x.x.x.x 13:04:47 Starting worker id 1 on interface p2p2
x.x.x.x 13:04:47 Starting worker id 2 on interface p2p2
x.x.x.x 13:04:47 Starting worker id 3 on interface p2p2
x.x.x.x 13:04:47 Starting worker id 4 on interface p2p2
x.x.x.x 13:04:47 Starting worker id 5 on interface p2p2
x.x.x.x 13:04:47 Starting worker id 6 on interface p2p2
x.x.x.x 13:04:47 Starting worker id 7 on interface p2p2
x.x.x.x x.x.x.x workers started. Collecting results until 100 flows have been seen…
x.x.x.x 13:04:47 Worker 3 has seen at least 100 packets, collecting data
x.x.x.x 13:04:47 Worker 0 has seen at least 100 packets, collecting data
x.x.x.x 13:04:47 FAIL: saw flow {x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:47 FAIL: saw flow {x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {UDP x.x.x.x 64165 x.x.x.x 137} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {UDP x.x.x.x 64165 x.x.x.x 137} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 Worker 2 has seen at least 100 packets, collecting data
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 2598 x.x.x.x 55858} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow { x.x.x.x x.x.x.x } on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 2 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 444 x.x.x.x 14294} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 444 x.x.x.x 14294} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 2 expected 3
x.x.x.x 13:04:48 FAIL: saw flow { x.x.x.x x.x.x.x } on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 14294 x.x.x.x 444} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {UDP x.x.x.x 500 x.x.x.x 500} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 2598 x.x.x.x 57984} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 444 x.x.x.x 14294} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 444 x.x.x.x 14294} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 445 x.x.x.x 64431} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 60680 x.x.x.x 3389} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 444 x.x.x.x 14294} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 444 x.x.x.x 14294} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 2598 x.x.x.x 57459} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 445 x.x.x.x 1919} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {UDP x.x.x.x 500 x.x.x.x 500} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 42899 x.x.x.x 443} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 42899 x.x.x.x 443} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 445 x.x.x.x 1919} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 9093 x.x.x.x 49206} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 9093 x.x.x.x 49206} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {UDP x.x.x.x 38235 x.x.x.x 53} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 9093 x.x.x.x 49206} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 9093 x.x.x.x 49206} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {UDP x.x.x.x 64165 x.x.x.x 137} on worker 2 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 443 x.x.x.x 42899} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 443 x.x.x.x 42899} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 35030 x.x.x.x 6475} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 35030 x.x.x.x 6475} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 49206 x.x.x.x 9093} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw reverse flow of {TCP x.x.x.x 49206 x.x.x.x 9093} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 2598 x.x.x.x 57984} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 444 x.x.x.x 27314} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 445 x.x.x.x 1919} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 444 x.x.x.x 27314} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 445 x.x.x.x 1919} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 444 x.x.x.x 27314} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 50799 x.x.x.x 59096} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 444 x.x.x.x 27314} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 60680 x.x.x.x 3389} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 14200 x.x.x.x 443} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {UDP x.x.x.x 64165 x.x.x.x 137} on worker 2 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 444 x.x.x.x 24688} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 43451 x.x.x.x 443} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 2598 x.x.x.x 49764} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 2598 x.x.x.x 60324} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 6333 x.x.x.x 443} on worker 2 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 28800 x.x.x.x 9031} on worker 3 expected 0
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 57131 x.x.x.x 9031} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 2 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 57131 x.x.x.x 9031} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 40422 x.x.x.x 104} on worker 2 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {UDP x.x.x.x 64165 x.x.x.x 137} on worker 0 expected 3
x.x.x.x 13:04:48 FAIL: saw flow {TCP x.x.x.x 42950 x.x.x.x 636} on worker 2 expected 0

x.x.x.x 13:04:48 Final Stats: packets=864 flows=101 success_flows=7 failed_flows=78 pkt_success=291 pkt_reverse_success=124 pkt_failures=472 pkt_reverse_failures=221
x.x.x.x 13:04:48 Worker flow count distribution:
x.x.x.x 13:04:48 - worker=0 flows=12
x.x.x.x 13:04:48 - worker=1 flows=0
x.x.x.x 13:04:48 - worker=2 flows=0
x.x.x.x 13:04:48 - worker=3 flows=89
x.x.x.x 13:04:48 - worker=4 flows=0
x.x.x.x 13:04:48 - worker=5 flows=0
x.x.x.x 13:04:48 - worker=6 flows=0
x.x.x.x 13:04:48 - worker=7 flows=0

Thanks for that additional info. Just a couple more questions to help us narrow down what might be failing here.

  1. What are the contents of /etc/rocknsm/rocknsm-buildstamp?
  2. What is the output of sudo ethtool -i p2p2? This identifies the driver and some features of the NIC.

cat /etc/rocknsm/rocknsm-buildstamp
[Main]
Product=RockNSM
Version=2.4.2-1905
BugURL=https://github.com/rocknsm/rock/issues/
IsFinal=True
UUID=201711080000.x86_64
Build=20190510-1626
[Compose]
Lorax=19.6.78-1

sudo ethtool -i p2p2 {Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)}
driver: ixgbe
version: 5.1.0-k-rh7.6
firmware-version: 0x8000095d, 18.5.18
expansion-rom-version:
bus-info: 0000:3b:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Here is the error that started me down this rabbit hole. It happens if I try to set "cluster-type: cluster_qm" in rocknsm-overrides.yaml; if I set threads per NIC to 1, the error goes away:

Oct 25 14:59:30 hl-mtn-ids01.coxhealth.com suricata[278096]: 25/10/2019 -- 14:59:30 - - Configuration node 'default-log-dir' redefined.
Oct 25 14:59:30 hl-mtn-ids01.coxhealth.com suricata[278096]: 25/10/2019 -- 14:59:30 - - Configuration node 'outputs' redefined.
Oct 25 14:59:30 hl-mtn-ids01.coxhealth.com suricata[278096]: 25/10/2019 -- 14:59:30 - - This is Suricata version 4.1.4 RELEASE
Oct 25 14:59:39 hl-mtn-ids01.coxhealth.com suricata[278096]: 25/10/2019 -- 14:59:39 - - all 64 packet processing threads, 4 management threads initialized, engine started.
Oct 25 14:59:39 hl-mtn-ids01.coxhealth.com suricata[278096]: 25/10/2019 -- 14:59:39 - - [ERRCODE: SC_ERR_AFP_CREATE(190)] - Couldn't set fanout mode, error Invalid argument
Oct 25 14:59:39 hl-mtn-ids01.coxhealth.com suricata[278096]: 25/10/2019 -- 14:59:39 - - [ERRCODE: SC_ERR_AFP_CREATE(190)] - Couldn't init AF_PACKET socket, fatal error
Oct 25 14:59:39 hl-mtn-ids01.coxhealth.com suricata[278096]: 25/10/2019 -- 14:59:39 - - [ERRCODE: SC_ERR_FATAL(171)] - thread W#01-p2p1 failed
Oct 25 14:59:40 hl-mtn-ids01.coxhealth.com systemd[1]: suricata.service: main process exited, code=exited, status=1/FAILURE
Oct 25 14:59:40 hl-mtn-ids01.coxhealth.com systemd[1]: Unit suricata.service entered failed state.
Oct 25 14:59:40 hl-mtn-ids01.coxhealth.com systemd[1]: suricata.service failed.

After rebuilding one of the sensors on Ubuntu 18.04, we've found the issue is still present. This appears to be an issue external to the OS: the same hardware is failing on two different OSes and kernel versions, neither of which should have this problem.

Ubuntu 18.04
Linux hl-bbu-ids01 5.0.0-32-generic #34~18.04.2-Ubuntu SMP Thu Oct 10 10:36:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Any ideas?

So, RSS queues are very hardware dependent and require some tweaking with a custom hash function, which might not work on Intel X520 cards, but I don't know specifically.

Michal Purzynski from Mozilla has probably done some of the most significant work in this area. He co-presented at SuriCon 17 on it. It’s been a while since I looked at setting up RSS hashing, but as Michal presents, it requires IRQ alignment with pinned CPUs and NUMA nodes.

But none of that is necessary for the default FANOUT mode, as that hashing happens in kernel space, not on the card. I've used X520 cards with great success that way, though X710s are better due to their larger on-hardware queues. IRQ alignment and all that will still improve performance, however.
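
If you do want to experiment with cluster_qm, the NIC side usually looks roughly like the sketch below. Treat it as a starting point only: the queue count, the 40-byte low-entropy key, and whether ixgbe on the X520 will even let you set the key are assumptions on my part.

# Match the RSS queue count to the number of Suricata worker threads (example: 8)
sudo ethtool -L p2p2 combined 8
# Load a low-entropy key (40 bytes of 0x6D5A) so both directions of a flow hash to the same queue
sudo ethtool -X p2p2 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 8
# Hash TCP/UDP on source/destination address and ports
sudo ethtool -N p2p2 rx-flow-hash tcp4 sdfn
sudo ethtool -N p2p2 rx-flow-hash udp4 sdfn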

The other thing you have to be careful about with AF_Packet is that each process on the system using AF_Packet must have a unique cluster-id value. The kernel uses this to track the packet queues and consumer groups.
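
For example (interface names and ids here are purely illustrative), two interfaces in suricata.yaml each get their own id, and any other AF_PACKET consumer on the same box (Zeek, a second Suricata instance, etc.) needs yet another one:

af-packet:
  - interface: p2p1
    cluster-id: 97          # fanout group id for this consumer
    cluster-type: cluster_flow
  - interface: p2p2
    cluster-id: 98          # second interface gets a distinct id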

Sorry I don’t have a specific “do this” fix. If you have a different capture card you could try, you could at least rule that out. Please share if you figure it out though!

Thanks, I'll take a look at that. IRQ pinning wouldn't be a bad thing to do either way. It may be "luck of the draw", but I seem to have cluster_qm working right now under Ubuntu; I'm not sure why I can't get it to work on ROCK NSM with CentOS. I'm still digging into the issue and will update here if I find more info.
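
For the IRQ side, I'll probably start with something simple along these lines (the IRQ number and CPU mask below are placeholders; I believe the set_irq_affinity script that ships with Intel's driver sources does the same thing in bulk):

# List the IRQs assigned to the capture NIC
grep p2p2 /proc/interrupts
# Pin one of those IRQs (e.g. 120) to CPU 2 by writing a hex CPU mask
echo 4 | sudo tee /proc/irq/120/smp_affinity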