RockNSM crashing/locking up

We are sending a lot of GRE-encapsulated data to the RockNSM monitoring interface. Over time RockNSM will crash and lock up, and the only fix is to reboot. Is the reason for this that RockNSM can't handle all the GRE-encapsulated data being pushed to it at once? Is there any modification I can make to ease the overload of incoming data?


Is this the same box that you were causing to freeze with Elasticsearch? It sounds like a resource issue. GRE, in general, requires more resources to process in both Zeek and Suricata, because both of those tools will maintain a session for quite a while for any outer tunnel.

If you’re using GRE to tunnel PCAP (like in ERSPAN) and you don’t need that info in the sensor data itself, then it’d be best to strip it off or terminate it before it hits the monitor interface. If you do need that metadata in Zeek/Suricata, then I think you can force a session timeout after some period. That comes with its own caveats.
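If the forced-timeout route is what you want, a minimal sketch in Zeek (assuming the stock Tunnel framework, where `Tunnel::expiration_interval` controls how long state for inactive tunnels is kept; the 10-minute value is an illustrative guess, not a tested tuning):

```zeek
# local.zeek -- shorten how long Zeek keeps state for inactive tunnels.
# Tunnel::expiration_interval is part of Zeek's base tunnel framework
# and defaults to 1 hour.
redef Tunnel::expiration_interval = 10 min;
```

Shorter intervals shed tunnel state faster at the cost of splitting long-lived tunnels into multiple tunnel entries in the logs.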

Yes, same box. I have thrown a ton of resources at it and can throw some more; I just wanted to know if there might be any tuning I could do on the application side to help out.

I just rebuilt it with 16 CPUs and 96 GB of RAM; previously it had 8 CPUs and 64 GB of RAM.

Thanks for all the help…much appreciated.

There are a lot of reasons in general that a Linux box could “lock up”, so it’s really hard to narrow this down. Typically, the most severe lockup that isn’t recoverable is a kernel panic; if that happens, you’d probably only see it on the physical console of the system. Short of that, running out of memory can do it, but even then the kernel’s OOM killer will come along and start killing processes until it can recover.
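One way to check for that after a crash is to grep the kernel log for OOM or panic messages. A quick sketch (the sample log line below is fabricated just to show the message format; on the real box you’d grep `journalctl`/`dmesg` output instead):

```shell
# What an OOM-killer entry roughly looks like in the kernel log
# (sample line written to a temp file so the grep has something to match):
cat <<'EOF' > /tmp/kern_sample.log
Out of memory: Killed process 2171 (java) total-vm:98765432kB
EOF

# After a lockup, run the same pattern against the previous boot's
# kernel messages (needs persistent journald storage):
#   journalctl -k -b -1 | grep -iE 'out of memory|oom|panic'
grep -iE 'out of memory|oom|panic' /tmp/kern_sample.log
```

If nothing turns up there, the next stop is usually the application logs, since a wedged process can make the whole box feel locked up.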

How much bandwidth are you monitoring?

Thanks. I was looking at the logs and didn’t see any kernel panics, but I did see a HeapMemoryOverload error. I will check with my network guy and get an answer on how much bandwidth we are monitoring.
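A heap error like that usually points at the Elasticsearch JVM rather than the kernel. If that’s where it’s coming from, the heap is set in Elasticsearch’s `jvm.options` file; a sketch of the relevant lines (the path assumes a standard RPM install, and the 31g figure assumes the 96 GB box plus the common guidance of min = max, at most about half of RAM, capped around 31 GB to keep compressed object pointers):

```
# /etc/elasticsearch/jvm.options (path assumes an RPM-based install)
-Xms31g
-Xmx31g
```

Restart Elasticsearch after changing these; min and max should match so the heap doesn’t resize under load.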

We have a 1 Gb pipe and are monitoring 150 machines on one Virtual Distributed Switch, if that helps with the bandwidth question.