Multi-Node Issue

I am attempting to set up RockNSM in multi-node mode. I have followed the directions on the site and have also made the change to the file noted here: https://github.com/rocknsm/rock/issues/447, but I keep running into the same issue: the deploy keeps checking whether the cluster is green, uses all 300 attempts, and the cluster never comes up. No other errors are showing that I can see.
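For reference, the state that check is polling can be viewed by hand on one of the Elasticsearch nodes with the standard health endpoint:

curl -s localhost:9200/_cluster/health?pretty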

I am testing this in VMware with 3 nodes, all on the same subnet.

My setup will eventually be a little different. I need sensors outside of my network, on different networks, that push data back to a master node; the master receives the data from the sensors and puts it into Elasticsearch for users to log in and view through Kibana. Connectivity to the remote networks is not an issue. Has anyone tried this, or done it successfully?

Thanks

@gmahns thanks for posting. Incidentally, I ran into this same issue yesterday and am working on getting the fix pushed up in an update (it will be 2.4.2-1905).

As for the setup, I was thinking about this, and there are some architecture decisions that will drive how it all works.

  1. Are the sensors remote (like different data centers)?
  2. Do you have sensors on different isolated networks?
  3. Should users on the remote networks be able to access their data locally or only through the central data store?

The default documentation doesn’t handle the case where you have distinct sensor nodes; it assumes multiple sensors in the same datacenter. That said, it should just be a tweak of the inventory with an additional group for each site. That group can override a couple of variables that should make this all work.
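As a rough sketch only (the hostnames, IPs, and the vars block here are placeholders for illustration; the exact variables to override depend on which components each remote site runs):

[site1_sensors]
sensor1.site1.lan ansible_host=198.51.100.10

[site2_sensors]
sensor1.site2.lan ansible_host=203.0.113.10

[sensors:children]
site1_sensors
site2_sensors

# [site1_sensors:vars]
# per-site overrides go here, e.g. pointing that site's outputs at the central node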

Choose your own adventure. Lemme know what that architecture looks like and we can smooth this out. The updated ISO should drop today.

@dcode Thanks for your response, and glad it wasn’t just me!

To answer your questions about architecture: the sensors are remote, on different networks, and they report back to the master server where all the data resides.

Users should not be able to access the sensors, only admins for updates and configs. Local users can log in locally here to view the data that is collected.

Thanks for your and everyone else’s work on this project, can’t wait to try out the new ISO!

Hi dcode,

Has this issue been fixed? I’m using rocknsm-2.4.2-1905 but am still getting the same problem: the cluster is not green after 300 attempts.

Thank you.

What does your /etc/elasticsearch/elasticsearch.yml file look like? I have run into issues in the past where this needed some adjusting for multi-node setups. Usually I am looking at the section containing:

discovery.zen.ping.unicast.hosts: ["es1.node.lan","es2.node.lan", "es3.node.lan"]

Hi koelslaw,

I did a fresh install and started a multi-node deployment without many changes to the configuration. My elasticsearch.yml does not have discovery.zen.ping.unicast.hosts. Instead, it has the following:

discovery.seed_hosts:
  - es1.node.lan
  - es2.node.lan
  - es3.node.lan
cluster.initial_master_nodes:
  - es1.node.lan
  - es2.node.lan
  - es3.node.lan

Anyway, I added in your line and was presented with a different error:

TASK [elasticsearch : Disable cluster shard allocation] ************************
fatal: [es1.node.lan]: FAILED! => {"msg": "The conditional check 'result.json.acknowledged | bool' failed. The error was: error while evaluating conditional (result.json.acknowledged | bool): 'dict object' has no attribute 'json'"}

By the way, is there more comprehensive documentation on how to get a multi-node deployment working from a fresh installation? e.g., what else needs to be changed that is not mentioned in https://docs.rocknsm.io/deploy/multi-node/?

Thank you.

Here is a rough draft of the multi-node deployment guide that I have used elsewhere.

You can do this online too, but for the sake of giving you a few options I have included how to do it offline. Choose your own adventure; either way will work. So…

  1. Make a place for the ROCK ISO to live on each system.
sudo mkdir -p /srv/rocknsm
  2. Mount the ROCK ISO on all the machines to prepare for the deployment of ROCK NSM.

sudo mount -o loop /home/admin/rocknsm-2.x.X.-XXXX.iso /mnt

  3. Navigate to the /srv/rocknsm directory.
cd /srv/rocknsm/
  4. Copy the contents of the mounted ISO so that it can be used for the installation.
sudo cp -r /mnt/* /srv/rocknsm

Sync the Clocks across all machines (Because time travel really makes things difficult :slight_smile: )

To do this, we set the sensor as the authority for time for the rest of the kit. We do this for a couple of reasons, the biggest being that the sensor is where the time-based data is generated by Zeek (Bro), FSF, and Suricata. Aligning the rest of the stack along this guideline keeps us from writing events in the future. All events should be written in UTC to help with response across time zones. This is done via chrony.

  1. If you have any time-based services running, turn them off. Otherwise continue; on a new installation we have not deployed ROCK yet, so there is nothing to stop.

  2. If chrony is not already installed, install it:

sudo yum install chrony
  3. Edit the config file with vi:
sudo vi /etc/chrony.conf
  4. Time server (likely the sensor): uncomment/edit the following line in /etc/chrony.conf:
allow 10.44.10.0/24  
  5. Add NTP to the firewall on the sensor:
sudo firewall-cmd --add-service=ntp --zone=work --permanent
  6. Reload the firewall:
sudo firewall-cmd --reload
  7. Time client (everything that is not the sensor): uncomment the server lines and point them to sensor.lan or its IP address.
server 192.0.2.1 iburst
  8. Start and enable the service:
sudo systemctl enable --now chronyd
  9. Add NTP to the firewall on each client:
sudo firewall-cmd --add-service=ntp --zone=work --permanent
  10. Reload the firewall:
sudo firewall-cmd --reload
  11. Verify on all the applicable clients that they can talk to the server for time:
 chronyc sources

NOTE: Time sync may not happen immediately.
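If you want to confirm chrony is actually syncing, or step the clock immediately instead of waiting for it to slew, chrony's own tools cover that (a small sketch):

chronyc tracking          # show the current source, stratum, and offset
sudo chronyc makestep     # step the clock right away instead of slewing gradually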

Deployment of Rock across All Machines

Generate a hosts.ini file so that Ansible knows where to deploy things: sudo vi /etc/rocknsm/hosts.ini

NOTE: If not already done, log into every server that ROCK will be deployed to so that its host key is added to your SSH known_hosts file.
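If you prefer to pre-seed that by hand instead of logging in to each box (rock ssh-config, shown further down, automates most of this), something like the following works; the hostnames and the admin user here are just the ones from the examples in this guide:

ssh-keygen -t rsa -b 4096                        # only if you don't already have a key
for h in es1.lan es2.lan es3.lan; do
  ssh-keyscan -H "$h" >> ~/.ssh/known_hosts      # accept the host keys up front
  ssh-copy-id "admin@$h"                         # push your public key for key-based auth
done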

Insert the following text. This tells the script what to deploy and where:
sensor.lan ansible_host=10.44.10.1 ansible_connection=local
es1.lan ansible_host=10.44.10.2 ansible_connection=local
es2.lan ansible_host=10.44.10.3 ansible_connection=local
es3.lan ansible_host=10.44.10.4 ansible_connection=local
# If you have any other sensor or data nodes then you would place them in the list above.


[rock]
sensor.lan

[web]
es1.lan

[sensors:children]
rock

[bro:children]
sensors

[fsf:children]
sensors

[kafka:children]
sensors

[stenographer:children]
sensors

[suricata:children]
sensors

[zookeeper]
sensor.lan

[elasticsearch:children]
es_masters
es_data
es_ingest

[es_masters]
es[1:3].lan

[es_data]
es[1:3].lan

[es_ingest]
es[1:3].lan

[elasticsearch:vars]
# Disable all node roles by default
node_master=false
node_data=false
node_ingest=false

[es_masters:vars]
node_master=true

[es_data:vars]
node_data=true

[es_ingest:vars]
node_ingest=true

[docket:children]
web

[kibana:children]
web

[logstash:children]
sensors

Most of the ROCK configuration is now automated and can be called from anywhere on the OS. Below are the options. Run sudo rock ssh-config to set up all the hosts prior to deploying.

[admin@sensor ~]$ sudo rock help
Usage: /sbin/rock COMMAND [options]
Commands:
setup               Launch TUI to configure this host for deployment
tui                 Alias for setup
ssh-config          Configure hosts in inventory to use key-based auth (multinode)
deploy              Deploy selected ROCK components
deploy-offline      Same as deploy --offline (Default ISO behavior)
deploy-online       Same as deploy --online
stop                Stop all ROCK services
start               Start all ROCK services
restart             Restart all ROCK services
status              Report status for all ROCK services
genconfig           Generate default configuration based on current system
destroy             Destroy all ROCK data: indexes, logs, PCAP, i.e. EVERYTHING
                      NOTE: Will not remove any services, just the data

Options:
--config, -c <config_yaml>         Specify full path to configuration overrides
--extra, -e <ansible variables>    Set additional variables as key=value or YAML/JSON passed to ansible-playbook
--help, -h                         Show this usage information
--inventory, -i <inventory_path>   Specify path to Ansible inventory file
--limit <host>                     Specify host to run plays
--list-hosts                       Outputs a list of matching hosts; does not execute anything else
--list-tags                        List all available tags
--list-tasks                       List all tasks that would be executed
--offline, -o                      Deploy ROCK using only local repos (Default ISO behavior)
--online, -O                       Deploy ROCK using online repos
--playbook, -p <playbook_path>     Specify path to Ansible playbook file
--skip-tags <tags>                 Only run plays and tasks whose tags do not match these values
--tags, -t <tags>                  Only run plays and tasks tagged with these values
--verbose, -v                      Increase verbosity of ansible-playbook
  1. Set up your SSH access to the machines using the sudo rock ssh-config command, or use sudo rock tui for the text user interface.

  2. Start the interactive text interface for setup using sudo rock tui

  3. Select “Select Interfaces”. This allows you to choose which interfaces you will use for management and capture.

  4. Choose your management interface.

  5. Choose your capture interface(s).

NOTE: Any interface you set for capture will spawn a Bro/Zeek, Suricata, and FSF process, so if you don’t intend to use the interface, do not set it for capture.

  6. You will then be forwarded to the interface summary screen. Make sure everything is to your satisfaction.

  7. Once it has returned to the installation setup screen, choose the “Offline/Online” installation option. This tells the installation playbook where to pull the packages from. As these kits are meant to be offline, we will choose the offline installation option.

  8. Choose “No” for the offline installation.

  9. Once it has returned to the installation setup screen, choose the “Choose Components” installation option.

  10. Here is where you decide what capabilities your sensor will have. If you are low on resources, the recommendation is to disable Docket and Stenographer. Otherwise, just enable everything.

  11. Once it has returned to the installation setup screen, choose the “Choose enabled services” installation option. This needs to match the installed components unless you have a specific reason for it not to.

  12. This will write the config to the Ansible deployment script.

  13. Once it has returned to the installation setup screen, choose the “Run Installer” installation option.

It should complete with no errors.
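Alternatively, if you would rather skip the TUI for the deploy step itself, the help output above suggests the same thing can be kicked off from the command line; something along these lines, pointing at the inventory we wrote earlier:

sudo rock deploy-offline -i /etc/rocknsm/hosts.ini
# or, if you chose the online route:
sudo rock deploy-online -i /etc/rocknsm/hosts.ini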

  1. Ensure the following ports on the firewall are open for the data nodes
  • 9300 TCP - Node coordination (I am sure Elastic has a better name for this)
  • 9200 TCP - Elasticsearch
  • 5601 TCP - Only on the Elasticsearch node that has Kibana installed, likely es1.lan
  • 22 TCP - SSH Access
sudo firewall-cmd --add-port=9300/tcp --permanent
  2. Reload the firewall config
sudo firewall-cmd --reload
  3. Ensure the following ports on the firewall are open for the sensor
  • 123 UDP - NTP
  • 22 TCP - SSH Access
  • 9092 TCP - Kafka
sudo firewall-cmd --add-port=22/tcp --permanent
  4. Reload the firewall config
sudo firewall-cmd --reload
  5. Check the Suricata threads per interface in /etc/suricata/rock-overrides.yml, so that Suricata doesn’t compete with Bro/Zeek for CPU threads:
%YAML 1.1
---
default-rule-path: "/var/lib/suricata/rules"
rule-files:
  - suricata.rules

af-packet:
  - interface: em4
    threads: 4              # <-- adjust this
    cluster-id: 99
    cluster-type: cluster_flow
    defrag: yes
    use-mmap: yes
    mmap-locked: yes
    #rollover: yes
    tpacket-v3: yes
    use-emergency-flush: yes
  6. Restart services with sudo rock stop and then sudo rock start.
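Once everything is back up, a few quick sanity checks (same hostnames and ports as above; run the curl commands on one of the Elasticsearch nodes):

curl -s localhost:9200/_cluster/health?pretty   # want "status" : "green"
curl -s localhost:9200/_cat/nodes?v             # every data node should be listed
sudo firewall-cmd --list-ports                  # confirm the ports above stuck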

I’ll keep working on that other question…

See: Packet Capture Replay (no Tap) topic…

Hi koelslaw,

Thank you for the guide. I followed it on a fresh installation, choosing to do an online deployment instead to get the latest updates.

However, I’m still stuck on the following task and the deployment did not complete.

TASK [elasticsearch : Disable cluster shard allocation] ********************************************************
fatal: [rock01]: FAILED! => {"msg": "The conditional check 'result.json.acknowledged | bool' failed. The error was: error while evaluating conditional (result.json.acknowledged | bool): 'dict object' has no attribute 'json'"}

Is this something that can be bypassed?

Thank you.

Hi koelslaw,

Finally figured it out! The task can be bypassed by commenting out the sections related to shard allocation at the start and end of /usr/share/rock/roles/elasticsearch/tasks/restart.yml.
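For anyone else who hits this: those tasks appear to just be toggling Elasticsearch shard allocation around the restart, so if you ever need the equivalent by hand, something like this (standard cluster settings API) should do it:

# disable allocation:
curl -s -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'
# re-enable allocation:
curl -s -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": null}}'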

Cheers!

Hi,

I followed the hosts.ini guide at https://docs.rocknsm.io/deploy/multi-node/ and the deployment was successful. However, why is
curl -X GET localhost:9200/_cluster/health?pretty showing

"status":"green",
"number_of_nodes":1,
"number_of_data_nodes":1,

Shouldn’t the number of nodes and data nodes be 3, because I have rock01, rock02, and rock03 configured in hosts.ini?

Another issue: why would the hostnames change after a reboot? The IP addresses are fixed across reboots, but the hostnames become rock03, rock02, rock03 on the 3 machines.

Thank you.

What does your /etc/rocknsm/hosts.ini file look like?

Here’s my /etc/rocknsm/hosts.ini

[rock]
rock01 ansible_host=192.168.1.1 ansible_connection=local
rock02 ansible_host=192.168.1.2 ansible_connection=local
rock03 ansible_host=192.168.1.3 ansible_connection=local

[web]
rock01 ansible_host=192.168.1.1 ansible_connection=local

[lighttpd:children]
web

[sensors:children]
rock

[bro:children]
sensors

[fsf:children]
sensors

[kafka:children]
sensors

[stenographer:children]
sensors

[suricata:children]
sensors

[filebeat:children]
fsf
suricata

[zookeeper]
rock01 ansible_host=192.168.1.1 ansible_connection=local

[elasticsearch:children]
es_masters
es_data
es_ingest

[es_masters]
# This group should only ever contain exactly 1 or 3 nodes!
#simplerockbuild.simplerock.lan ansible_host=127.0.0.1 ansible_connection=local
# Multi-node example #
#elasticsearch0[1:3].simplerock.lan
rock01 ansible_host=192.168.1.1 ansible_connection=local
rock02 ansible_host=192.168.1.2 ansible_connection=local
rock03 ansible_host=192.168.1.3 ansible_connection=local

[es_data]
#simplerockbuild.simplerock.lan ansible_host=127.0.0.1 ansible_connection=local
# Multi-node example #
#elasticsearch0[1:4].simplerock.lan
rock01 ansible_host=192.168.1.1 ansible_connection=local
rock02 ansible_host=192.168.1.2 ansible_connection=local
rock03 ansible_host=192.168.1.3 ansible_connection=local

[es_ingest]
#simplerockbuild.simplerock.lan ansible_host=127.0.0.1 ansible_connection=local
# Multi-node example #
#elasticsearch0[1:4].simplerock.lan
rock01 ansible_host=192.168.1.1 ansible_connection=local
rock02 ansible_host=192.168.1.2 ansible_connection=local
rock03 ansible_host=192.168.1.3 ansible_connection=local

[elasticsearch:vars]
# Disable all node roles by default
node_master=false
node_data=false
node_ingest=false

[es_masters:vars]
node_master=true

[es_data:vars]
node_data=true

[es_ingest:vars]
node_ingest=true

[docket:children]
web

[kibana:children]
web

[logstash:children]
sensors

Couple of things I want to look at here:

  1. I have a few things here, but I want to make sure I get the whole picture: what are the resources for your 3 machines (RAM, cores, HDD)? The way your hosts.ini file is set up, rock01 will be working its tail off, when we could distribute the load a bit more efficiently.

  2. As far as the cluster not talking, though:

    • Let’s take a look at your firewall with sudo firewall-cmd --list-all-zones; maybe the firewall is blocking comms.
    • Check /etc/hosts; in the past there has been a bug where it sets the same IP address for rock01-rock03, which doesn’t really work (see the sketch below).
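For reference, a minimal sketch of what I would expect /etc/hosts to contain on each box (IPs taken from your inventory); each name should resolve to its own address, not all three to the same one:

192.168.1.1   rock01
192.168.1.2   rock02
192.168.1.3   rock03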

Let me know what you come up with!

  1. Each machine is a VM with 16 GB RAM, 8 cores, and a 1 TB HDD. Yes, the hosts.ini is based on the sample from the docs. Perhaps the sample can also be updated?

  2. Shouldn’t the playbook have enabled the necessary firewall rules? Running the command gives the following output:

[root@rock01 admin]# firewall-cmd --list-all-zones
block
  target: %%REJECT%%
  icmp-block-inversion: no
  interfaces:
  sources:
  services:
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

dmz
  target: default
  icmp-block-inversion: no
  interfaces:
  sources:
  services: ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:


drop
  target: DROP
  icmp-block-inversion: no
  interfaces:
  sources:
  services:
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:


external
  target: default
  icmp-block-inversion: no
  interfaces:
  sources:
  services: ssh
  ports:
  protocols:
  masquerade: yes
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:


home
  target: default
  icmp-block-inversion: no
  interfaces:
  sources:
  services: dhcpv6-client mdns samba-client ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:


internal
  target: default
  icmp-block-inversion: no
  interfaces:
  sources:
  services: dhcpv6-client mdns samba-client ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:


public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens33 ens34
  sources:
  services: dhcpv6-client ssh
  ports: 443/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:


trusted
  target: ACCEPT
  icmp-block-inversion: no
  interfaces:
  sources:
  services:
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:


work (active)
  target: default
  icmp-block-inversion: no
  interfaces:
  sources: 0.0.0.0/0 192.168.1.1
  services: dhcpv6-client ssh
  ports: 22/tcp 9200/tcp 9300/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
  3. Yes, /etc/hosts has to be edited every time, and likewise /etc/elasticsearch/elasticsearch.yml has to be edited every time to add in discovery.zen.ping.unicast.hosts: ["rock01","rock02","rock03"].

Anything else to check?

Thank you.

OK, the firewall looks good. The Elasticsearch config may need to look something like the sketch below, depending on what version of Elasticsearch you are using.
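Treat this as a sketch and swap in your own hostnames. Newer 7.x installs use the seed_hosts/initial_master_nodes pair you posted earlier, 6.x uses the unicast list, and in either case it is worth double-checking that network.host lets each node listen on the cluster-facing interface rather than only on localhost:

# Elasticsearch 6.x style
discovery.zen.ping.unicast.hosts: ["rock01", "rock02", "rock03"]

# Elasticsearch 7.x style
discovery.seed_hosts: ["rock01", "rock02", "rock03"]
cluster.initial_master_nodes: ["rock01", "rock02", "rock03"]

# worth checking in either case (0.0.0.0 binds all interfaces)
network.host: 0.0.0.0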

If that doesn’t help, then we may need to look in /var/log/elasticsearch/<somelogname> to see if we can get any more info from there.
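Something along these lines usually surfaces the discovery or bind errors (the exact log file name depends on your cluster name, so list the directory first):

sudo ls /var/log/elasticsearch/
sudo grep -iE 'master|discover|bind|exception' /var/log/elasticsearch/*.log | tail -n 50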

As far as the machines go, if you are looking to do a multi-node deployment you will probably benefit from 4 virtual machines (one holding the sensor pipeline, af-packet through Kafka, and the other three holding the Elasticsearch nodes). The config file would look something like what I have below. This would better distribute the work across everything. All in all, doing a multi-node deployment has a lot of considerations that may be specific to the environment you are trying to capture, like bandwidth and your retention requirements.

[rock]
rock01sensor ansible_host=192.168.1.1 ansible_connection=local
rock02 ansible_host=192.168.1.2 ansible_connection=local
rock03 ansible_host=192.168.1.3 ansible_connection=local
rock04 ansible_host=192.168.1.4 ansible_connection=local


[web]
rock02

[lighttpd:children]
web

[sensors:children]
rock

[bro:children]
sensors

[fsf:children]
sensors

[kafka:children]
sensors

[stenographer:children]
sensors

[suricata:children]
sensors

[filebeat:children]
fsf
suricata

[zookeeper]
rock01sensor

[elasticsearch:children]
es_masters
es_data
es_ingest

[es_masters]
rock0[2:4]

[es_data]
# Multi-node example #
rock0[2:4]

[es_ingest]
rock0[2:4]

[elasticsearch:vars]
# Disable all node roles by default
node_master=false
node_data=false
node_ingest=false

[es_masters:vars]
node_master=true

[es_data:vars]
node_data=true

[es_ingest:vars]
node_ingest=true

[docket:children]
web