Safely Lock Down an NSX Distributed Firewall (DFW) Ruleset
A common dilemma when developing a solution that includes a firewall is whether to change the Default rule to Deny at the start and build the ruleset as part of development, or to leave the Default rule at Allow and secure it later. In modern agile teams it's best to build the ruleset as part of development, so that it is tested together with the product. Introducing it later could well invalidate every bit of testing already performed.
If, however, you find yourself in a situation where an NSX firewall solution has been deployed with the Default rule set to Allow, and you're asked to implement a ruleset that covers the traffic and then change the default to Deny, this is one possible way to capture the required configuration.
Enable Default Rule Logging
To capture the active traffic, we first enable logging on the default rule.
We then operate the environment normally for a period long enough to cover all business processes. This may be a day, a week, a month or more.
NSX data plane logging is written to the VMkernel.log files. If a logical firewall rule log entry is generated for a vNIC of a VM, it is therefore written to the log file of the ESX host on which the VM was running at that time.
The distributed firewall configuration can apply to the whole vCenter and all objects within. You must therefore configure the remote syslog server for each host in each cluster that has firewall enabled. The remote syslog server is specified in the Syslog.global.logHost attribute. My preference is to use vRealize Log Insight for centralized syslog.
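As a rough sketch of what "each host in each cluster" means in practice, the snippet below renders the commands that would set Syslog.global.logHost through esxcli on a list of hosts. The host names, the syslog server name, and the choice of UDP on port 514 are assumptions made for illustration; the script only prints the commands and does not connect to anything.

```python
def loghost_value(server, port=514, protocol="udp"):
    """Build a Syslog.global.logHost value such as 'udp://server:514'."""
    if protocol not in ("udp", "tcp", "ssl"):
        raise ValueError(f"unsupported protocol: {protocol}")
    return f"{protocol}://{server}:{port}"


def esxcli_commands(server, port=514, protocol="udp"):
    """Return the esxcli commands that point one host at the syslog server."""
    target = loghost_value(server, port, protocol)
    return [
        # Sets the Syslog.global.logHost advanced setting on the host.
        f"esxcli system syslog config set --loghost='{target}'",
        # Makes the syslog daemon pick up the new configuration.
        "esxcli system syslog reload",
    ]


if __name__ == "__main__":
    # Hypothetical host and server names, for illustration only.
    hosts = ["esx01.example.local", "esx02.example.local"]
    for host in hosts:
        print(f"# run on {host}")
        for command in esxcli_commands("loginsight.example.local"):
            print(command)
```

The same setting can be applied through the vSphere client or host profiles; generating the commands this way just makes it harder to miss a host.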
Identifying Traffic Hitting Default Allow
When we browse the firewall log entries, we find that to narrow the search to the correct rule we need the vmw_nsx_firewall_ruleid of the default layer 3 rule on which we enabled logging. The vmw_nsx_firewall_ruleid is not displayed in the NSX Firewall GUI, but it can easily be retrieved from the NSX REST API by running the GET method on this URL
In this example we see the rule id="1001".
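Once the firewall configuration has been returned as XML, picking out the rule id can be scripted. The following is a minimal sketch that parses a saved response with the Python standard library. The sample payload is a cut-down, assumed shape of the response (a firewallConfiguration element containing layer3Sections and layer2Sections), and the function name is hypothetical; check it against the XML your NSX Manager actually returns.

```python
import xml.etree.ElementTree as ET

# Assumed, trimmed-down example of the firewall configuration XML.
SAMPLE_XML = """<firewallConfiguration>
  <layer3Sections>
    <section id="1003" name="Default Section Layer3">
      <rule id="1001" disabled="false" logged="true">
        <name>Default Rule</name>
        <action>allow</action>
      </rule>
    </section>
  </layer3Sections>
  <layer2Sections>
    <section id="1001" name="Default Section Layer2">
      <rule id="1002" disabled="false" logged="false">
        <name>Default Rule</name>
        <action>allow</action>
      </rule>
    </section>
  </layer2Sections>
</firewallConfiguration>"""


def default_l3_rule_id(xml_text, rule_name="Default Rule"):
    """Return the id of the named rule in the layer 3 sections, or None."""
    root = ET.fromstring(xml_text)
    # Only look at layer 3 sections; layer 2 has its own default rule.
    for rule in root.findall("./layer3Sections/section/rule"):
        if rule.findtext("name") == rule_name:
            return rule.get("id")
    return None


if __name__ == "__main__":
    print(default_l3_rule_id(SAMPLE_XML))
```

Restricting the search to the layer 3 sections matters because the layer 2 default rule usually carries the same name but a different id.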
Once we have this we can create a vRealize Log Insight query to list all logs generated by this rule. To make this query easier to view I remove all columns except timestamp and vmw_nsx_firewall.
From this data we can identify which traffic would be blocked if we changed the default rule to Deny. We work through the data, identify the valid traffic flows, and put explicit allow rules in place for them. Once we are satisfied that no valid traffic is still being logged by the Default Rule, we can change it to Deny.
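Working through the logged entries by hand gets tedious on a busy environment, so it can help to collapse them into unique flows first. The sketch below does this for log lines exported to text. The line format in the sample (action, "domain/ruleid", direction, packet size, protocol, "src/port->dst/port") is an assumption about how DFW packet log entries look, the addresses are invented, and only TCP and UDP style entries with ports are handled; adjust the pattern to match your own logs.

```python
import re
from collections import Counter

# Assumed entry layout: "... match PASS domain-c7/1001 IN 60 TCP 1.2.3.4/5->6.7.8.9/10 ..."
LOG_PATTERN = re.compile(
    r"match\s+(?P<action>\w+)\s+\S+/(?P<rule_id>\d+)\s+(?P<direction>IN|OUT)\s+\d+\s+"
    r"(?P<proto>\w+)\s+(?P<src>[\d.]+)/(?P<sport>\d+)->(?P<dst>[\d.]+)/(?P<dport>\d+)"
)

# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    "dfwpktlogs: 1234 INET match PASS domain-c7/1001 IN 60 TCP 10.0.1.10/51512->10.0.2.20/443 S",
    "dfwpktlogs: 1234 INET match PASS domain-c7/1001 IN 60 TCP 10.0.1.10/51600->10.0.2.20/443 S",
    "dfwpktlogs: 1234 INET match PASS domain-c7/1001 OUT 71 UDP 10.0.1.11/40211->10.0.3.5/53",
    "dfwpktlogs: 1234 INET match PASS domain-c7/1005 IN 60 TCP 10.0.1.12/50100->10.0.2.21/22 S",
]


def summarise_flows(lines, rule_id="1001"):
    """Count hits per (protocol, source, destination, destination port)."""
    flows = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None or match.group("rule_id") != rule_id:
            continue  # unparseable line, or hit on a different rule
        # The source port is left out of the key because it is ephemeral.
        key = (match.group("proto"), match.group("src"),
               match.group("dst"), match.group("dport"))
        flows[key] += 1
    return flows


if __name__ == "__main__":
    for (proto, src, dst, dport), hits in summarise_flows(SAMPLE_LINES).most_common():
        print(f"{proto} {src} -> {dst}:{dport} ({hits} hits)")
```

Each resulting flow is a candidate for an explicit allow rule, or for investigation if nobody recognises it.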