Distributed Firewall (DFW): Network security at the host level at LinkedIn
October 21, 2021
Co-authors: Matthew Davidson, Walter Marchuk, Andreas Zaugg, Michael Garate, and William Buenzle
In traditional network design, hardware firewalls are used to provide isolation of network segments. As a network increases in size and complexity, an increasing number of firewall rules are required to provide necessary access, but the number of rules a hardware firewall can support is limited by memory capacity.
LinkedIn currently maintains a private cloud composed of hundreds of thousands of hosts across multiple data centers linked together as one network. In order to provide the necessary connectivity between services while maintaining security policy, LinkedIn uses firewall rules that greatly exceed the memory capacity of traditional hardware firewalls.
In order to implement the extensive ruleset required to operate a professional network, LinkedIn needed a scalable solution that could overcome this limitation. Distributed Firewall (DFW) is our host-level firewalling system used to maintain connectivity and restrictions at scale.
This post provides some details of the design and operation of DFW and how it allows LinkedIn to scale its network and services while minimizing manual intervention and maintenance.
Components of the Distributed Firewall system
DFW consists of two parts: an agent that runs on each host, and the DFW server running in each environment (e.g., a portion of a data center network). The agents maintain a constant connection with their server in order to receive rule update notifications and pass back logs and metrics.
The LinkedIn environment is constantly changing, with thousands of application deployments per day and hundreds of changes to access control lists (ACLs) per week. Application deployments are tracked by LinkedIn Platform as a Service (LPS), and ACL updates are managed with an internal ACL server.
As deployment events occur, LPS produces Kafka messages that are consumed by the DFW server. These messages provide parameters that are used to further query LPS to determine whether certain frequently changing deployment details need to be updated. The ACL server is also polled frequently by each DFW server.
As these deployment and ACL data sources are updated, new rules are calculated as necessary. If the changes affect only one host, a message is sent notifying the host's agent to retrieve new rules. If the changes affect many hosts, the rules are broadcast to all hosts.
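The choice between a per-host notice and a broadcast can be sketched as follows. This is only an illustration: the message fields and the `broadcast_threshold` parameter are assumptions, since the post describes only the "one host" and "many hosts" cases.

```python
from typing import Dict, List


def plan_updates(affected_hosts: List[str], ruleset: Dict,
                 broadcast_threshold: int = 2) -> List[Dict]:
    """Return the messages a server might send after a rule change.

    broadcast_threshold is a hypothetical cut-off, not a documented
    DFW setting.
    """
    if len(affected_hosts) >= broadcast_threshold:
        # Many hosts: embed the rules so agents need no second connection.
        return [{"type": "broadcast", "rules": ruleset}]
    # Few hosts: send a short notice; each agent fetches its own rules.
    return [{"type": "notify", "host": host} for host in affected_hosts]
```

Keeping the notice small matters because, as described later in the post, every connected agent receives every message.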
When an agent receives a host update notification, it retrieves its ruleset from the DFW server. This ruleset is parsed into iptables and ipset commands and applied on the host. If a broadcast update is received, the ruleset contained in the message is applied without a second connection to the DFW server.
All logs generated by the agent in this process are forwarded back to the DFW server using the standing connection. With the rules applied, the agents also compile packet logs. These are bundled and sent to the DFW server, which forwards them to Kafka for security audit purposes.
One guiding principle of the design of DFW was to limit the amount of logic deployed at the host level. Keeping the host-level code simple reduced the number of deployments targeting the thousands of hosts at LinkedIn. Code updates on this scale carry inherent risk, especially with edge use cases and special environments. Additionally, minimal logic at the host level keeps the agent's resource consumption low, leaving the host's resources available for deployed services.
With these goals in mind, rule compilation and calculation were kept within the DFW server and the agent was left to parse the ruleset received from the server and apply it with iptables and ipset commands. The agent is written in Python and distributed as a 2 MB package containing all the libraries required to operate. As the agent launches, it consults a configuration file maintained by configuration management to find a suitable server. It then contacts the server to retrieve its ruleset: a zone table (a translation table used to determine the location and usage environment of the host), as well as the host and zone rules.
After the agent retrieves the rules and parses them into iptables and ipset commands, these rules are compared with those currently loaded in the kernel. If differences are found, the new rules are applied and an audit log is generated.
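A minimal sketch of this compare-then-apply step is shown below. It assumes rules are represented as ordered lists of rule strings; the function name and audit log fields are hypothetical, and the actual loading of rules into the kernel is left out.

```python
from typing import Dict, List, Optional


def apply_if_changed(loaded: List[str], desired: List[str]) -> Optional[Dict]:
    """Compare loaded rules with the desired rules.

    Returns None when nothing changed, otherwise an audit record that
    describes the difference. Rule order is significant, so a pure
    reordering also counts as a change.
    """
    if loaded == desired:
        return None  # already in sync: apply nothing, log nothing
    loaded_set, desired_set = set(loaded), set(desired)
    return {
        "event": "rules_applied",
        "added": [r for r in desired if r not in loaded_set],
        "removed": [r for r in loaded if r not in desired_set],
    }
```

Skipping the apply step when nothing differs avoids needless kernel changes and keeps the audit log limited to real changes.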
The agents maintain a constant connection to the DFW server using the WebSocket protocol, and all audit logs are passed back to the server over this channel. The agents also send heartbeat messages to the server at regular intervals. Each heartbeat contains hash values of the current ruleset, confirming to the server that the agent is still connected and that its rules match those on the server.
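One way a ruleset hash could be carried in a heartbeat is sketched below. The post does not name the hash function or the message layout, so SHA-256 over canonical JSON and the field names used here are assumptions.

```python
import hashlib
import json
import time
from typing import Dict, Optional


def ruleset_hash(ruleset: Dict) -> str:
    """Hash a ruleset in a key-order-independent way."""
    canonical = json.dumps(ruleset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_heartbeat(host: str, ruleset: Dict,
                    now: Optional[float] = None) -> Dict:
    """Build a heartbeat message (hypothetical field names)."""
    return {
        "type": "heartbeat",
        "host": host,
        "rules_hash": ruleset_hash(ruleset),
        "ts": time.time() if now is None else now,
    }


def is_in_sync(heartbeat: Dict, server_ruleset: Dict) -> bool:
    """Server-side check: do the agent's rules match the server's copy?"""
    return heartbeat["rules_hash"] == ruleset_hash(server_ruleset)
```

Sending a hash rather than the rules themselves keeps heartbeats small while still letting the server detect drift.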
As rules change, the DFW server sends messages over the established WebSocket connection. All messages are received by all connected hosts, so the message length is kept to a minimum, except when the message is intended for a large number of hosts, as in broadcast updates. The agents filter messages destined for other hosts, but if a notification intended for the listening host is received, rules are then retrieved using a separate HTTPS connection to the DFW server.
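The agent-side filtering could look roughly like the sketch below. The JSON message format and the returned action names are assumptions made for illustration; the real wire format is not described in this post.

```python
import json


def handle_message(raw: str, my_host: str) -> str:
    """Decide what an agent does with one message from the server."""
    msg = json.loads(raw)
    if msg.get("type") == "broadcast":
        # The rules are embedded in the message: apply them directly.
        return "apply_embedded_rules"
    if msg.get("type") == "notify" and msg.get("host") == my_host:
        # Short notice for this host: fetch the rules over HTTPS.
        return "fetch_rules_over_https"
    # Notice addressed to another host.
    return "ignore"
```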
Highly restrictive rules are useful for protecting sensitive data and applications. The agent operates a restricted access table that limits access to specified ports to only specified hosts. Only hosts specified in the correct ipset are allowed access to restricted ports.
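As an illustration, the commands for one restricted port might be generated as in the sketch below. The set name, the use of the INPUT chain, and the helper itself are assumptions; DFW's actual table and chain layout is not described here. The sketch only builds the command lists and does not execute them.

```python
import ipaddress
from typing import List


def restricted_port_commands(set_name: str, allowed_ips: List[str],
                             port: int) -> List[List[str]]:
    """Build ipset/iptables commands restricting a TCP port to an ipset."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    # Validate addresses up front; hash:ip sets default to IPv4.
    ips = [str(ipaddress.IPv4Address(ip)) for ip in allowed_ips]
    cmds = [["ipset", "create", set_name, "hash:ip", "-exist"]]
    cmds += [["ipset", "add", set_name, ip, "-exist"] for ip in ips]
    # Accept traffic to the port from set members, then drop everything else.
    cmds.append(["iptables", "-A", "INPUT", "-p", "tcp", "--dport", str(port),
                 "-m", "set", "--match-set", set_name, "src", "-j", "ACCEPT"])
    cmds.append(["iptables", "-A", "INPUT", "-p", "tcp", "--dport", str(port),
                 "-j", "DROP"])
    return cmds
```

Matching on an ipset keeps the iptables rule count constant as the list of allowed hosts grows, because only the set membership changes.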
The agent also manages connection tracking settings, since many applications need connection tracking enabled or disabled according to specifications.
To implement chaos engineering in support of the Waterbear project, rulesets designed to reproduce outage conditions are included as special modes of operation, which can be triggered by special messages sent from the DFW server. These rules have also been used to assist in the decommissioning of hosts, allowing a host to be reactivated quickly if decommissioning is aborted.
The agent runs nflog to capture packet logs on disk and to a socket that is monitored by the agent. The agent forwards these packet logs back to the server.
A command line tool is included with the agent package, allowing users to send commands to the agent by socket. Two sockets exist for privileged and non-privileged commands. All users can query the status of DFW and view recent packet logs. Privileged users can trigger restarts and make limited, temporary changes to rules for debugging purposes. These temporary changes are cached on disk to be applied for a limited time.
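A time-limited, disk-cached change of this kind could be sketched as follows. The file format, function names, and the opaque rule strings are hypothetical; the sketch only shows how an expiry time stored next to each entry lets temporary changes survive an agent restart yet lapse on their own.

```python
import json
import os
import tempfile
from typing import Dict, List


def _read(path: str) -> List[Dict]:
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []


def _write(path: str, entries: List[Dict]) -> None:
    # Write to a temporary file and rename it, so readers never see a
    # half-written cache.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(entries, f)
    os.replace(tmp, path)


def add_override(path: str, rule: str, ttl_seconds: float, now: float) -> None:
    """Cache a temporary rule together with its expiry time."""
    entries = [e for e in _read(path) if e["expires_at"] > now]
    entries.append({"rule": rule, "expires_at": now + ttl_seconds})
    _write(path, entries)


def active_overrides(path: str, now: float) -> List[str]:
    """Return the temporary rules that have not yet expired."""
    return [e["rule"] for e in _read(path) if e["expires_at"] > now]
```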
Rule generation and distribution server
The DFW server rule generation database maintains all the essential business logic related to rule generation for all hosts. In addition to providing rule generation for all special use cases, the server interacts with upstream data services, publishes metrics and security logs, and maintains long-lived connections to the agents. The server is written in Python, and nginx and Redis are used to provide improved performance.
One of the most advanced features of the server is the ability to maintain long-lived connections to thousands of hosts. These connections allow the server to send rules notices to clients and retrieve status information and logs from the clients.
Each WebSocket connection is bidirectional, allowing the clients to send back logs on the same channel the server uses to send notices.
Since messages from the clients sometimes arrive in bursts, such as after a large rule update for multiple hosts, Redis is used to provide a message buffer between nginx and the Python code processing the logs. As client messages are processed, they are logged to disk and in some cases forwarded to Kafka to aid in security auditing.
Rule updates occur at regular intervals. A scheduler starts a separate rules generator process, which syncs with upstream data sources that store deployment information (stored as address groups) and ACLs describing which host groups are allowed or denied connections.
Upstream data from LinkedIn’s deployment database is also used to track the hosts with applications needing connection-tracking settings applied. Flags are added to the rules data for all hosts with these applications deployed.
After the upstream data is compiled, the updated ACLs, address groups, and flags are written to JSON files on disk in parallel. For host rules, individual hosts retrieve their rules through an nginx endpoint. Broadcast rules are sent to all hosts through the open WebSocket channels, to avoid having many hosts connecting to retrieve them at once.
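Writing the compiled data to JSON files in parallel might look like the sketch below. The one-file-per-dataset layout, the file naming, and the use of a thread pool are assumptions; the post does not say how the parallelism is implemented.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Tuple


def write_datasets(out_dir: str, datasets: Dict[str, Any],
                   workers: int = 4) -> List[str]:
    """Write each named dataset to <out_dir>/<name>.json in parallel."""

    def write_one(item: Tuple[str, Any]) -> str:
        name, data = item
        path = os.path.join(out_dir, f"{name}.json")
        with open(path, "w") as f:
            json.dump(data, f, sort_keys=True)
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map re-raises any exception from a worker when iterated.
        return sorted(pool.map(write_one, datasets.items()))
```

Static files on disk suit this design because, as the post notes, hosts retrieve their rules through an nginx endpoint.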
This polled, batch-processed method is suitable for most use cases, but in some cases rule updates are needed faster. For example, as a deployment occurs, healthchecker services will attempt to connect to the newly deployed service. If these healthcheckers aren't able to connect within a time limit, the deployment may be marked as failed. For this use case, a parallel, faster update pathway was created using Kafka. As each deployment occurs, a Kafka message is produced by the deployment system and consumed by the DFW server. This Kafka message contains query parameters used to craft a limited query to the deployment system to determine the host on which the new deployment is occurring. After getting this host information, the server sends rules notices to the affected hosts so that they open ports to the healthcheckers, and rules are generally updated less than 15 seconds after a deployment event.
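The fast pathway can be sketched as a small handler, shown below. The Kafka consumer and the deployment system are replaced by injected callables, and the event and notice field names are hypothetical, so this is a structural sketch rather than the production code.

```python
from typing import Callable, Dict, Iterable, List


def handle_deployment_event(
    event: Dict,
    lookup_hosts: Callable[[Dict], Iterable[str]],
    send_notice: Callable[[Dict], None],
) -> List[str]:
    """Process one deployment event from the fast update pathway.

    lookup_hosts stands in for the limited query to the deployment
    system; send_notice stands in for the WebSocket notification.
    """
    # Use the parameters carried by the event to find the affected hosts.
    hosts = sorted(set(lookup_hosts(event["query_params"])))
    for host in hosts:
        # A short notice prompts the agent to fetch its updated rules.
        send_notice({"type": "notify", "host": host})
    return hosts
```

A usage example with in-memory stand-ins:

```python
sent = []
handle_deployment_event(
    {"query_params": {"app": "demo"}},
    lambda params: ["host-b", "host-a"],
    sent.append,
)
```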
The server features a command line tool that can be used locally to query client information and message statistics and to trigger rule updates. Metrics are also published to Autometrics for external viewing and alerting, and some metrics and logs are further processed with Samza.
With the ongoing project to migrate a portion of LinkedIn's services to the Azure cloud, the DFW server has also been used to provide rule coverage for Azure hosts to supplement the rules capability provided by network security groups (NSGs). Since the NSGs have capacity limits, some detailed rulesets limiting access from private cloud hosts to Azure hosts couldn't be implemented. For these rules, the server and agent provide high resolution limitations on certain services to complement the NSG ruleset.
Several new feature requests are currently being planned and developed for DFW. These include managing the forward table to provide automatic routing through proxies, providing additional rule types in Azure for granular control, and increasing the speed of rule propagation.
In addition to serving as LinkedIn's primary firewalling system, DFW has proven useful for many other business purposes, such as chaos engineering, service mapping, outage recovery, and host reclaim. The flexibility of an on-host firewalling system provides many opportunities to customize hosts' network behavior. While some CPU resources are used to provide packet filtering, the system provides more extensive rules and more extensible functionality than is possible with other large-scale firewalling solutions.
Many people at LinkedIn have worked to develop and deploy the various versions of DFW. First, we must acknowledge Mike Svoboda, who implemented the first iteration of DFW entirely using CFEngine. The Information Security team was helpful in providing motivation and guidance. The SysOps team made it possible to confirm that DFW coverage was complete during deployment. The NetOps teams spent countless hours arranging and verifying the ACLs that allow services to connect. Lastly, we must acknowledge the developers who accepted the challenge of building and maintaining a system so crucial to security and site-up.