2.0

High availability

Requirements

  1. Users should be able to run multiple instances of all components.

    1. Core

    2. Edge

    3. Gateway

  2. In case any component fails, other instances should take over the work.

  3. Recovery should minimize the number of dropped requests and jobs. Ideally, no requests or jobs are dropped, even if the components processing them fail mid-processing.

High-level overview of Defguard workflows

Considered options

1. Active-active

  • Every Core connects to every Edge and Gateway (full mesh at the application layer).

  • Any Core can initiate actions and handle requests from Edges/Gateways.

  • A DB-based queue for async jobs, such as MFA disconnects.
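As an illustration of the DB-based queue idea, the following minimal sketch shows how several Cores could share one jobs table so that each job is claimed by exactly one of them. The table layout, the `make_queue` and `claim_next_job` names, and the use of SQLite are assumptions made for the example, not Defguard's actual schema or implementation. On PostgreSQL, a similar effect is commonly achieved with `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3
from typing import Optional


def make_queue(kinds):
    """Create a hypothetical in-memory jobs table with one pending job per kind."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " claimed_by TEXT)"
    )
    conn.executemany("INSERT INTO jobs (kind) VALUES (?)", [(k,) for k in kinds])
    conn.commit()
    return conn


def claim_next_job(conn: sqlite3.Connection, core_id: str) -> Optional[int]:
    """Claim the oldest pending job for core_id; return its id, or None."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard turns the update into a compare-and-swap:
        # if another Core claimed the job first, no row is updated.
        cur = conn.execute(
            "UPDATE jobs SET status = 'claimed', claimed_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (core_id, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None
```

In this sketch, a second Core asking for work after the only job has been claimed receives `None` instead of a duplicate of the same job.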

Pros

  • No need for failure detection and leader election.

  • "True HA" - nothing special has to happen when one of the components dies, because the connections already exist.

  • Scales Core processing capacity horizontally, provided the control-plane work is truly parallelizable.

  • Avoids the Core load-balancing issue of selecting the active Core when handling UI requests.

Cons

  • Requires a job queue implementation to avoid duplicating scheduled work, e.g. MFA disconnects.

  • Requires additional routing logic for Edge and Gateway components so they do not send the same request to multiple Cores.

  • Requires a truly stateless Core.
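The routing requirement for Edge and Gateway components can be sketched as follows: each outgoing request is sent to exactly one of the currently connected Cores. The `CoreRouter` class and the round-robin policy are hypothetical choices for this example. The source does not specify which selection strategy is used.

```python
from itertools import count


class CoreRouter:
    """Hypothetical Edge/Gateway-side helper: one Core per request."""

    def __init__(self, cores):
        # Connection state per Core; dicts preserve insertion order.
        self.connected = {core: True for core in cores}
        self._counter = count()

    def mark(self, core, up):
        """Record that the connection to a Core went up or down."""
        self.connected[core] = up

    def pick(self):
        """Return exactly one connected Core, rotating between them."""
        live = [core for core, up in self.connected.items() if up]
        if not live:
            raise RuntimeError("no Core connection available")
        return live[next(self._counter) % len(live)]
```

When a Core connection drops, `mark(core, False)` removes it from rotation and subsequent requests go to the remaining Cores without any election step.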

2. Active-failover

  • Run N Cores.

  • Exactly one leader stores the "lease row" in the database, then establishes gRPC connections to all Gateways and Edges and performs all write and control actions.

  • The leader periodically updates the heartbeat on the lease row.

  • Standby Cores monitor the heartbeat. If the heartbeat is not renewed, they race to acquire the lease.

  • Once a Core successfully stores the lease row, it establishes gRPC connections and acts as the new leader.
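The lease mechanism described above can be sketched as a single guarded update on a one-row table. Everything named here (the `core_lease` table, `LEASE_TTL`, `try_acquire_or_renew`) is an assumption for illustration, and SQLite stands in for the real database. The caller passes the current time explicitly to keep the example deterministic. A production version would more likely rely on the database clock.

```python
import sqlite3

LEASE_TTL = 10.0  # seconds without a heartbeat before the lease expires (assumed value)


def make_lease_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table that can hold at most one lease row."""
    conn.execute(
        "CREATE TABLE core_lease ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " holder TEXT NOT NULL,"
        " heartbeat REAL NOT NULL)"
    )


def try_acquire_or_renew(conn: sqlite3.Connection, core_id: str, now: float) -> bool:
    """Return True if core_id holds the lease after this call."""
    with conn:
        # First Core to arrive creates the lease row; later inserts are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO core_lease (id, holder, heartbeat) VALUES (1, ?, ?)",
            (core_id, now),
        )
        # Renew our own lease, or take over one whose heartbeat is too old.
        cur = conn.execute(
            "UPDATE core_lease SET holder = ?, heartbeat = ? "
            "WHERE id = 1 AND (holder = ? OR heartbeat < ?)",
            (core_id, now, core_id, now - LEASE_TTL),
        )
        return cur.rowcount == 1
```

The leader calls this periodically to renew its heartbeat. Standby Cores call the same function and only succeed once the heartbeat is older than `LEASE_TTL`.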

Pros

  • Cleanest correctness story (no split-brain control plane).

  • Simplifies gRPC connection topology: each Edge/Gateway sees one controlling Core.

Cons

  • This is a "failover" rather than a true "HA" solution.

  • Requires robust failure detection, leader election (DB lease / k8s lease), and careful failover to avoid two leaders during partitions.

  • Does not scale control-plane (Core) throughput horizontally; it only gives us failover, which may be all we need for now.

  • HTTP requests must be routed only to the active Core. One approach: standby instances respond to health checks with an error code, so the load balancer routes requests only to the leader.

  • Open question: what happens if only the Core-Edge connection fails while the Core itself keeps working?
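The health-check approach to HTTP routing can be sketched as below. The status codes (200 for the leader, 503 for standbys) and both function names are assumptions for this example. The source only says that standby instances reply with error codes.

```python
def health_status(is_leader: bool) -> tuple[int, str]:
    """Hypothetical health endpoint result: only the leader reports healthy."""
    if is_leader:
        return 200, "leader"
    return 503, "standby"


def routable(instances: dict[str, bool]) -> list[str]:
    """Simulate a load balancer that keeps only instances passing the check."""
    return [
        name
        for name, is_leader in instances.items()
        if health_status(is_leader)[0] == 200
    ]
```

With this scheme the load balancer needs no knowledge of the lease itself. It only observes which instance currently passes the health check.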

3. Vertically connected components

Each Core connects to one Edge and one Gateway.

Deal-breaking issue: For mobile-assisted MFA, Core has to be able to route responses to all Edges, not just the one from which the request originates. This is not possible with this approach.

4. Decoupling the components via external queue

  • Introduce an external message bus or queue into the stack, such as Redis or RabbitMQ.

  • Refactor the components to use the queue instead of gRPC.

Deal-breaking issues:

  • A major rewrite of all communication.

  • Increased deployment complexity.

Decision

Option 1, the active-active approach, is selected.

Rationale

The active-active approach provides true high availability rather than availability through failover. With multiple Core instances operating concurrently, the system can continue to function during partial failures without waiting for explicit failure detection, leader election, or role transitions.

Although active-active operation introduces the need to coordinate scheduled and background tasks, this coordination is more constrained and predictable than the failure detection, leader election, and fencing mechanisms required by the active-failover design (option 2).

Access Control List changes

Separating Alias kinds

In Defguard 2.0, the new UI clearly separates the previously existing alias kinds into two distinct sections:

  • Component aliases are now just Aliases.

  • Destination aliases are now Destinations.

When creating or editing rules, there is now also a clear distinction in the UI between Aliases, which are combined with the manually configured destination, and predefined Destinations.

This better reflects the different roles of both types of aliases:

  • Aliases are reusable fragments used to configure a rule-local destination.

  • Destinations are complete predefined destinations, each converted into a separate set of firewall rules, just like the manually configured destination.

This distinction already existed in practice in previous versions, but it was not expressed clearly enough in the UI.

Both Aliases and Destinations are still stored in the same underlying database model. The new split is enforced at the UI and API level.

Explicit destination configuration

In previous Defguard versions, the ACL rule logic regarding destinations broadly reflected how most firewalls, such as nftables and Packet Filter (pf), work. As a result, some behaviors were implicit.

In particular, rules and aliases could omit destination addresses, ports, or protocols, which implicitly meant "match any". This matched firewall semantics, but it introduced ambiguity in the data model and in the UI:

  • The UI showed a placeholder value ("All addresses/ports/protocols"), but the intent to match all addresses, ports, or protocols was not represented in the data model itself.

  • The logic for generating firewall rules had to assume user intent, especially for rules using aliases.

  • Validation and editing logic had to handle a number of edge cases.

As ACL functionality evolved, this approach became harder to maintain consistently. Defguard 2.0 introduces a more explicit ACL model to reduce ambiguity while preserving the effective firewall behavior of existing rules.

Database model changes

In Defguard 2.0, the ACL database model makes destination semantics explicit instead of inferring "match any" from empty fields.

Both the aliases and rules database tables now include explicit boolean flags for configuring destinations:

  • any_address

  • any_port

  • any_protocol

In addition, rules now include use_manual_destination_settings, which defines how destination configuration should be interpreted:

  • When true, the rule uses its own destination fields together with the referenced Aliases (previously component aliases).

  • When false, the rule uses only the referenced Destinations.
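The two interpretations of use_manual_destination_settings can be sketched with a small in-memory model. The `DestinationSpec` and `Rule` classes, their field names, and the way Alias fragments are merged are assumptions for illustration only. In particular, the sketch takes the any_* flags from the rule's own settings and models just the two cases described above.

```python
from dataclasses import dataclass, field


@dataclass
class DestinationSpec:
    """Hypothetical shape shared by manual settings, Aliases, and Destinations."""
    addresses: list = field(default_factory=list)
    ports: list = field(default_factory=list)
    protocols: list = field(default_factory=list)
    any_address: bool = False
    any_port: bool = False
    any_protocol: bool = False


@dataclass
class Rule:
    manual: DestinationSpec
    aliases: list = field(default_factory=list)        # reusable fragments
    destinations: list = field(default_factory=list)   # complete predefined destinations
    use_manual_destination_settings: bool = True


def resolve_destinations(rule: Rule) -> list:
    """Return the destination specs a rule would be expanded from."""
    if not rule.use_manual_destination_settings:
        # Only the referenced Destinations apply, each one separately.
        return list(rule.destinations)
    # Manual mode: rule-local fields plus Alias fragments form one destination.
    merged = DestinationSpec(
        any_address=rule.manual.any_address,
        any_port=rule.manual.any_port,
        any_protocol=rule.manual.any_protocol,
    )
    for part in [rule.manual, *rule.aliases]:
        merged.addresses += part.addresses
        merged.ports += part.ports
        merged.protocols += part.protocols
    return [merged]
```

In manual mode the result is a single combined destination. Otherwise each referenced Destination is returned on its own, matching the statement that each one becomes a separate set of firewall rules.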

The rule model also adds allow_all_groups and deny_all_groups to align group handling with the already explicit "all" flags used for other source types.

Overall, the 2.0 schema preserves the effective firewall behavior of existing ACL rules while making the model clearer, easier to validate, and less dependent on implicit assumptions.

Backfill logic

The Defguard 2.0 database migration includes SQL backfill logic that converts legacy implicit ACL behavior into the new explicit model while preserving the meaning of existing rules.

For Destinations (previously destination aliases), the migration backfills the new any_* flags from the legacy fields:

  • any_address is set to true when the alias had no destination addresses and no destination ranges.

  • any_port is set to true when the alias had no ports.

  • any_protocol is set to true when the alias had no protocols.
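The three conditions above can be restated as a small function. This is a Python paraphrase of the described logic, not the migration's SQL, and the dictionary keys (`addresses`, `address_ranges`, `ports`, `protocols`) are hypothetical stand-ins for the legacy fields.

```python
def backfill_destination_flags(alias: dict) -> dict:
    """Derive the explicit any_* flags from a legacy destination alias."""
    return {
        # "Any address" only if both individual addresses and ranges were empty.
        "any_address": not alias["addresses"] and not alias["address_ranges"],
        "any_port": not alias["ports"],
        "any_protocol": not alias["protocols"],
    }
```

For example, a legacy alias that listed only ports ends up with any_address and any_protocol set to true and any_port set to false.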

For Rules, the migration evaluates both rule-local destination settings and linked aliases.

The rule flags are backfilled as follows:

  • any_address is set to true only if the rule had no direct destination addresses or ranges and no linked component alias contributed addresses.

  • any_port is set to true only if the rule had no direct ports and no linked component alias contributed ports.

  • any_protocol is set to true only if the rule had no direct protocols and no linked component alias contributed protocols.
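For rules, the same idea extends across the rule and its linked component aliases. The sketch below is again a Python restatement under assumed field names, not the actual migration code.

```python
def backfill_rule_flags(rule: dict, component_aliases: list) -> dict:
    """Derive a rule's any_* flags from the rule and its linked component aliases."""
    sources = [rule, *component_aliases]

    def nothing_set(*keys):
        # True only if neither the rule nor any alias contributed a value.
        return not any(src[key] for src in sources for key in keys)

    return {
        "any_address": nothing_set("addresses", "address_ranges"),
        "any_port": nothing_set("ports"),
        "any_protocol": nothing_set("protocols"),
    }
```

A rule with empty destination fields therefore does not become "any port" if one of its component aliases supplied ports.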

The migration sets use_manual_destination_settings to false only when a legacy rule was effectively driven entirely by destination aliases, meaning:

  • the rule had no direct destination addresses, ranges, ports, or protocols,

  • no linked component alias contributed any destination fragments,

  • at least one linked destination alias existed.

In every other case, use_manual_destination_settings remains true, preserving the previous behavior of rules that relied on direct destination settings or component aliases.
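The decision for use_manual_destination_settings can be summarized in one predicate. As before, this is an illustrative Python restatement with hypothetical field names, not the SQL used by the migration.

```python
FIELDS = ("addresses", "address_ranges", "ports", "protocols")


def backfill_use_manual(rule: dict, component_aliases: list, destination_aliases: list) -> bool:
    """Return the backfilled value of use_manual_destination_settings."""
    rule_is_empty = not any(rule[f] for f in FIELDS)
    aliases_are_empty = not any(a[f] for a in component_aliases for f in FIELDS)
    has_destinations = len(destination_aliases) > 0
    # False only when the rule was driven entirely by destination aliases.
    driven_by_destinations = rule_is_empty and aliases_are_empty and has_destinations
    return not driven_by_destinations
```

Note that a completely empty legacy rule with no linked destination aliases keeps the value true, which is consistent with it becoming an explicit "match any" manual rule.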

As a result, legacy empty destination fields become explicit "match any" flags, rules based only on destination aliases become explicit Destination-based rules, and mixed or manual rules continue to behave as they did before migration.

Renamed columns

The database migration also renames some columns in ACL-related tables to better reflect their purpose:

  • destination -> addresses

  • all_networks -> all_locations
