
Linux Kernel WireGuard tuning

This technical guide addresses WireGuard VPN latency, throughput, resilience, and scalability requirements by tuning Linux kernel parameters (as WireGuard is part of the Linux kernel).

Introduction

WireGuard is widely praised for its lean codebase and efficiency. However, the default Linux kernel settings are often tuned for general-purpose computing, not for acting as a high-speed router handling encrypted UDP traffic at scale.

To achieve maximum performance (low latency), stability across changing networks (roaming), and high concurrency, you should tune three distinct layers.

Kernel sysctl settings optimize how the Linux kernel schedules packets and manages memory buffers. Add the following to /etc/sysctl.d/99-wireguard-tuning.conf or /etc/sysctl.conf.

Hardware requirements

Before tuning the Linux kernel, please note that the hardware itself must be capable of handling the expected bandwidth and user load.

Here are some general tips on hardware.

CPU Sizing Methodology for Defguard Gateways

Since WireGuard performs cryptographic operations (ChaCha20-Poly1305) directly inside the Linux kernel, CPU utilization scales primarily with Aggregate Peak Throughput and Packets Per Second (PPS), rather than the raw number of idle connected peers.

Core Formula:

Required Cores (vCPUs) = Ceil( Expected Peak Throughput (Gbps) / Core Crypto Capacity (Gbps) ) + Dedicated Network Cores

Where:

  1. Core Crypto Capacity: A modern server CPU core (with AVX2 or AVX-512 extensions) can process roughly 1.5 to 2.0 Gbps of encrypted WireGuard traffic (assuming an average mixed packet size of ~1000 bytes).

  2. Expected Peak Throughput: Calculated based on the concurrency model: (Concurrent Active Users * Average Peak Bandwidth per User).

  3. Dedicated Network Cores (Overhead): Cores heavily consumed by SoftIRQs, NAPI polling (driven by netdev_budget), and netfilter/conntrack processing.

    • For < 100 devices: 0 extra cores needed (handled by the base 2 cores).

    • For 1,000 devices: +2 cores should be allocated/optimized for network interrupt handling.

    • For 10,000 devices: +4 to +8 cores must be dedicated to RPS (Receive Packet Steering) to distribute the massive PPS load across the CPU complex.
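The core formula above can be sketched in a few lines of Python. This is an illustrative calculation only: the function name, its parameters, and the example workload (1,000 active users at 5 Mbps each) are assumptions made for the sketch, and the per-core capacity defaults to the conservative end of the 1.5 to 2.0 Gbps range quoted above.

```python
import math


def required_cores(active_users: int,
                   peak_mbps_per_user: float,
                   core_crypto_gbps: float = 1.5,
                   network_cores: int = 0) -> int:
    """Ceil(peak throughput / per-core crypto capacity) + dedicated network cores."""
    # Expected Peak Throughput = Concurrent Active Users * Average Peak Bandwidth per User
    peak_gbps = active_users * peak_mbps_per_user / 1000.0
    crypto_cores = math.ceil(peak_gbps / core_crypto_gbps)
    return crypto_cores + network_cores


# Hypothetical workload: 1,000 users * 5 Mbps = 5 Gbps peak,
# plus the +2 network cores suggested for 1,000 devices.
print(required_cores(1000, 5.0, core_crypto_gbps=1.5, network_cores=2))  # prints 6
```

The formula does not enforce the base 2 cores mentioned for small deployments, so treat small results as a lower bound rather than a recommendation.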

Memory (RAM)

To estimate the system memory required by the kernel network stack and gateway components, use the following formula:

RAM_{total} = RAM_{base} + (N_{active} \times 0.003)

Where:

  • RAM_{total} is the total recommended system memory in Gigabytes (GB).

  • RAM_{base} is 1.0 GB (the baseline allocation for a minimal Linux OS, the Defguard core/gateway agent, and basic userspace management processes).

  • N_{active} is the number of concurrent, actively transmitting devices.

  • 0.003 (that is, 0.003 GB, or 3 MB) is the memory overhead multiplier per active user under maximum burst conditions.
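As a quick worked example, the formula can be evaluated for a few deployment sizes. The constant and function names below are illustrative, and the device counts are arbitrary sample inputs.

```python
RAM_BASE_GB = 1.0    # baseline from the text: minimal OS, gateway agent, userspace
PER_USER_GB = 0.003  # 3 MB per actively transmitting device under burst conditions


def ram_total_gb(n_active: int) -> float:
    """RAM_total = RAM_base + (N_active * 0.003), in GB."""
    return RAM_BASE_GB + n_active * PER_USER_GB


for n in (100, 1_000, 10_000):
    print(f"{n} active devices -> {ram_total_gb(n):.1f} GB")
```

For 100, 1,000, and 10,000 active devices this gives about 1.3 GB, 4.0 GB, and 31.0 GB respectively.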

💡 Why 3 MB per user?

This multiplier accounts for the optimized kernel configurations recommended in the tuning guide:

  1. Conntrack Entries: An average active user spawns ~100 stateful connections. At ~320 bytes per nf_conntrack entry, that is roughly 32 KB per user before hash-table overhead.

  2. UDP Buffers: High-throughput tuning relies on large socket buffer limits (net.core.rmem_max = 16 MB or higher). When multiple users send heavy traffic simultaneously, the kernel queues more packet buffers (sk_buff) per socket to prevent drops during CPU scheduling delays.

Kernel tuning

Congestion Control & Queuing (Latency & Throughput)

To reduce bufferbloat (latency spikes under load) and maximize throughput, we replace the default CUBIC algorithm with BBR (Bottleneck Bandwidth and Round-trip propagation time), which is less sensitive to packet loss and more aggressively seeks the optimal congestion window.
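The sketch below shows one way to produce the corresponding sysctl lines. The exact values originally published with this section are not preserved here, so the pairing of the fq queuing discipline with BBR is an assumption based on common practice; net.core.default_qdisc and net.ipv4.tcp_congestion_control are standard kernel sysctls, and the helper names are illustrative. Note that tcp_congestion_control governs TCP connections that terminate on the gateway itself; traffic forwarded through the tunnel keeps the congestion control of its endpoints.

```python
from pathlib import Path

# Lists the congestion control algorithms currently available to the kernel.
AVAILABLE = Path("/proc/sys/net/ipv4/tcp_available_congestion_control")


def bbr_sysctl_lines(available: str) -> list[str]:
    """Return sysctl.d lines enabling fq + BBR, or [] if BBR is not listed."""
    if "bbr" not in available.split():
        return []
    return [
        "net.core.default_qdisc = fq",
        "net.ipv4.tcp_congestion_control = bbr",
    ]


def read_available() -> str:
    """Read the kernel's list; return an empty string on non-Linux systems."""
    try:
        return AVAILABLE.read_text()
    except OSError:
        return ""


lines = bbr_sysctl_lines(read_available())
print("\n".join(lines) if lines else "# bbr not listed; the tcp_bbr module may need loading")
```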

Memory & Buffers (Throughput)

WireGuard uses UDP for data transport. By default, Linux kernel UDP buffer sizes are often too small for high-speed transfers (1 Gbps+), causing packets to be dropped in the kernel before WireGuard can process them.
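The sysctl keys involved take values in bytes, which is easy to get wrong when sizing in megabytes. A minimal sketch of the conversion, assuming a 16 MB limit as a starting point (the helper name is illustrative):

```python
MIB = 1024 * 1024


def buffer_sysctl_lines(size_mb: int) -> list[str]:
    """Render rmem_max/wmem_max lines for sysctl.d, converting MB to bytes."""
    size_bytes = size_mb * MIB
    return [
        f"net.core.rmem_max = {size_bytes}",
        f"net.core.wmem_max = {size_bytes}",
    ]


print("\n".join(buffer_sysctl_lines(16)))
```

For 16 MB this yields 16777216 for both keys.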

Packet Processing & Forwarding (Efficiency)

These settings allow the kernel to process packets faster and handle bursts of traffic without dropping them.

Packet buffering

In Linux, network cards (NICs) use NAPI (New API) polling to handle incoming packets. When an interrupt fires, the kernel disables further interrupts and polls the NIC, processing packets in batches.

net.core.netdev_budget limits how many packets the kernel may process in a single SoftIRQ cycle before yielding the CPU, and its default value is 300 packets.

  • Too low a value - Under heavy load (e.g., 100+ streaming users), the kernel yields too early, packets back up in the NIC buffer, and drops occur.

  • Too high a value - The networking stack can monopolize a CPU core, starving userspace processes and increasing overall latency.

For high-performance VPN servers, we increase netdev_budget to favor network throughput and tune the companion setting netdev_budget_usecs to cap CPU time per polling cycle. Below you will find some recommended values for multiple scenarios.
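Before raising the budget, it can help to check whether the kernel is actually running out of it. The sketch below parses /proc/net/softnet_stat, where each row holds hexadecimal per-CPU counters: the first column counts processed packets, the second counts packets dropped because the backlog queue was full, and the third (time_squeeze) counts polling cycles that ended with work still pending. The function name and output format are illustrative.

```python
from pathlib import Path


def parse_softnet_stat(text: str) -> list[dict]:
    """Parse the first three hex columns of each row of /proc/net/softnet_stat."""
    rows = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append({
            "row": index,  # one row per online CPU
            "processed": int(fields[0], 16),
            "dropped": int(fields[1], 16),
            "time_squeeze": int(fields[2], 16),
        })
    return rows


try:
    stats = Path("/proc/net/softnet_stat").read_text()
except OSError:
    stats = ""  # not running on Linux

for row in parse_softnet_stat(stats):
    print(row)
```

A time_squeeze counter that keeps growing under load suggests the budget (or netdev_budget_usecs) is being exhausted; a growing dropped counter points at net.core.netdev_max_backlog instead.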

Home/Small Office

For around 20 users, the default value of 300 is fine; changing it is unlikely to make a noticeable difference.

50 VPN users and above

A value of netdev_budget = 600 is a reasonable starting point, matching the 100 and 1,000 device columns in the table at the end of this guide.

High throughput ≥ 10 Gbps

You may need values as high as netdev_budget = 1200, assuming you have a powerful CPU with Receive Packet Steering (RPS) enabled.
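RPS is configured per receive queue by writing a hexadecimal CPU bitmask to /sys/class/net/<interface>/queues/rx-<n>/rps_cpus. The sketch below builds such a mask; the interface name eth0, the queue number, and the chosen CPU set are placeholders, and the script only prints the command rather than applying it.

```python
def rps_cpu_mask(cpus: list[int]) -> str:
    """Build the hex bitmask for rps_cpus, with one bit set per CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    hex_digits = f"{mask:x}"
    # Masks wider than 32 bits are written as comma-separated
    # 32-bit groups, most significant group first.
    groups = []
    while hex_digits:
        groups.insert(0, hex_digits[-8:])
        hex_digits = hex_digits[:-8]
    return ",".join(groups)


iface, queue = "eth0", 0  # placeholders: adjust to the gateway's NIC
mask = rps_cpu_mask([2, 3, 4, 5])
print(f"echo {mask} > /sys/class/net/{iface}/queues/rx-{queue}/rps_cpus")
```

Steering to CPUs 2 through 5 gives the mask 3c, leaving CPUs 0 and 1 free for other work.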

Multiple connection concurrency (egress via VPN)

WireGuard is stateless, but Linux connection tracking in the firewall, used for masquerading or DNAT when configuring egress through the VPN, is stateful.

Here is a way to optimize netfilter parameters based on the following assumption: one connected device generates one UDP stream for the VPN and multiple TCP streams used by the user or device for browsing and applications exiting through the VPN.
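Under that assumption, the expected number of tracked connections can be estimated and compared with a candidate nf_conntrack_max. This is a rough sketch: the function names are illustrative, and the figure of 100 TCP streams per device is taken from the per-user estimate used in the memory section.

```python
def estimate_entries(devices: int, tcp_streams_per_device: int) -> int:
    """One UDP tunnel flow plus the TCP streams leaving through the VPN, per device."""
    return devices * (1 + tcp_streams_per_device)


def conntrack_utilization(count: int, maximum: int) -> float:
    """Fraction of the conntrack table in use (0.0 if the limit is unknown)."""
    return count / maximum if maximum else 0.0


entries = estimate_entries(1000, 100)  # 101,000 tracked flows
usage = conntrack_utilization(entries, 524288)
print(f"{entries} entries, {usage:.0%} of nf_conntrack_max")
```

On a running gateway, the live values can be read from /proc/sys/net/netfilter/nf_conntrack_count and nf_conntrack_max (present once the nf_conntrack module is loaded) and passed to conntrack_utilization in the same way.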

| Parameter (sysctl) | Description | 10 devices (Home/SOHO) | 100 devices (SMB/Office) | 1,000 devices (Enterprise/ISP) | 10,000 devices (Data Center) |
| --- | --- | --- | --- | --- | --- |
| net.netfilter.nf_conntrack_max | CRITICAL. Max concurrent connections tracked. | 65536 | 131072 | 524288 | 5242880 |
| net.core.somaxconn | Max pending connections in queue. | 4096 | 4096 | 16384 | 65535 |
| net.core.netdev_max_backlog | Max packets queued if kernel is busy. | 1000 | 5000 | 16384 | 65535 |
| net.core.netdev_budget | Max packets processed in one SoftIRQ polling cycle. | 300 | 600 | 600 | 1200 |
| net.core.rmem_max | Max OS receive buffer size (UDP), in bytes. | 16777216 (16 MB) | 16777216 (16 MB) | 33554432 (32 MB) | 134217728 (128 MB) |
| net.core.wmem_max | Max OS send buffer size (UDP), in bytes. | 16777216 (16 MB) | 16777216 (16 MB) | 33554432 (32 MB) | 134217728 (128 MB) |
| fs.file-max | System-wide file descriptor limit. | Default | 100000 | 1000000 | 5000000 |
| Required System RAM | Minimum RAM needed for state tables. | 512 MB | 1 GB | 4 GB | 32 GB+ |
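
To turn a column of the table into a drop-in file such as /etc/sysctl.d/99-wireguard-tuning.conf, a small generator like the one below can be used. It is a sketch: the values are copied from the table, while the function names and the rule of rounding a device count up to the next tier are assumptions made for illustration.

```python
MIB = 1024 * 1024

# Values copied from the table; fs.file-max stays at its default for 10 devices.
TIERS = {
    10: {"net.netfilter.nf_conntrack_max": 65536, "net.core.somaxconn": 4096,
         "net.core.netdev_max_backlog": 1000, "net.core.netdev_budget": 300,
         "net.core.rmem_max": 16 * MIB, "net.core.wmem_max": 16 * MIB},
    100: {"net.netfilter.nf_conntrack_max": 131072, "net.core.somaxconn": 4096,
          "net.core.netdev_max_backlog": 5000, "net.core.netdev_budget": 600,
          "net.core.rmem_max": 16 * MIB, "net.core.wmem_max": 16 * MIB,
          "fs.file-max": 100000},
    1000: {"net.netfilter.nf_conntrack_max": 524288, "net.core.somaxconn": 16384,
           "net.core.netdev_max_backlog": 16384, "net.core.netdev_budget": 600,
           "net.core.rmem_max": 32 * MIB, "net.core.wmem_max": 32 * MIB,
           "fs.file-max": 1000000},
    10000: {"net.netfilter.nf_conntrack_max": 5242880, "net.core.somaxconn": 65535,
            "net.core.netdev_max_backlog": 65535, "net.core.netdev_budget": 1200,
            "net.core.rmem_max": 128 * MIB, "net.core.wmem_max": 128 * MIB,
            "fs.file-max": 5000000},
}


def pick_tier(devices: int) -> int:
    """Round the device count up to the next table column (assumed policy)."""
    for size in sorted(TIERS):
        if devices <= size:
            return size
    return max(TIERS)


def render_sysctl_conf(devices: int) -> str:
    """Render the chosen tier as the contents of a sysctl.d file."""
    tier = pick_tier(devices)
    lines = [f"# WireGuard gateway tuning, {tier}-device tier"]
    lines += [f"{key} = {value}" for key, value in TIERS[tier].items()]
    return "\n".join(lines) + "\n"


print(render_sysctl_conf(750), end="")
```

The generated file can be loaded with sysctl --system. The net.netfilter.nf_conntrack_max key only exists once the nf_conntrack module is loaded, so it may need to be applied after the firewall rules are in place.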
