* add calculated_remotes
This setting allows us to "guess" what the remote might be for a host
while we wait for the lighthouse response. For networks that hard
designed with in mind, it can help speed up handshake performance, as well as
improve resiliency in the case that all lighthouses are down.
Example:
lighthouse:
# ...
calculated_remotes:
# For any Nebula IPs in 10.0.10.0/24, this will apply the mask and add
# the calculated IP as an initial remote (while we wait for the response
# from the lighthouse). Both CIDRs must have the same mask size.
# For example, Nebula IP 10.0.10.123 will have a calculated remote of
# 192.168.1.123
10.0.10.0/24:
- mask: 192.168.1.0/24
port: 4242
* figure out what is up with this test
* add test
* better logic for sending handshakes
Keep track of the last light of hosts we sent handshakes to. Only log
handshake sent messages if the list has changed.
Remove the test Test_NewHandshakeManagerTrigger because it is faulty and
makes no sense. It relys on the fact that no handshake packets actually
get sent, but with these changes we would send packets now (which it
should!)
* use atomic.Pointer
* cleanup to make it clearer
* fix typo in example
* firewall: add option to send REJECT replies
This change allows you to configure the firewall to send REJECT packets
when a packet is denied.
firewall:
# Action to take when a packet is not allowed by the firewall rules.
# Can be one of:
# `drop` (default): silently drop the packet.
# `reject`: send a reject reply.
# - For TCP, this will be a RST "Connection Reset" packet.
# - For other protocols, this will be an ICMP port unreachable packet.
outbound_action: drop
inbound_action: drop
These packets are only sent to established tunnels, and only on the
overlay network (currently IPv4 only).
$ ping -c1 192.168.100.3
PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data.
From 192.168.100.3 icmp_seq=2 Destination Port Unreachable
--- 192.168.100.3 ping statistics ---
2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 31ms
$ nc -nzv 192.168.100.3 22
(UNKNOWN) [192.168.100.3] 22 (?) : Connection refused
This change also modifies the smoke test to capture tcpdump pcaps from
both the inside and outside to inspect what is going on over the wire.
It also now does TCP and UDP packet tests using the Nmap version of
ncat.
* calculate seq and ack the same was as the kernel
The logic a bit confusing, so we copy it straight from how the kernel
does iptables `--reject-with tcp-reset`:
- https://github.com/torvalds/linux/blob/v5.19/net/ipv4/netfilter/nf_reject_ipv4.c#L193-L221
* cleanup
These new helpers make the code a lot cleaner. I confirmed that the
simple helpers like `atomic.Int64` don't add any extra overhead as they
get inlined by the compiler. `atomic.Pointer` adds an extra method call
as it no longer gets inlined, but we aren't using these on the hot path
so it is probably okay.
We have a few small race conditions with creating the HostInfo.ConnectionState
since we add the host info to the pendingHostMap before we set this
field. We can make everything a lot easier if we just add an "init"
function so that we can set this field in the hostinfo before we add it
to the hostmap.
* Add more metrics
This change adds the following counter metrics:
Metrics to track packets dropped at the firewall:
firewall.dropped.local_ip
firewall.dropped.remote_ip
firewall.dropped.no_rule
Metrics to track handshakes attempts that have been initiated and ones
that have timed out (ones that have completed are tracked by the
existing "handshakes" histogram).
handshake_manager.initiated
handshake_manager.timed_out
Metrics to track when cached_packets are dropped because we run out of
buffer space, and how many are sent once the handshake completes.
hostinfo.cached_packets.dropped
hostinfo.cached_packets.sent
This change also notes how many cached packets we have when we log the
final "Handshake received" message for either stage1 for stage2.
* separate incoming/outgoing metrics
* remove "allowed" firewall metrics
We don't need this on the hotpath, they aren't worh it.
* don't need pointers here
We missed this race with #396 (and I think this is also the crash in
issue #226). We need to lock a little higher in the getOrHandshake
method, before we reset hostinfo.ConnectionInfo. Previously, two
routines could enter this section and confuse the handshake process.
This could result in the other side sending a recv_error that also has
a race with setting hostinfo.ConnectionInfo back to nil. So we make sure
to grab the lock in handleRecvError as well.
Neither of these code paths are in the hot path (handling packets
between two hosts over an active tunnel) so there should be no
performance concerns.
This change fixes all of the known data races that `make smoke-docker-race` finds, except for one.
Most of these races are around the handshake phase for a hostinfo, so we add a RWLock to the hostinfo and Lock during each of the handshake stages.
Some of the other races are around consistently using `atomic` around the `messageCounter` field. To make this harder to mess up, I have renamed the field to `atomicMessageCounter` (I also removed the unnecessary extra pointer deference as we can just point directly to the struct field).
The last remaining data race is around reading `ConnectionInfo.ready`, which is a boolean that is only written to once when the handshake has finished. Due to it being in the hot path for packets and the rare case that this could actually be an issue, holding off on fixing that one for now.
here is the results of `make smoke-docker-race`:
before:
lighthouse1: Found 2 data race(s)
host2: Found 36 data race(s)
host3: Found 17 data race(s)
host4: Found 31 data race(s)
after:
host2: Found 1 data race(s)
host4: Found 1 data race(s)
Fixes: #147Fixes: #226Fixes: #283Fixes: #316
Previously, every packet we see gets a lock on the conntrack table and updates it. When running with multiple routines, this can cause heavy lock contention and limit our ability for the threads to run independently. This change caches reads from the conntrack table for a very short period of time to reduce this lock contention. This cache will currently default to disabled unless you are running with multiple routines, in which case the default cache delay will be 1 second. This means that entries in the conntrack table may be up to 1 second out of date and remain in a routine local cache for up to 1 second longer than the global table.
Instead of calling time.Now() for every packet, this cache system relies on a tick thread that updates the current cache "version" each tick. Every packet we check if the cache version is out of date, and reset the cache if so.
This change is for Linux only.
Previously, when running with multiple tun.routines, we would only have one file descriptor. This change instead sets IFF_MULTI_QUEUE and opens a file descriptor for each routine. This allows us to process with multiple threads while preventing out of order packet reception issues.
To attempt to distribute the flows across the queues, we try to write to the tun/UDP queue that corresponds with the one we read from. So if we read a packet from tun queue "2", we will write the outgoing encrypted packet to UDP queue "2". Because of the nature of how multi queue works with flows, a given host tunnel will be sticky to a given routine (so if you try to performance benchmark by only using one tunnel between two hosts, you are only going to be using a max of one thread for each direction).
Because this system works much better when we can correlate flows between the tun and udp routines, we are deprecating the undocumented "tun.routines" and "listen.routines" parameters and introducing a new "routines" parameter that sets the value for both. If you use the old undocumented parameters, the max of the values will be used and a warning logged.
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Currently, if a packet arrives on the tun device with a destination that
is not a routable Nebula IP, `queryUnsafeRoute` converts that IP to
0.0.0.0 and we store that packet and try to look up that IP with the
lighthouse. This doesn't make any sense to do, if we get a packet that
is unroutable we should just drop it.
Note, we have a few configurable options like `drop_local_broadcast`
and `drop_multicast` which do this for a few specific types, but since
no packets like this will send correctly I think we should just drop
anything that is unroutable.
We are currently triggering a fast handshake for static hosts right
inside HandshakeManager.AddVpnIP, but this can actually trigger before
we have generated the handshake packet to use. Instead, we should be
triggering right after we call ixHandshakeStage0 in getOrHandshake
(which generates the handshake packet)
This change add more metrics around "meta" (non "message" type packets).
For lighthouse packets, we also record statistics around the specific
lighthouse meta type.
We don't keep statistics for the "message" type so that we don't slow
down the fast path (and you can just look at metrics on the tun
interface to find that information).
This change adds a new helper, `(*HostInfo).logger()`, that starts a new
logrus.Entry with `vpnIp` and `certName`. We don't use the helper inside
of handshake_ix though since the certificate has not been attached to
the HostInfo yet.
Fixes: #84