nebula

Commit Graph

Author	SHA1	Message	Date
Nate Brown	c1711bc9c5	Remove tcp rtt tracking from the firewall (#1114)	2024-04-11 21:44:22 -05:00
Nate Brown	072edd56b3	Fix re-entrant `GetOrHandshake` issues (#1044)	2023-12-19 11:58:31 -06:00
Wade Simmons	fe16ea566d	firewall reject packets: cleanup error cases (#957)	2023-11-13 12:43:51 -06:00
Nate Brown	a44e1b8b05	Clean up a hostinfo to reduce memory usage (#955)	2023-11-02 16:53:59 -05:00
Nate Brown	076ebc6c6e	Simplify getting a hostinfo or starting a handshake with one (#954)	2023-08-21 18:51:45 -05:00
Nate Brown	7edcf620c0	We only need the certificate in ConnectionState (#953)	2023-08-21 14:11:06 -05:00
Nate Brown	5a131b2975	Combine ca, cert, and key handling (#952)	2023-08-14 21:32:40 -05:00
Nate Brown	a10baeee92	Pull hostmap and pending hostmap apart, remove unused functions (#843)	2023-07-24 12:37:52 -05:00
Nate Brown	03e4a7f988	Rehandshaking (#838) Co-authored-by: Brad Higgins <brad@defined.net> Co-authored-by: Wade Simmons <wadey@slack-corp.com>	2023-05-04 15:16:37 -05:00
brad-defined	9b03053191	update EncReader and EncWriter interface function args to have concrete types (#844) * Update LightHouseHandlerFunc to remove EncWriter param. * Move EncWriter to interface * EncReader, too	2023-04-07 14:28:37 -04:00
Wade Simmons	e0553822b0	Use NewGCMTLS (when using experiment boringcrypto) (#803) * Use NewGCMTLS (when using experiment boringcrypto) This change only affects builds built using `GOEXPERIMENT=boringcrypto`. When built with this experiment, we use the NewGCMTLS() method exposed by goboring, which validates that the nonce is strictly monotonically increasing. This is the TLS 1.2 specification for nonce generation (which also matches the method used by the Noise Protocol) - https://github.com/golang/go/blob/go1.19/src/crypto/tls/cipher_suites.go#L520-L522 - https://github.com/golang/go/blob/go1.19/src/crypto/internal/boring/aes.go#L235-L237 - https://github.com/golang/go/blob/go1.19/src/crypto/internal/boring/aes.go#L250 - `ae223d6138/include/openssl/aead.h (L379-L381)` - `ae223d6138/crypto/fipsmodule/cipher/e_aes.c (L1082-L1093)` * need to lock around EncryptDanger in SendVia * fix link to test vector	2023-04-05 11:08:23 -04:00
Nate Brown	1a6c657451	Normalize logs (#837)	2023-03-30 15:07:31 -05:00
Wade Simmons	e1af37e46d	add calculated_remotes (#759) * add calculated_remotes This setting allows us to "guess" what the remote might be for a host while we wait for the lighthouse response. For networks that are designed with this in mind, it can help speed up handshake performance, as well as improve resiliency in the case that all lighthouses are down. Example: lighthouse: # ... calculated_remotes: # For any Nebula IPs in 10.0.10.0/24, this will apply the mask and add # the calculated IP as an initial remote (while we wait for the response # from the lighthouse). Both CIDRs must have the same mask size. # For example, Nebula IP 10.0.10.123 will have a calculated remote of # 192.168.1.123 10.0.10.0/24: - mask: 192.168.1.0/24 port: 4242 * figure out what is up with this test * add test * better logic for sending handshakes Keep track of the last list of hosts we sent handshakes to. Only log handshake sent messages if the list has changed. Remove the test Test_NewHandshakeManagerTrigger because it is faulty and makes no sense. It relies on the fact that no handshake packets actually get sent, but with these changes we would send packets now (which it should!) * use atomic.Pointer * cleanup to make it clearer * fix typo in example	2023-03-13 15:09:08 -04:00
Wade Simmons	6e0ae4f9a3	firewall: add option to send REJECT replies (#738) * firewall: add option to send REJECT replies This change allows you to configure the firewall to send REJECT packets when a packet is denied. firewall: # Action to take when a packet is not allowed by the firewall rules. # Can be one of: # `drop` (default): silently drop the packet. # `reject`: send a reject reply. # - For TCP, this will be a RST "Connection Reset" packet. # - For other protocols, this will be an ICMP port unreachable packet. outbound_action: drop inbound_action: drop These packets are only sent to established tunnels, and only on the overlay network (currently IPv4 only). $ ping -c1 192.168.100.3 PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data. From 192.168.100.3 icmp_seq=2 Destination Port Unreachable --- 192.168.100.3 ping statistics --- 2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 31ms $ nc -nzv 192.168.100.3 22 (UNKNOWN) [192.168.100.3] 22 (?) : Connection refused This change also modifies the smoke test to capture tcpdump pcaps from both the inside and outside to inspect what is going on over the wire. It also now does TCP and UDP packet tests using the Nmap version of ncat. * calculate seq and ack the same way as the kernel The logic is a bit confusing, so we copy it straight from how the kernel does iptables `--reject-with tcp-reset`: - https://github.com/torvalds/linux/blob/v5.19/net/ipv4/netfilter/nf_reject_ipv4.c#L193-L221 * cleanup	2023-03-13 15:08:40 -04:00
Nate Brown	a06977bbd5	Track connections by local index id instead of vpn ip (#807)	2023-02-13 14:41:05 -06:00
John Maguire	5bd8712946	Immediately forward packets from self to self on FreeBSD (#808)	2023-01-23 15:51:54 -06:00
Wade Simmons	9af242dc47	switch to new sync/atomic helpers in go1.19 (#728) These new helpers make the code a lot cleaner. I confirmed that the simple helpers like `atomic.Int64` don't add any extra overhead as they get inlined by the compiler. `atomic.Pointer` adds an extra method call as it no longer gets inlined, but we aren't using these on the hot path so it is probably okay.	2022-10-31 13:37:41 -04:00
Nate Brown	4c0ae3df5e	Refuse to process double encrypted packets (#741)	2022-09-19 12:47:48 -05:00
Nate Brown	feb3e1317f	Add a simple benchmark to e2e tests (#739)	2022-09-01 09:44:58 -05:00
brad-defined	169cdbbd35	Immediately forward packets received on the nebula TUN device from self to self (#501) * Immediately forward packets received on the nebula TUN device with a destination of our Nebula VPN IP right back out that same TUN device on macOS.	2022-06-27 14:36:10 -04:00
brad-defined	1a7c575011	Relay (#678) Co-authored-by: Wade Simmons <wsimmons@slack-corp.com>	2022-06-21 13:35:23 -05:00
Nate Brown	312a01dc09	Lighthouse reload support (#649) Co-authored-by: John Maguire <contact@johnmaguire.me>	2022-03-14 12:35:13 -05:00
Nate Brown	467e605d5e	Push route handling into overlay, a few more nits fixed (#581)	2021-11-12 11:19:28 -06:00
Wade Simmons	304b12f63f	create ConnectionState before adding to HostMap (#535) We have a few small race conditions with creating the HostInfo.ConnectionState since we add the host info to the pendingHostMap before we set this field. We can make everything a lot easier if we just add an "init" function so that we can set this field in the hostinfo before we add it to the hostmap.	2021-11-08 14:46:22 -05:00
Nate Brown	bcabcfdaca	Rework some things into packages (#489)	2021-11-03 20:54:04 -05:00
Nate Brown	95f4c8a01b	Don't check for rebind if we are closing the tunnel (#457)	2021-05-04 19:15:24 -05:00
Wade Simmons	44cb697552	Add more metrics (#450) * Add more metrics This change adds the following counter metrics: Metrics to track packets dropped at the firewall: firewall.dropped.local_ip firewall.dropped.remote_ip firewall.dropped.no_rule Metrics to track handshake attempts that have been initiated and ones that have timed out (ones that have completed are tracked by the existing "handshakes" histogram). handshake_manager.initiated handshake_manager.timed_out Metrics to track when cached_packets are dropped because we run out of buffer space, and how many are sent once the handshake completes. hostinfo.cached_packets.dropped hostinfo.cached_packets.sent This change also notes how many cached packets we have when we log the final "Handshake received" message for either stage1 or stage2. * separate incoming/outgoing metrics * remove "allowed" firewall metrics We don't need this on the hotpath, they aren't worth it. * don't need pointers here	2021-04-27 22:23:18 -04:00
Nathan Brown	710df6a876	Refactor remotes and handshaking to give every address a fair shot (#437)	2021-04-14 13:50:09 -05:00
Nathan Brown	64d8e5aa96	More LH cleanup (#429)	2021-04-01 10:23:31 -05:00
Nathan Brown	75f7bda0a4	Lighthouse performance pass (#418)	2021-03-31 17:32:02 -05:00
Nathan Brown	0c2e5973e1	Simple lie test (#427)	2021-03-31 10:26:35 -05:00
Nathan Brown	883e09a392	Don't use a global ca pool (#426)	2021-03-29 12:10:19 -05:00
Nathan Brown	3ea7e1b75f	Don't use a global logger (#423)	2021-03-26 09:46:30 -05:00
Wade Simmons	64d8035d09	fix race in getOrHandshake (#400) We missed this race with #396 (and I think this is also the crash in issue #226). We need to lock a little higher in the getOrHandshake method, before we reset hostinfo.ConnectionInfo. Previously, two routines could enter this section and confuse the handshake process. This could result in the other side sending a recv_error that also has a race with setting hostinfo.ConnectionInfo back to nil. So we make sure to grab the lock in handleRecvError as well. Neither of these code paths are in the hot path (handling packets between two hosts over an active tunnel) so there should be no performance concerns.	2021-03-09 09:27:02 -05:00
Wade Simmons	d604270966	Fix most known data races (#396) This change fixes all of the known data races that `make smoke-docker-race` finds, except for one. Most of these races are around the handshake phase for a hostinfo, so we add a RWLock to the hostinfo and Lock during each of the handshake stages. Some of the other races are around consistently using `atomic` around the `messageCounter` field. To make this harder to mess up, I have renamed the field to `atomicMessageCounter` (I also removed the unnecessary extra pointer dereference as we can just point directly to the struct field). The last remaining data race is around reading `ConnectionInfo.ready`, which is a boolean that is only written to once when the handshake has finished. Due to it being in the hot path for packets and the rare case that this could actually be an issue, holding off on fixing that one for now. Here are the results of `make smoke-docker-race`: before: lighthouse1: Found 2 data race(s) host2: Found 36 data race(s) host3: Found 17 data race(s) host4: Found 31 data race(s) after: host2: Found 1 data race(s) host4: Found 1 data race(s) Fixes: #147 Fixes: #226 Fixes: #283 Fixes: #316	2021-03-05 21:18:33 -05:00
Nathan Brown	b6234abfb3	Add a way to trigger punch backs via lighthouse (#394)	2021-03-01 19:06:01 -06:00
Wade Simmons	2a4beb41b9	Routine-local conntrack cache (#391) Previously, every packet we see gets a lock on the conntrack table and updates it. When running with multiple routines, this can cause heavy lock contention and limit our ability for the threads to run independently. This change caches reads from the conntrack table for a very short period of time to reduce this lock contention. This cache will currently default to disabled unless you are running with multiple routines, in which case the default cache delay will be 1 second. This means that entries in the conntrack table may be up to 1 second out of date and remain in a routine local cache for up to 1 second longer than the global table. Instead of calling time.Now() for every packet, this cache system relies on a tick thread that updates the current cache "version" each tick. Every packet we check if the cache version is out of date, and reset the cache if so.	2021-03-01 19:52:17 -05:00
Wade Simmons	27d9a67dda	Proper multiqueue support for tun devices (#382) This change is for Linux only. Previously, when running with multiple tun.routines, we would only have one file descriptor. This change instead sets IFF_MULTI_QUEUE and opens a file descriptor for each routine. This allows us to process with multiple threads while preventing out of order packet reception issues. To attempt to distribute the flows across the queues, we try to write to the tun/UDP queue that corresponds with the one we read from. So if we read a packet from tun queue "2", we will write the outgoing encrypted packet to UDP queue "2". Because of the nature of how multi queue works with flows, a given host tunnel will be sticky to a given routine (so if you try to performance benchmark by only using one tunnel between two hosts, you are only going to be using a max of one thread for each direction). Because this system works much better when we can correlate flows between the tun and udp routines, we are deprecating the undocumented "tun.routines" and "listen.routines" parameters and introducing a new "routines" parameter that sets the value for both. If you use the old undocumented parameters, the max of the values will be used and a warning logged. Co-authored-by: Nate Brown <nbrown.us@gmail.com>	2021-02-25 15:01:14 -05:00
Darren Hoo	0010db46e4	Fix a data race on message counter (#284) 3. ================== WARNING: DATA RACE Write at 0x00c00030e020 by goroutine 17: sync/atomic.AddInt64() runtime/race_amd64.s:276 +0xb github.com/slackhq/nebula.(*Interface).sendNoMetrics() github.com/slackhq/nebula/inside.go:226 +0x9c github.com/slackhq/nebula.(*Interface).send() github.com/slackhq/nebula/inside.go:214 +0x149 github.com/slackhq/nebula.(*Interface).readOutsidePackets() github.com/slackhq/nebula/outside.go:94 +0x1213 github.com/slackhq/nebula.(*udpConn).ListenOut() github.com/slackhq/nebula/udp_generic.go:109 +0x3b5 github.com/slackhq/nebula.(*Interface).listenOut() github.com/slackhq/nebula/interface.go:147 +0x15e Previous read at 0x00c00030e020 by goroutine 18: github.com/slackhq/nebula.(*Interface).consumeInsidePacket() github.com/slackhq/nebula/inside.go:58 +0x892 github.com/slackhq/nebula.(*Interface).listenIn() github.com/slackhq/nebula/interface.go:164 +0x178	2020-09-21 21:41:46 -04:00
Wade Simmons	ac557f381b	drop unroutable packets (#267) Currently, if a packet arrives on the tun device with a destination that is not a routable Nebula IP, `queryUnsafeRoute` converts that IP to 0.0.0.0 and we store that packet and try to look up that IP with the lighthouse. This doesn't make any sense to do, if we get a packet that is unroutable we should just drop it. Note, we have a few configurable options like `drop_local_broadcast` and `drop_multicast` which do this for a few specific types, but since no packets like this will send correctly I think we should just drop anything that is unroutable.	2020-08-04 22:59:04 -04:00
Wade Simmons	a54f3fc681	fix fast handshake trigger for static hosts (#265) We are currently triggering a fast handshake for static hosts right inside HandshakeManager.AddVpnIP, but this can actually trigger before we have generated the handshake packet to use. Instead, we should be triggering right after we call ixHandshakeStage0 in getOrHandshake (which generates the handshake packet)	2020-08-02 20:59:50 -04:00
Wade Simmons	b37a91cfbc	add meta packet statistics (#230) This change adds more metrics around "meta" (non "message" type) packets. For lighthouse packets, we also record statistics around the specific lighthouse meta type. We don't keep statistics for the "message" type so that we don't slow down the fast path (and you can just look at metrics on the tun interface to find that information).	2020-06-26 13:45:48 -04:00
Patrick Bogen	ecf0e5a9f6	drop packets even if we aren't going to emit Debug logs about it (#239) * drop packets even if we aren't going to emit Debug logs about it * smallify change	2020-06-10 16:55:49 -05:00
Patrick Bogen	363c836422	log the reason for fw drops (#220) * log the reason for fw drops * only prepare log if we will end up sending it	2020-04-10 10:57:21 -07:00
Wade Simmons	b4f2f7ce4e	log `certName` alongside `vpnIp` (#200) This change adds a new helper, `(*HostInfo).logger()`, that starts a new logrus.Entry with `vpnIp` and `certName`. We don't use the helper inside of handshake_ix though since the certificate has not been attached to the HostInfo yet. Fixes: #84	2020-04-06 11:34:00 -07:00
Ryan Huber	a91a40212d	check that packet isn't bound for my vpn ip (#192)	2020-02-21 16:49:54 -08:00
Ryan Huber	9333a8e3b7	subnet support	2019-12-12 16:34:17 +00:00
Slack Security Team	f22b4b584d	Public Release	2019-11-19 17:00:20 +00:00

48 Commits