nebula

Author	SHA1	Message	Date
Nate Brown	92cc32f844	Remove handshake race avoidance (#820) Co-authored-by: Wade Simmons <wadey@slack-corp.com>	2023-03-13 12:35:14 -05:00
Nate Brown	a06977bbd5	Track connections by local index id instead of vpn ip (#807)	2023-02-13 14:41:05 -06:00
Wade Simmons	9af242dc47	switch to new sync/atomic helpers in go1.19 (#728) These new helpers make the code a lot cleaner. I confirmed that the simple helpers like `atomic.Int64` don't add any extra overhead as they get inlined by the compiler. `atomic.Pointer` adds an extra method call as it no longer gets inlined, but we aren't using these on the hot path so it is probably okay.	2022-10-31 13:37:41 -04:00
brad-defined	1a7c575011	Relay (#678) Co-authored-by: Wade Simmons <wsimmons@slack-corp.com>	2022-06-21 13:35:23 -05:00
Wade Simmons	949ec78653	don't set ConnectionState to nil (#590) * don't set ConnectionState to nil We might have packets processing in another thread, so we can't safely just set this to nil. Since we removed it from the hostmaps, the next packets to process should start the handshake over again. I believe this comment is outdated or incorrect. Since the next handshake will start over with a new HostInfo, I don't think there is any way a counter reuse could happen: > We must null the connectionstate or a counter reuse may happen Here is a panic we saw that I think is related: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x93a037] goroutine 59 [running, locked to thread]: github.com/slackhq/nebula.(*Firewall).Drop(...) github.com/slackhq/nebula/firewall.go:380 github.com/slackhq/nebula.(*Interface).consumeInsidePacket(...) github.com/slackhq/nebula/inside.go:59 github.com/slackhq/nebula.(*Interface).listenIn(...) github.com/slackhq/nebula/interface.go:233 created by github.com/slackhq/nebula.(*Interface).run github.com/slackhq/nebula/interface.go:191 * use closeTunnel	2021-12-06 14:09:05 -05:00
Nate Brown	467e605d5e	Push route handling into overlay, a few more nits fixed (#581)	2021-11-12 11:19:28 -06:00
Nate Brown	e07524a654	Move all of tun into overlay (#577)	2021-11-11 16:37:29 -06:00
Wade Simmons	304b12f63f	create ConnectionState before adding to HostMap (#535) We have a few small race conditions with creating the HostInfo.ConnectionState since we add the host info to the pendingHostMap before we set this field. We can make everything a lot easier if we just add an "init" function so that we can set this field in the hostinfo before we add it to the hostmap.	2021-11-08 14:46:22 -05:00
Nate Brown	bcabcfdaca	Rework some things into packages (#489)	2021-11-03 20:54:04 -05:00
brad-defined	6ae8ba26f7	Add a context object in nebula.Main to clean up on error (#550)	2021-11-02 13:14:26 -05:00
Wade Simmons	ea2c186a77	remote_allow_ranges: allow inside CIDR specific remote_allow_lists (#540) This allows you to configure remote allow lists specific to different subnets of the inside CIDR. Example: remote_allow_ranges: 10.42.42.0/24: 192.168.0.0/16: true This would only allow hosts with a VPN IP in the 10.42.42.0/24 range to have private IPs (and thus not connect over public IPs). The PR also refactors AllowList into RemoteAllowList and LocalAllowList to make it clearer which methods are allowed on which allow list.	2021-10-19 10:54:30 -04:00
Wade Simmons	ae5505bc74	handshake: update to preferred remote (#532) If we receive a handshake packet for a tunnel that has already been completed, check to see if the new remote is preferred. If so, update to the preferred remote and send a test packet to influence the other side to do the same.	2021-10-19 10:53:55 -04:00
Nate Brown	d004fae4f9	Unlock the hostmap quickly, lock hostinfo instead (#459)	2021-05-05 13:10:55 -05:00
Wade Simmons	44cb697552	Add more metrics (#450) * Add more metrics This change adds the following counter metrics: Metrics to track packets dropped at the firewall: firewall.dropped.local_ip, firewall.dropped.remote_ip, firewall.dropped.no_rule Metrics to track handshake attempts that have been initiated and ones that have timed out (ones that have completed are tracked by the existing "handshakes" histogram): handshake_manager.initiated, handshake_manager.timed_out Metrics to track when cached_packets are dropped because we run out of buffer space, and how many are sent once the handshake completes: hostinfo.cached_packets.dropped, hostinfo.cached_packets.sent This change also notes how many cached packets we have when we log the final "Handshake received" message for either stage1 or stage2. * separate incoming/outgoing metrics * remove "allowed" firewall metrics We don't need these on the hot path; they aren't worth it. * don't need pointers here	2021-04-27 22:23:18 -04:00
Nathan Brown	db23fdf9bc	Don't apply race avoidance to existing handshakes, use the handshake time to determine who wins (#451) Co-authored-by: Wade Simmons <wadey@slack-corp.com>	2021-04-27 21:15:34 -05:00
Nathan Brown	710df6a876	Refactor remotes and handshaking to give every address a fair shot (#437)	2021-04-14 13:50:09 -05:00
Nathan Brown	480036fbc8	Remove unused structs in hostmap.go (#430)	2021-04-01 22:07:11 -05:00
Nathan Brown	0c2e5973e1	Simple lie test (#427)	2021-03-31 10:26:35 -05:00
Wade Simmons	4603b5b2dd	fix PromoteEvery check (#424) This check was accidentally typo'd in #396 from `%` to `&`. Restore the correct functionality here (we want to do the check once every `PromoteEvery` packets).	2021-03-26 15:01:05 -04:00
Nathan Brown	3ea7e1b75f	Don't use a global logger (#423)	2021-03-26 09:46:30 -05:00
Nathan Brown	7a9f9dbded	Don't craft buffers if we don't need them (#416)	2021-03-22 18:25:06 -05:00
Nathan Brown	7073d204a8	IPv6 support for outside (udp) (#369)	2021-03-18 20:37:24 -05:00
Wade Simmons	6c55d67f18	Refactor handshake_ix (#401) There are some subtle race conditions with the previous handshake_ix implementation, mostly around collisions with localIndexId. This change refactors it so that we have a "commit" phase during the handshake where we grab the lock for the hostmap and ensure that we have a unique local index before storing it. We also now avoid using the pending hostmap at all for receiving stage1 packets, since we have everything we need to just store the completed handshake. Co-authored-by: Nate Brown <nbrown.us@gmail.com> Co-authored-by: Ryan Huber <rhuber@gmail.com> Co-authored-by: forfuncsake <drussell@slack-corp.com>	2021-03-12 14:16:25 -05:00
Wade Simmons	d604270966	Fix most known data races (#396) This change fixes all of the known data races that `make smoke-docker-race` finds, except for one. Most of these races are around the handshake phase for a hostinfo, so we add a RWLock to the hostinfo and Lock during each of the handshake stages. Some of the other races come from inconsistent use of `atomic` on the `messageCounter` field. To make this harder to mess up, I have renamed the field to `atomicMessageCounter` (I also removed the unnecessary extra pointer dereference as we can just point directly to the struct field). The last remaining data race is around reading `ConnectionInfo.ready`, which is a boolean that is only written to once when the handshake has finished. Because it is on the hot path for packets and is only rarely an actual issue, we are holding off on fixing that one for now. Here are the results of `make smoke-docker-race`: before: lighthouse1: Found 2 data race(s), host2: Found 36 data race(s), host3: Found 17 data race(s), host4: Found 31 data race(s); after: host2: Found 1 data race(s), host4: Found 1 data race(s). Fixes: #147 Fixes: #226 Fixes: #283 Fixes: #316	2021-03-05 21:18:33 -05:00
Nathan Brown	b6234abfb3	Add a way to trigger punch backs via lighthouse (#394)	2021-03-01 19:06:01 -06:00
Wade Simmons	1bae5b2550	more validation in pending hostmap deletes (#344) We are currently seeing some cases where we are not deleting entries correctly from the pending hostmap. I believe this is a case of an inbound timer tick firing and deleting the Hosts map entry for a newer handshake attempt than intended, thus leaving the old Indexes entry orphaned. This change adds some extra checking when deleting from the Indexes and Hosts maps to ensure we clean everything up correctly.	2021-03-01 12:40:46 -05:00
Tim Rots	e7e6a23cde	fix a few typos (#302)	2021-03-01 11:14:34 -05:00
Wade Simmons	ee7c27093c	add HostMap.RemoteIndexes (#329) This change adds an index based on HostInfo.remoteIndexId. This allows us to use HostMap.QueryReverseIndex without having to loop over all entries in the map (this can be a bottleneck on high-traffic lighthouses). Without this patch, the CPU pprof trace of a high-traffic lighthouse server receiving recv_error packets and lots of handshakes can look like this: flat flat% sum% cum cum% 2000ms 32.26% 32.26% 3040ms 49.03% github.com/slackhq/nebula.(*HostMap).QueryReverseIndex 870ms 14.03% 46.29% 1060ms 17.10% runtime.mapiternext This shows that roughly 50% of total CPU time is being spent in QueryReverseIndex.	2020-11-23 14:51:16 -05:00
Ryan Huber	43a3988afc	i don't think this is used at all anymore (#323)	2020-10-29 21:43:50 -04:00
Wade Simmons	b37a91cfbc	add meta packet statistics (#230) This change adds more metrics around "meta" (non-"message" type) packets. For lighthouse packets, we also record statistics around the specific lighthouse meta type. We don't keep statistics for the "message" type so that we don't slow down the fast path (and you can just look at metrics on the tun interface to find that information).	2020-06-26 13:45:48 -04:00
Wade Simmons	0a474e757b	Add lighthouse.{remoteAllowList,localAllowList} (#217) These settings make it possible to blacklist / whitelist IP addresses that are used for remote connections. `lighthouse.remoteAllowList` filters which remote IPs are allowed when fetching from the lighthouse (or, if you are the lighthouse, which IPs you store and forward to querying hosts). By default, any remote IPs are allowed. You can provide CIDRs here with `true` to allow and `false` to deny. The most specific CIDR rule applies to each remote. If all rules are "allow", the default will be "deny", and vice versa. If both "allow" and "deny" rules are present, then you MUST set a rule for "0.0.0.0/0" as the default. lighthouse: remoteAllowList: # Example to block IPs from this subnet from being used for remote IPs. "172.16.0.0/12": false # A more complicated example, allow public IPs but only private IPs from a specific subnet "0.0.0.0/0": true "10.0.0.0/8": false "10.42.42.0/24": true `lighthouse.localAllowList` has the same logic as above, but it applies to the local addresses we advertise to the lighthouse. Additionally, you can specify an `interfaces` map of regular expressions to match against interface names. The regexp must match the entire name. All interface rules must be either true or false (and the default rule will be the inverse). CIDR rules are matched after interface name rules. Default is all local IP addresses. lighthouse: localAllowList: # Example to blacklist docker interfaces. interfaces: 'docker.*': false # Example to only advertise IPs in this subnet to the lighthouse. "10.0.0.0/8": true	2020-04-08 15:36:43 -04:00
Wade Simmons	b4f2f7ce4e	log `certName` alongside `vpnIp` (#200) This change adds a new helper, `(*HostInfo).logger()`, that starts a new logrus.Entry with `vpnIp` and `certName`. We don't use the helper inside of handshake_ix though since the certificate has not been attached to the HostInfo yet. Fixes: #84	2020-04-06 11:34:00 -07:00
Wade Simmons	add1b21777	only create a CIDRTree for each host if necessary (#198) A CIDRTree can be expensive to create, so only do it if we need it. If the remote host only has one IP address and no subnets, just do an exact IP match instead. Fixes: #171	2020-03-02 16:21:33 -05:00
Ryan Huber	ad7079d370	make this a warning, even though i believe it is fundamentally an error (in judgement)	2019-12-13 21:55:01 +00:00
Ryan Huber	9333a8e3b7	subnet support	2019-12-12 16:34:17 +00:00
Slack Security Team	f22b4b584d	Public Release	2019-11-19 17:00:20 +00:00

36 Commits
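Commit 9af242dc47 (#728) switches to the typed atomics that Go 1.19 added to `sync/atomic`. The sketch below is a minimal, hypothetical illustration of those standard-library helpers (`atomic.Int64`, `atomic.Pointer[T]`), not nebula's code; the `counters` struct, its fields, and the helper functions are invented for the example. Because the field types are themselves atomic, every access has to go through `Load`/`Store`/`Add`/`Swap`, which enforces by type what the `atomicMessageCounter` rename in #396 could only signal by name.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counters is a hypothetical struct; the field names are illustrative only.
type counters struct {
	messageCounter atomic.Int64           // usable only through its atomic methods
	remote         atomic.Pointer[string] // typed pointer, no unsafe.Pointer casts
}

// bumpConcurrently increments messageCounter from n goroutines and returns the total.
func bumpConcurrently(n int) int64 {
	var c counters
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.messageCounter.Add(1)
		}()
	}
	wg.Wait()
	return c.messageCounter.Load()
}

// swapRemote stores a new address and returns the previous one (nil if none was set).
func swapRemote(c *counters, addr string) *string {
	return c.remote.Swap(&addr)
}

func main() {
	fmt.Println(bumpConcurrently(100)) // prints 100

	var c counters
	swapRemote(&c, "192.0.2.1:4242")
	prev := swapRemote(&c, "198.51.100.7:4242")
	fmt.Println(*prev, *c.remote.Load())
}
```

The zero value of each typed atomic is ready to use, so no constructor or pointer indirection is needed for the struct fields.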
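The #424 fix (4603b5b2dd) restores a modulo check that #396 had accidentally turned into a bitwise AND. The sketch below shows why the one-character difference matters; `promoteEvery`, its value of 1000, and the helper functions are assumptions chosen for illustration, not nebula's actual names or defaults. With `%` the check fires exactly once every `promoteEvery` packets; with `&` it fires whenever the counter shares no set bits with the constant, which for this value is far more often and unevenly spaced.

```go
package main

import "fmt"

// promoteEvery is an illustrative interval (assumed value, not nebula's default).
const promoteEvery uint32 = 1000

// shouldCheckMod is the intended check: true once every promoteEvery packets.
func shouldCheckMod(counter uint32) bool {
	return counter%promoteEvery == 0
}

// shouldCheckAnd is the typo'd variant: a bitwise AND instead of a modulo.
// It is true whenever counter has none of promoteEvery's bits set.
func shouldCheckAnd(counter uint32) bool {
	return counter&promoteEvery == 0
}

// countChecks reports how many counters in 1..n make check return true.
func countChecks(check func(uint32) bool, n uint32) int {
	hits := 0
	for c := uint32(1); c <= n; c++ {
		if check(c) {
			hits++
		}
	}
	return hits
}

func main() {
	fmt.Println("modulo:", countChecks(shouldCheckMod, 10000)) // prints "modulo: 10"
	fmt.Println("bitwise AND:", countChecks(shouldCheckAnd, 10000))
}
```

Over 10,000 packets the modulo version fires only at multiples of 1000, while the AND version already fires on counter 1, since 1 shares no bits with 1000.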
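Commit ee7c27093c (#329) replaces a full scan in `QueryReverseIndex` with a second map keyed on the remote index. The following is a simplified sketch of that secondary-index pattern, assuming pared-down types: `hostInfo`, `hostMap`, and their fields and methods are hypothetical and are not nebula's real definitions. The lookup drops from a loop over every entry to a single map access; the cost is that every insert and delete must keep both maps in sync.

```go
package main

import "fmt"

// hostInfo is a hypothetical, pared-down stand-in for a tunnel entry.
type hostInfo struct {
	localIndexID  uint32
	remoteIndexID uint32
}

// hostMap keeps a primary index plus a secondary index on remoteIndexID.
type hostMap struct {
	indexes       map[uint32]*hostInfo // keyed by localIndexID
	remoteIndexes map[uint32]*hostInfo // keyed by remoteIndexID
}

func newHostMap() *hostMap {
	return &hostMap{
		indexes:       make(map[uint32]*hostInfo),
		remoteIndexes: make(map[uint32]*hostInfo),
	}
}

// add inserts h into both maps so they stay consistent.
func (hm *hostMap) add(h *hostInfo) {
	hm.indexes[h.localIndexID] = h
	hm.remoteIndexes[h.remoteIndexID] = h
}

// remove deletes h from both maps, but only where the entry still points at h,
// so a stale caller cannot evict a newer entry that reused the same key.
func (hm *hostMap) remove(h *hostInfo) {
	if hm.indexes[h.localIndexID] == h {
		delete(hm.indexes, h.localIndexID)
	}
	if hm.remoteIndexes[h.remoteIndexID] == h {
		delete(hm.remoteIndexes, h.remoteIndexID)
	}
}

// queryReverseIndexScan is the old shape: an O(n) loop over every entry.
func (hm *hostMap) queryReverseIndexScan(remote uint32) (*hostInfo, bool) {
	for _, h := range hm.indexes {
		if h.remoteIndexID == remote {
			return h, true
		}
	}
	return nil, false
}

// queryReverseIndex is the new shape: a single map lookup.
func (hm *hostMap) queryReverseIndex(remote uint32) (*hostInfo, bool) {
	h, ok := hm.remoteIndexes[remote]
	return h, ok
}

func main() {
	hm := newHostMap()
	for i := uint32(1); i <= 5; i++ {
		hm.add(&hostInfo{localIndexID: i, remoteIndexID: i * 100})
	}
	h, ok := hm.queryReverseIndex(300)
	fmt.Println(h.localIndexID, ok) // prints "3 true"
}
```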