Commit Graph

65 Commits

Author SHA1 Message Date
Nate Brown 076ebc6c6e
Simplify getting a hostinfo or starting a handshake with one (#954) 2023-08-21 18:51:45 -05:00
Nate Brown 5a131b2975
Combine ca, cert, and key handling (#952) 2023-08-14 21:32:40 -05:00
Nate Brown 223cc6e660
Limit how often a busy tunnel can requery the lighthouse (#940)
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2023-08-08 13:26:41 -05:00
Caleb Jasik ed00f5d530
Remove unused config code (last edited 4yrs ago) (#938) 2023-07-31 15:59:20 -05:00
Nate Brown 14d0106716
Send the lh update worker into its own routine instead of taking over the reload routine (#935) 2023-07-27 14:38:10 -05:00
Nate Brown a10baeee92
Pull hostmap and pending hostmap apart, remove unused functions (#843) 2023-07-24 12:37:52 -05:00
Nate Brown 3bbf5f4e67
Use an interface for udp conns (#901) 2023-06-14 10:48:52 -05:00
brad-defined bd9cc01d62
Dns static lookerupper (#796)
* Support lighthouse DNS names, and regularly resolve the name in a background goroutine to discover DNS updates.
2023-05-09 11:22:08 -04:00
Nate Brown 3cb4e0ef57
Allow listen.host to contain names (#825) 2023-04-05 11:29:26 -05:00
Nate Brown ee8e1348e9
Use connection manager to drive NAT maintenance (#835)
Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>
2023-03-31 15:45:05 -05:00
Tricia 0fc4d8192f
log network as String to match the other log event in interface.go that emits network (#811)
Co-authored-by: Tricia Bogen <tbogen@slack-corp.com>
2023-01-23 14:05:35 -05:00
Jon Rafkind c2259f14a7
explicitly reload config from ssh command (#725) 2022-08-08 12:44:09 -05:00
Wade Simmons 7b9287709c
add listen.send_recv_error config option (#670)
By default, Nebula replies to packets it has no tunnel for with a `recv_error` packet. This packet helps speed up re-connection
in the case that Nebula on either side did not shut down cleanly. This response can be abused as a way to discover if Nebula is running
on a host though. This option lets you configure if you want to send `recv_error` packets always, never, or only to private network remotes.
valid values: always, never, private

This setting is reloadable with SIGHUP.
2022-06-27 12:37:54 -04:00
brad-defined 1a7c575011
Relay (#678)
Co-authored-by: Wade Simmons <wsimmons@slack-corp.com>
2022-06-21 13:35:23 -05:00
brad-defined 03498a0cb2
Make nebula advertise its dynamic port to lighthouses (#653) 2022-03-15 18:03:56 -05:00
Nate Brown 312a01dc09
Lighthouse reload support (#649)
Co-authored-by: John Maguire <contact@johnmaguire.me>
2022-03-14 12:35:13 -05:00
Wade Simmons befce3f990
fix crash with `-test` (#602)
When running in `-test` mode, `tun` is set to nil. So we should move the
defer into the `!configTest` if block.

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x54855c]

    goroutine 1 [running]:
    github.com/slackhq/nebula.Main.func3(0x4000135e80, {0x0, 0x0})
            github.com/slackhq/nebula/main.go:176 +0x2c
    github.com/slackhq/nebula.Main(0x400022e060, 0x1, {0x76faa0, 0x5}, 0x4000230000, 0x0)
            github.com/slackhq/nebula/main.go:316 +0x2414
    main.main()
            github.com/slackhq/nebula/cmd/nebula/main.go:54 +0x540
2021-12-06 14:06:16 -05:00
Nate Brown 48c47f5841
Warn if no lighthouses were configured on a non lighthouse node (#587) 2021-11-30 10:31:33 -06:00
Nate Brown 467e605d5e
Push route handling into overlay, a few more nits fixed (#581) 2021-11-12 11:19:28 -06:00
Nate Brown e07524a654
Move all of tun into overlay (#577) 2021-11-11 16:37:29 -06:00
Nate Brown 88ce0edf76
Start the overlay package with the old Inside interface (#576) 2021-11-10 21:52:26 -06:00
Nate Brown 4453964e34
Move util to test, contextual errors to util (#575) 2021-11-10 21:47:38 -06:00
Nate Brown bcabcfdaca
Rework some things into packages (#489) 2021-11-03 20:54:04 -05:00
brad-defined 6ae8ba26f7
Add a context object in nebula.Main to clean up on error (#550) 2021-11-02 13:14:26 -05:00
Donatas Abraitis 32e2619323
Teardown tunnel automatically if peer's certificate expired (#370) 2021-10-20 13:23:33 -05:00
Wade Simmons ea2c186a77
remote_allow_ranges: allow inside CIDR specific remote_allow_lists (#540)
This allows you to configure remote allow lists specific to different
subnets of the inside CIDR. Example:

    remote_allow_ranges:
      10.42.42.0/24:
        192.168.0.0/16: true

This would only allow hosts with a VPN IP in the 10.42.42.0/24 range to
have private IPs (and thus don't connect over public IPs).

The PR also refactors AllowList into RemoteAllowList and LocalAllowList to make it clearer which methods are allowed on which allow list.
2021-10-19 10:54:30 -04:00
brad-defined 7859140711
Only set serveDns if the host is also configured to be a lighthouse. (#433) 2021-04-16 13:33:56 -05:00
brad-defined 17106f83a0
Ensure the Nebula device exists before attempting to bind to the Nebula IP (#375) 2021-04-16 10:34:28 -05:00
Nathan Brown 710df6a876
Refactor remotes and handshaking to give every address a fair shot (#437) 2021-04-14 13:50:09 -05:00
Nathan Brown 1499be3e40
Fix name resolution for host names in config (#431) 2021-04-01 21:48:41 -05:00
Nathan Brown 64d8e5aa96
More LH cleanup (#429) 2021-04-01 10:23:31 -05:00
Nathan Brown 883e09a392
Don't use a global ca pool (#426) 2021-03-29 12:10:19 -05:00
Wade Simmons a71541fb0b
export build version as a prometheus label (#405)
This is how Prometheus recommends you do it, and how they do it
themselves in their client. This makes it easy to see which versions you
have deployed in your fleet, and query over it too.
2021-03-26 14:16:35 -04:00
Nathan Brown 3ea7e1b75f
Don't use a global logger (#423) 2021-03-26 09:46:30 -05:00
Nathan Brown 7073d204a8
IPv6 support for outside (udp) (#369) 2021-03-18 20:37:24 -05:00
Ryan Huber 73a5ed90b2
Do not allow someone to run a nebula lighthouse with an ephemeral port (#399)
* Do not allow someone to run a nebula lighthouse with an ephemeral port

* derp - we discover the port so we have to check the config setting

* No context needed for this error

* gofmt yourself

* Revert "gofmt yourself"

This reverts commit c01423498e.

* Revert "No context needed for this error"

This reverts commit 6792af6846.

* snip snap snip snap
2021-03-08 12:42:06 -08:00
Nathan Brown b6234abfb3
Add a way to trigger punch backs via lighthouse (#394) 2021-03-01 19:06:01 -06:00
Wade Simmons 2a4beb41b9
Routine-local conntrack cache (#391)
Previously, every packet we see gets a lock on the conntrack table and updates it. When running with multiple routines, this can cause heavy lock contention and limit our ability for the threads to run independently. This change caches reads from the conntrack table for a very short period of time to reduce this lock contention. This cache will currently default to disabled unless you are running with multiple routines, in which case the default cache delay will be 1 second. This means that entries in the conntrack table may be up to 1 second out of date and remain in a routine local cache for up to 1 second longer than the global table.

Instead of calling time.Now() for every packet, this cache system relies on a tick thread that updates the current cache "version" each tick. Every packet we check if the cache version is out of date, and reset the cache if so.
2021-03-01 19:52:17 -05:00
Wade Simmons a0583ebdca
tun_disabled: reply to ICMP Echo Request (#342)
This change allows a server running with `tun.disabled: true` (usually
a lighthouse) to still reply to ICMP EchoRequest packets. This allows
you to "ping" the lighthouse Nebula IP as a quick check to make sure the
tunnel is up, even when running with tun.disabled.

This is still gated by allowing `icmp` packets in the inbound firewall
rules.
2021-03-01 11:09:41 -05:00
Wade Simmons 27d9a67dda
Proper multiqueue support for tun devices (#382)
This change is for Linux only.

Previously, when running with multiple tun.routines, we would only have one file descriptor. This change instead sets IFF_MULTI_QUEUE and opens a file descriptor for each routine. This allows us to process with multiple threads while preventing out of order packet reception issues.

To attempt to distribute the flows across the queues, we try to write to the tun/UDP queue that corresponds with the one we read from. So if we read a packet from tun queue "2", we will write the outgoing encrypted packet to UDP queue "2". Because of the nature of how multi queue works with flows, a given host tunnel will be sticky to a given routine (so if you try to performance benchmark by only using one tunnel between two hosts, you are only going to be using a max of one thread for each direction).

Because this system works much better when we can correlate flows between the tun and udp routines, we are deprecating the undocumented "tun.routines" and "listen.routines" parameters and introducing a new "routines" parameter that sets the value for both. If you use the old undocumented parameters, the max of the values will be used and a warning logged.

Co-authored-by: Nate Brown <nbrown.us@gmail.com>
2021-02-25 15:01:14 -05:00
Nathan Brown 68e3e84fdc
More like a library (#279) 2020-09-18 09:20:09 -05:00
forfuncsake 9b8b3c478b
Support startup without a tun device (#269)
This commit adds support for Nebula to be started without creating
a tun device. A node started in this mode still has a full "control
plane", but no effective "data plane". Its use is suited to a
lighthouse that has no need to partake in the mesh VPN.

Consequently, creation of the tun device is the only reason nebula
neesd to be started with elevated privileged, so this example
lighthouse can also be run as a non-root user.
2020-08-10 09:15:55 -04:00
Wade Simmons 4756c9613d
trigger handshakes when lighthouse reply arrives (#246)
Currently, we wait until the next timer tick to act on the lighthouse's
reply to our HostQuery. This means we can easily add hundreds of
milliseconds of unnecessary delay to the handshake. To fix this, we
can introduce a channel to trigger an outbound handshake without waiting
for the next timer tick.

A few samples of cold ping time between two hosts that require a
lighthouse lookup:

    before (v1.2.0):

    time=156 ms
    time=252 ms
    time=12.6 ms
    time=301 ms
    time=352 ms
    time=49.4 ms
    time=150 ms
    time=13.5 ms
    time=8.24 ms
    time=161 ms
    time=355 ms

    after:

    time=3.53 ms
    time=3.14 ms
    time=3.08 ms
    time=3.92 ms
    time=7.78 ms
    time=3.59 ms
    time=3.07 ms
    time=3.22 ms
    time=3.12 ms
    time=3.08 ms
    time=8.04 ms

I recommend reviewing this PR by looking at each commit individually, as
some refactoring was required that makes the diff a bit confusing when
combined together.
2020-07-22 10:35:10 -04:00
Wade Simmons aba42f9fa6
enforce the use of goimports (#248)
* enforce the use of goimports

Instead of enforcing `gofmt`, enforce `goimports`, which also asserts
a separate section for non-builtin packages.

* run `goimports` everywhere

* exclude generated .pb.go files
2020-06-30 18:53:30 -04:00
Nathan Brown 41578ca971
Be more like a library to support mobile (#247) 2020-06-30 13:48:58 -05:00
Wade Simmons b37a91cfbc
add meta packet statistics (#230)
This change add more metrics around "meta" (non "message" type packets).
For lighthouse packets, we also record statistics around the specific
lighthouse meta type.

We don't keep statistics for the "message" type so that we don't slow
down the fast path (and you can just look at metrics on the tun
interface to find that information).
2020-06-26 13:45:48 -04:00
Wade Simmons 4f6313ebd3
fix config name for {remote,local}_allow_list (#219)
This config option should be snake_case, not camelCase.
2020-04-08 16:20:12 -04:00
Wade Simmons 0a474e757b
Add lighthouse.{remoteAllowList,localAllowList} (#217)
These settings make it possible to blacklist / whitelist IP addresses
that are used for remote connections.

`lighthouse.remoteAllowList` filters which remote IPs are allow when
fetching from the lighthouse (or, if you are the lighthouse, which IPs
you store and forward to querying hosts). By default, any remote IPs are
allowed. You can provide CIDRs here with `true` to allow and `false` to
deny. The most specific CIDR rule applies to each remote.  If all rules
are "allow", the default will be "deny", and vice-versa. If both "allow"
and "deny" rules are present, then you MUST set a rule for "0.0.0.0/0"
as the default.

    lighthouse:
      remoteAllowList:
        # Example to block IPs from this subnet from being used for remote IPs.
        "172.16.0.0/12": false

        # A more complicated example, allow public IPs but only private IPs from a specific subnet
        "0.0.0.0/0": true
        "10.0.0.0/8": false
        "10.42.42.0/24": true

`lighthouse.localAllowList` has the same logic as above, but it applies
to the local addresses we advertise to the lighthouse. Additionally, you
can specify an `interfaces` map of regular expressions to match against
interface names. The regexp must match the entire name. All interface
rules must be either true or false (and the default rule will be the
inverse). CIDR rules are matched after interface name rules.

Default is all local IP addresses.

    lighthouse:
      localAllowList:
        # Example to blacklist docker interfaces.
        interfaces:
          'docker.*': false

        # Example to only advertise IPs in this subnet to the lighthouse.
        "10.0.0.0/8": true
2020-04-08 15:36:43 -04:00
Wade Simmons 7cdbb14a18
Better config test (#177)
* Better config test

Previously, when using the config test option `-test`, we quit fairly
earlier in the process and would not catch a variety of additional
parsing errors (such as lighthouse IP addresses, local_range, the new
check to make sure static hosts are in the certificate's subnet, etc).

* run config test as part of smoke test

* don't need privileges for configtest

Co-authored-by: Nathan Brown <nate@slack-corp.com>
2020-04-06 11:35:32 -07:00
Felix Yan 9e2ff7df57
Correct typos in noise.go (#205) 2020-03-30 11:23:55 -07:00