This is an attempt to unify all the station monitoring and make it work
better as one. We're trying to square a circle here somewhat: taking
steps to kick nodes when problems are detected, but not kicking them too quickly
or too often in case we're misidentifying issues.
We've seen these issues manifest themselves with nodes messing up VoIP services,
as well as resets causing nodes to get into unrecoverable states when there
was no real problem in the first place.
This will probably need to evolve before the next release, but it would be good
to get some mileage on the new code.
Coverage is handled by modifying firmware state, and the driver stores
the value the first time it is set. When we reset, this state might be lost,
so it will be reloaded from the firmware. We set the coverage back to 0
so the reloaded value will be the default again.
We also remove a check which can fail incorrectly.
* A scan, especially if we have to do both active and passive, essentially mutes
the radio to AREDN traffic for 10-20 seconds, which isn't good. If the radio is completely
deaf then it doesn't matter, but particularly on the 9K radios we do this when
things are looking a bit dodgy, though not deaf.
* Provide hook to reset ath9k from userspace. This hook is attributed to:
Linus Lüssing <ll@simonwunderlich.de>
* Use /sys reset hooks rather than iw scan
* Use LQM information to filter out neighbors we don't care about.
These can cause false rejoin events and degrade the network.
* Only use active station monitor with LQM info.
* Resolve unresponsive node problems with Mikrotik AC devices.
Mikrotik AC devices sometimes get into a state where they won't communicate with
non-AC devices. Leaving and rejoining the network resets
everything. We monitor for this situation and rejoin the network when it is detected
to resolve the issue.
* Make reporting less chatty
* General station monitor service.
It turns out this station bug is not limited to the ath10k driver, so
make this monitor service wifi generic.
(I've now seen this at both ends of the Mikrotik AC <-> Rocket pair)
* New logs
* Just monitor for now
There appears to be a bug in the ath10k firmware for Mikrotik devices (maybe others)
where a station will associate but only broadcast traffic will be passed - unicast traffic
will fail. This code detects this situation and forces the device to reassociate which
fixes the problem.
* Track validation state of hosts and services. Only remove a host/service if it fails multiple times in a row.
* Let new addresses/services be valid for a while regardless
* Initially unknown addresses will be valid for a while
* Reset validation state when services updated
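The validation rules above can be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: the class name `ValidationTracker`, the failure threshold of 3 and the grace period length are all assumptions made for the example.

```python
import time


class ValidationTracker:
    """Hypothetical tracker; names and default values are illustrative only."""

    def __init__(self, max_failures=3, grace_seconds=300, clock=time.monotonic):
        self.max_failures = max_failures    # assumed threshold, not from the source
        self.grace_seconds = grace_seconds  # assumed grace period for new entries
        self.clock = clock
        self.entries = {}  # key -> {"first_seen": timestamp, "failures": count}

    def record(self, key, ok):
        """Record one validation result; return True if the entry stays published."""
        now = self.clock()
        entry = self.entries.setdefault(key, {"first_seen": now, "failures": 0})
        if ok:
            entry["failures"] = 0  # any success clears the failure streak
            return True
        if now - entry["first_seen"] < self.grace_seconds:
            return True  # new or previously unknown entries are valid for a while
        entry["failures"] += 1
        # Only drop the entry once it has failed several times in a row.
        return entry["failures"] < self.max_failures

    def reset(self, key):
        """Forget all state for an entry, e.g. when its services are updated."""
        self.entries.pop(key, None)
```

A caller would invoke `record()` once per validation pass for each host or service, and `reset()` whenever the service definition changes.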
On small networks there are not a lot of OLSR name changes. While
dnsmasq watches for changes and updates itself, it will sometimes miss
them. On busy networks this doesn't matter, as the next change will catch
it up. But on smaller networks (esp. test networks) a missed change can
stop name resolution working for some time. So now, if no changes are
detected for > 60 seconds, we force dnsmasq to reload its tables.
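As a rough sketch of the idle-reload logic described above (in Python, and not the real code): `ReloadWatchdog`, `note_change` and `poll` are hypothetical names, and `reload_fn` stands in for whatever actually makes dnsmasq reload its tables (one possibility would be sending it SIGHUP, but that is an assumption here).

```python
import time


class ReloadWatchdog:
    """Hypothetical helper: force a reload when nothing has changed for a while."""

    def __init__(self, reload_fn, idle_seconds=60, clock=time.monotonic):
        self.reload_fn = reload_fn        # placeholder for the real reload action
        self.idle_seconds = idle_seconds  # 60 seconds, as described in the text
        self.clock = clock
        self.last_event = clock()

    def note_change(self):
        """Call whenever a name change is observed."""
        self.last_event = self.clock()

    def poll(self):
        """Call periodically; returns True if a reload was forced."""
        if self.clock() - self.last_event > self.idle_seconds:
            self.reload_fn()
            self.last_event = self.clock()  # restart the idle timer after reloading
            return True
        return False
```

Restarting the timer after each forced reload means an idle network gets at most one reload per idle interval.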
For some reason, there was code in the driver to block the setting of
the coverage when a previous setting wasn't a particular value.
It's unclear what this was trying to achieve or prevent, but it stopped AC
devices operating efficiently (by a factor of 10x or more).
* Exclude neighbor's neighbors which are non-routable.
If a neighbor node's neighbor is non-routable, then no traffic will
flow from it, so it's not hidden.
* Use routable flag for exposed node detection
* Enable RTS/CTS when we detect hidden nodes
* Only change rts setting when we need to
* RTS advanced config option
* Include neighbors' blocked neighbors (they still transmit)
* Bump default RTS threshold
* Report list of hidden nodes rather than yes/no
* Canonical hostnames
* When we enable RTS, enable it for all traffic by default
* Show hidden neighbors in display
* Default RTS threshold to -1 (always off)
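The hidden node handling in the list above might look something like the following Python sketch. The data layout, the function names and the "RTS on" threshold of 1 are assumptions for illustration; only the -1 "always off" value comes from the notes above.

```python
def canonical(name):
    """Assumed canonical form: trimmed and lower-cased hostname."""
    return name.strip().lower()


def find_hidden_nodes(me, my_neighbors, neighbor_tables):
    """Return a sorted list of nodes our neighbors hear but we do not.

    neighbor_tables maps each neighbor to {their_neighbor: {"routable": bool}}.
    This layout is hypothetical, chosen only for the example.
    """
    known = {canonical(n) for n in my_neighbors}
    known.add(canonical(me))
    hidden = set()
    for table in neighbor_tables.values():
        for name, info in table.items():
            if not info.get("routable", False):
                continue  # non-routable: no traffic flows from it, so not hidden
            if canonical(name) not in known:
                hidden.add(canonical(name))
    return sorted(hidden)


def update_rts(current, hidden, apply_fn, on_threshold=1, off_threshold=-1):
    """Enable RTS when hidden nodes exist; only call apply_fn on a change."""
    wanted = on_threshold if hidden else off_threshold
    if wanted != current:
        apply_fn(wanted)
    return wanted
```

Returning the list rather than a boolean lets the caller both report the hidden nodes and decide on the RTS setting from the same result.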
The connect timeout did not include DNS lookups, and if DNS is broken this can hang forever. Add
a maximum timeout so this call will eventually terminate regardless.
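The general idea of an overall deadline, as opposed to a timeout on just one phase, can be sketched in Python as below. This is not the code from this change; `call_with_deadline` is a hypothetical helper that bounds the total time spent waiting on any blocking call, including a name lookup.

```python
import threading


def call_with_deadline(fn, timeout, *args):
    """Run fn(*args) but give up waiting after `timeout` seconds overall."""
    result = {}

    def worker():
        try:
            result["value"] = fn(*args)
        except Exception as exc:  # hand the failure back to the caller
            result["error"] = exc

    # Daemon thread: a stuck call cannot keep the process alive at exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError("call exceeded %.1f seconds" % timeout)
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Note that the abandoned call keeps running in the background; the sketch only guarantees that the caller stops waiting.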
When a tunnel is idle, binding to the tun* device fails, so remove the binding.
As we have a direct tunnel route in the routing table (not OLSR table 30)
created by vtun, we will still correctly route the quality testing traffic.