The arp cache keeps wifi entries long past them being associated with
the node, so now use wifi assoc list to find nodes, and the arp cache
to get their IPs.
This is an attempt to unify all the station monitoring and make it work
better as one. We're trying to square a circle here somewhat, with taking
steps to kick nodes when problems are detected, but not kick them too quickly
or often in case we're mis-identifing issues.
We've seen these issue manifest themselves which nodes messing VoIP services
as well as resets causing nodes to get into unrecoverable states when there
was no real problems in the first place.
This will probably need to evolve before the next release, but would be good
to get some milage on the new code.
Coverage is handled by modifying firmware state, and the driver stores
the values the first time it is set. When we reset this state might be lost
so it will be reloaded from the firmware. We set the coverage back to 0
so the reloaded value will be the default again.
We also remove a check which can fail incorrectly.
* A scan, especially if we have to do both active and passive, essentially mutes
the radio to AREDN traffic for 10-20 seconds, which isn't good. If the radio is completely
deaf then it doesn't matter, but particularly on the 9K radios we do this when
things are looking a bit dodgy, though not deaf.
* Provide hook to reset ath9k from userspace. This hook is attributed to:
Linus Lüssing <ll@simonwunderlich.de>
* User /sys reset hooks rather than iw scan
* Use LQM information to filter out neighbors we dont care about.
These can cause false rejoin events and degrade the network.
* Only use active station monitor with LQM info.
* Resolve unresponsive node problems with Mikrotik AC devices.
Mikrotik AC devices get into a state where they wont communicate with
non-AC devices .. sometimes. Leaving and rejoinging the network resets
everything. We monitor for this situation and rejoin the network when detected
to resolve the issue.
* Make reporting less chatty
* General station monitor service.
It turns out this station bug is not limited to the ath10k driver, so
make this monitor service wifi generic.
(I've now seen this at both ends of the Mikrotik AC <-> Rocket pair)
* New logs
* Just monitor for now
There appears to be a bug in the ath10k firmware for Mikrotik devices (maybe others)
where a station will associate but only broadcast traffic will be passed - unicast traffic
will fail. This code detects this situation and forces the device to reassociate which
fixes the problem.
* Track validation state of hosts and services. Only remove a host/service if it fails multiple times in a row.
* Let new addresses/services be valid for a while regardless
* Initially unknown addresses will be valid for a while
* Reset validation state when services updated
On small networks there are not a lot of OLSR name changes. While
dnsmasq watches for changes and updates itself, it will sometimes miss
them. On busy networks this doesnt matter as the next change will catch
it up. But on smaller network (esp. test networks) a missed change can
stop name resolution working for some time. So now, if no changes are
detected for > 60 seconds, we force dnsmasq to reload its tables.
For some reason, there was code in the driver to block the setting of
the coverage when a previous setting wasn't a particular value.
It's unclear what this was trying to achieve or prevent, but it stopped AC
devices operating efficiently (by a factor of 10x or more).
* Exclude neighbor's neighbors which are non-routable.
If a neighbor node's neighbor is non-routable, then no traffic will
flow from it, so it's not hidden
* Use routable flag for exposed node detection