This is an attempt to unify all the station monitoring and make it work
better as one. We're trying to square a circle here somewhat, with taking
steps to kick nodes when problems are detected, but not kick them too quickly
or often in case we're mis-identifing issues.
We've seen these issue manifest themselves which nodes messing VoIP services
as well as resets causing nodes to get into unrecoverable states when there
was no real problems in the first place.
This will probably need to evolve before the next release, but would be good
to get some milage on the new code.
Coverage is handled by modifying firmware state, and the driver stores
the values the first time it is set. When we reset this state might be lost
so it will be reloaded from the firmware. We set the coverage back to 0
so the reloaded value will be the default again.
We also remove a check which can fail incorrectly.
* A scan, especially if we have to do both active and passive, essentially mutes
the radio to AREDN traffic for 10-20 seconds, which isn't good. If the radio is completely
deaf then it doesn't matter, but particularly on the 9K radios we do this when
things are looking a bit dodgy, though not deaf.
* Provide hook to reset ath9k from userspace. This hook is attributed to:
Linus Lüssing <ll@simonwunderlich.de>
* User /sys reset hooks rather than iw scan
* Make admin and user bar menus pluggable
* Realign header block to stop is moving around
* Remove ref
* Use modular nav to disable ineligable options during initial install
* Dont offer tunnel menus options when no tunnel daemon installed.
This is for low-memory devices
* Simplify
* Improve messaging when running ram image
* Disable rather than hide vpn menu items on tiny memory devices
* Move menu navs
* Use LQM information to filter out neighbors we dont care about.
These can cause false rejoin events and degrade the network.
* Only use active station monitor with LQM info.
* Resolve unresponsive node problems with Mikrotik AC devices.
Mikrotik AC devices get into a state where they wont communicate with
non-AC devices .. sometimes. Leaving and rejoinging the network resets
everything. We monitor for this situation and rejoin the network when detected
to resolve the issue.
* Make reporting less chatty
* General station monitor service.
It turns out this station bug is not limited to the ath10k driver, so
make this monitor service wifi generic.
(I've now seen this at both ends of the Mikrotik AC <-> Rocket pair)
* New logs
* Just monitor for now
There appears to be a bug in the ath10k firmware for Mikrotik devices (maybe others)
where a station will associate but only broadcast traffic will be passed - unicast traffic
will fail. This code detects this situation and forces the device to reassociate which
fixes the problem.
When people are using the filters on the mesh page, they sometimes
hit RETURN. This submit the page (which is a form) causing the page
to reload, lossing the filter, and as a side effect, setting the page
to auto-refresh.
* Track validation state of hosts and services. Only remove a host/service if it fails multiple times in a row.
* Let new addresses/services be valid for a while regardless
* Initially unknown addresses will be valid for a while
* Reset validation state when services updated
* Fix the bandwidth reporting for ath10k devices
* Use 'iw' for all TxMbps reporting.
As we cannot account for error rates in the ath10k driver, to be consistent
we now use the same system to retrieve tx rates for both ath9k and ath10k.
* Remove unused rate tables
On small networks there are not a lot of OLSR name changes. While
dnsmasq watches for changes and updates itself, it will sometimes miss
them. On busy networks this doesnt matter as the next change will catch
it up. But on smaller network (esp. test networks) a missed change can
stop name resolution working for some time. So now, if no changes are
detected for > 60 seconds, we force dnsmasq to reload its tables.
For some reason, there was code in the driver to block the setting of
the coverage when a previous setting wasn't a particular value.
It's unclear what this was trying to achieve or prevent, but it stopped AC
devices operating efficiently (by a factor of 10x or more).
This cryptic bit of shell script adds a maximum timeout for the iperf client
to run as it appears it can get stuck occasionally. The server has a built-in
timeout (not available in client mode)
* Exclude neighbor's neighbors which are non-routable.
If a neighbor node's neighbor is non-routable, then no traffic will
flow from it, so it's not hidden
* Use routable flag for exposed node detection