Commit Graph

56 Commits

Author SHA1 Message Date
Patrick Cloke 77dfc1f939
Fix a bug where servers could be marked as up when they were failing (#16506)
After this change a server will only be reported as back online
if they were previously having requests fail.
2023-10-17 07:32:40 -04:00
Erik Johnston d35bed8369
Don't wake up destination transaction queue if they're not due for retry. (#16223) 2023-09-04 17:14:09 +01:00
Erik Johnston f84baecb6f
Don't reset retry timers on "valid" error codes (#16221) 2023-09-04 14:04:43 +01:00
Mathieu Velten f0a860908b
Allow config of the backoff algorithm for the federation client. (#15754)
Adds three new configuration variables:

* destination_min_retry_interval is identical to before (10mn).
* destination_retry_multiplier is now 2 instead of 5, the maximum value will
  be reached slower.
* destination_max_retry_interval is one day instead of (essentially) infinity.

Capping this will cause destinations to continue to be retried sometimes instead
of being lost forever. The previous value was 2 ^ 62 milliseconds.
2023-08-03 14:36:55 -04:00
Eric Eastwood 40fa8294e3
Refactor MSC3030 `/timestamp_to_event` to move away from our snowflake pull from `destination` pattern (#14096)
1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper.
 2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu` , backfill, and the generic `_try_destination_list` which is used for many places we use this pattern.
 3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
2022-10-26 16:10:55 -05:00
Sean Quah 2be5a2b07b
Fix `RetryDestinationLimiter` re-starting finished log contexts (#12803)
Signed-off-by: Sean Quah <seanq@matrix.org>
2022-05-19 20:17:10 +01:00
Erik Johnston 8dd3e0e084
Immediately retry any requests that have backed off when a server comes back online. (#12500)
Otherwise it can take up to a minute for any in-flight `/send` requests to be retried.
2022-05-10 10:39:54 +01:00
David Robertson a2b00a4486
Bump `black` and `click` versions (#12320) 2022-03-29 10:41:19 +00:00
reivilibre 524b8ead77
Add types to synapse.util. (#10601) 2021-09-10 17:03:18 +01:00
Erik Johnston 3e831f24ff
Don't hammer the database for destination retry timings every ~5mins (#10036) 2021-05-21 17:57:08 +01:00
Jonathan de Jong 4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Dan Callahan aff1eb7c67
Tell Black to format code for Python 3.5 (#8664)
This allows trailing commas in multi-line arg lists.

Minor, but we might as well keep our formatting current with regard to
our minimum supported Python version.

Signed-off-by: Dan Callahan <danc@element.io>
2020-10-27 23:26:36 +00:00
Patrick Cloke 8a4a4186de
Simplify super() calls to Python 3 syntax. (#8344)
This converts calls like super(Foo, self) -> super().

Generated with:

    sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
2020-09-18 09:56:44 -04:00
Patrick Cloke c619253db8
Stop sub-classing object (#8249) 2020-09-04 06:54:56 -04:00
Patrick Cloke fe6cfc80ec
Convert some util functions to async (#8035) 2020-08-06 08:39:35 -04:00
Patrick Cloke 38e1fac886
Fix some spelling mistakes / typos. (#7811) 2020-07-09 09:52:58 -04:00
Erik Johnston f44f1d2e83 Fix errors storing large retry intervals.
We have set the max retry interval to a value larger than a postgres or
sqlite int can hold, which caused exceptions when updating the
destinations table.

To fix postgres we need to change the column to a bigint, and for sqlite
we lower the max interval to 2**62 (which is still incredibly long).
2019-10-02 10:36:27 +01:00
Richard van der Hoff 1e19ce00bf
Add 'failure_ts' column to 'destinations' table (#6016)
Track the time that a server started failing at, for general analysis purposes.
2019-09-17 11:41:54 +01:00
Richard van der Hoff 3d882a7ba5
Remove the cap on federation retry interval. (#6026)
Essentially the intention here is to end up blacklisting servers which never
respond to federation requests.

Fixes https://github.com/matrix-org/synapse/issues/5113.
2019-09-12 13:00:13 +01:00
Richard van der Hoff 0388beafe4
Fix bug in calculating the federation retry backoff period (#6025)
This was intended to introduce an element of jitter; instead it gave you a
30/60 chance of resetting to zero.
2019-09-12 12:59:43 +01:00
Richard van der Hoff 7902bf1e1d
Clean up some code in the retry logic (#6017)
* remove some unused code
* make things which were constants into constants for efficiency and clarity
2019-09-11 15:14:56 +01:00
Amber Brown 4806651744
Replace returnValue with return (#5736) 2019-07-23 23:00:55 +10:00
Amber Brown 463b072b12
Move logging utilities out of the side drawer of util/ and into logging/ (#5606) 2019-07-04 00:07:04 +10:00
Richard van der Hoff aa530e6800
Call RetryLimiter correctly (#5340)
Fixes a regression introduced in #5335.
2019-06-04 22:02:53 +01:00
Richard van der Hoff dce6e9e0c1 Avoid rapidly backing-off a server if we ignore the retry interval 2019-06-03 23:58:42 +01:00
Richard van der Hoff 642199570c
Improve the logging when handling a federation transaction (#3904)
Let's try to rationalise the logging that happens when we are processing an
incoming transaction, to make it easier to figure out what is going wrong when
they take ages. In particular:

- make everything start with a [room_id event_id] prefix
- make sure we log a warning when catching exceptions rather than just turning
  them into other, more cryptic, exceptions.
2018-09-19 17:28:18 +01:00
Amber Brown 49af402019 run isort 2018-07-09 16:09:20 +10:00
Richard van der Hoff 2a13af23bc Use run_in_background in preference to preserve_fn
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
2018-04-27 12:55:51 +01:00
Matthew Hodgson ab9f844aaf
Add federation_domain_whitelist option (#2820)
Add federation_domain_whitelist

gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
2018-01-22 19:11:18 +01:00
Richard van der Hoff eaaabc6c4f replace 'except:' with 'except Exception:'
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Richard van der Hoff 9397edb28b Merge pull request #2050 from matrix-org/rav/federation_backoff
push federation retry limiter down to matrixfederationclient
2017-03-23 22:27:01 +00:00
Richard van der Hoff 5a16cb4bf0 Ignore backoff history for invites, aliases, and roomdirs
Add a param to the federation client which lets us ignore historical backoff
data for federation queries, and set it for a handful of operations.
2017-03-23 12:23:22 +00:00
Richard van der Hoff 4bd597d9fc push federation retry limiter down to matrixfederationclient
rather than having to instrument everywhere we make a federation call,
make the MatrixFederationHttpClient manage the retry limiter.
2017-03-23 09:28:46 +00:00
Richard van der Hoff 19b9366d73 Fix a couple of logcontext leaks
Use preserve_fn to correctly manage the logcontexts around things we don't want
to yield on.
2017-03-23 00:17:46 +00:00
Erik Johnston df4ecff5a9 Correctly raise exceptions for ratelimitng. Ratelimit on 401 2017-02-01 15:42:19 +00:00
Erik Johnston fe08db2713 Remove explicit < 400 check as apparently this is confusing 2017-01-31 15:21:32 +00:00
Erik Johnston 4c0ec15bdc Comment 2017-01-31 13:53:46 +00:00
Erik Johnston 85c590105f Comment 2017-01-31 13:46:38 +00:00
Erik Johnston ae7a132f38 Better handle 404 response for federation /send/ 2017-01-31 13:40:09 +00:00
Erik Johnston 11bfe438a2 Use correct var 2016-11-24 15:26:53 +00:00
Erik Johnston aaecffba3a Correctly handle 500's and 429 on federation 2016-11-24 15:04:49 +00:00
Erik Johnston 90565d015e Invalidate retry cache in both directions 2016-11-22 17:45:44 +00:00
Mark Haines bf81e38d36 Fix retry utils to check if the exception is a subclass of CME 2016-07-28 10:47:02 +01:00
Matthew Hodgson 6c28ac260c copyrights 2016-01-07 04:26:29 +00:00
Erik Johnston eacb068ac2 Retry dead servers a lot less often 2015-11-02 16:56:30 +00:00
Erik Johnston 9236136f3a Make work in both Maria and SQLite. Fix tests 2015-04-01 14:12:33 +01:00
Erik Johnston cc3d3babb0 Remove unused import 2015-02-18 12:01:41 +00:00
Erik Johnston 36e144091b Remove spurious comma. Remove temp run_on_reactor 2015-02-18 11:25:20 +00:00
Erik Johnston b17bd31da0 Temporarily add a run_on_reactor() call 2015-02-18 11:17:26 +00:00
Erik Johnston 859fbd4423 s/self._clock/self.clock/ 2015-02-18 10:39:14 +00:00