Commit Graph

40 Commits

Author SHA1 Message Date
Erik Johnston f44f1d2e83 Fix errors storing large retry intervals.
We have set the max retry interval to a value larger than a postgres or
sqlite int can hold, which caused exceptions when updating the
destinations table.

To fix postgres we need to change the column to a bigint, and for sqlite
we lower the max interval to 2**62 (which is still incredibly long).
2019-10-02 10:36:27 +01:00
Richard van der Hoff 1e19ce00bf
Add 'failure_ts' column to 'destinations' table (#6016)
Track the time that a server started failing at, for general analysis purposes.
2019-09-17 11:41:54 +01:00
Richard van der Hoff 3d882a7ba5
Remove the cap on federation retry interval. (#6026)
Essentially the intention here is to end up blacklisting servers which never
respond to federation requests.

Fixes https://github.com/matrix-org/synapse/issues/5113.
2019-09-12 13:00:13 +01:00
Richard van der Hoff 0388beafe4
Fix bug in calculating the federation retry backoff period (#6025)
This was intended to introduce an element of jitter; instead it gave you a
30/60 chance of resetting to zero.
2019-09-12 12:59:43 +01:00
Richard van der Hoff 7902bf1e1d
Clean up some code in the retry logic (#6017)
* remove some unused code
* make things which were constants into constants for efficiency and clarity
2019-09-11 15:14:56 +01:00
Amber Brown 4806651744
Replace returnValue with return (#5736) 2019-07-23 23:00:55 +10:00
Amber Brown 463b072b12
Move logging utilities out of the side drawer of util/ and into logging/ (#5606) 2019-07-04 00:07:04 +10:00
Richard van der Hoff aa530e6800
Call RetryLimiter correctly (#5340)
Fixes a regression introduced in #5335.
2019-06-04 22:02:53 +01:00
Richard van der Hoff dce6e9e0c1 Avoid rapidly backing-off a server if we ignore the retry interval 2019-06-03 23:58:42 +01:00
Richard van der Hoff 642199570c
Improve the logging when handling a federation transaction (#3904)
Let's try to rationalise the logging that happens when we are processing an
incoming transaction, to make it easier to figure out what is going wrong when
they take ages. In particular:

- make everything start with a [room_id event_id] prefix
- make sure we log a warning when catching exceptions rather than just turning
  them into other, more cryptic, exceptions.
2018-09-19 17:28:18 +01:00
Amber Brown 49af402019 run isort 2018-07-09 16:09:20 +10:00
Richard van der Hoff 2a13af23bc Use run_in_background in preference to preserve_fn
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
2018-04-27 12:55:51 +01:00
Matthew Hodgson ab9f844aaf
Add federation_domain_whitelist option (#2820)
Add federation_domain_whitelist

gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
2018-01-22 19:11:18 +01:00
Richard van der Hoff eaaabc6c4f replace 'except:' with 'except Exception:'
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Richard van der Hoff 9397edb28b Merge pull request #2050 from matrix-org/rav/federation_backoff
push federation retry limiter down to matrixfederationclient
2017-03-23 22:27:01 +00:00
Richard van der Hoff 5a16cb4bf0 Ignore backoff history for invites, aliases, and roomdirs
Add a param to the federation client which lets us ignore historical backoff
data for federation queries, and set it for a handful of operations.
2017-03-23 12:23:22 +00:00
Richard van der Hoff 4bd597d9fc push federation retry limiter down to matrixfederationclient
rather than having to instrument everywhere we make a federation call,
make the MatrixFederationHttpClient manage the retry limiter.
2017-03-23 09:28:46 +00:00
Richard van der Hoff 19b9366d73 Fix a couple of logcontext leaks
Use preserve_fn to correctly manage the logcontexts around things we don't want
to yield on.
2017-03-23 00:17:46 +00:00
Erik Johnston df4ecff5a9 Correctly raise exceptions for ratelimitng. Ratelimit on 401 2017-02-01 15:42:19 +00:00
Erik Johnston fe08db2713 Remove explicit < 400 check as apparently this is confusing 2017-01-31 15:21:32 +00:00
Erik Johnston 4c0ec15bdc Comment 2017-01-31 13:53:46 +00:00
Erik Johnston 85c590105f Comment 2017-01-31 13:46:38 +00:00
Erik Johnston ae7a132f38 Better handle 404 response for federation /send/ 2017-01-31 13:40:09 +00:00
Erik Johnston 11bfe438a2 Use correct var 2016-11-24 15:26:53 +00:00
Erik Johnston aaecffba3a Correctly handle 500's and 429 on federation 2016-11-24 15:04:49 +00:00
Erik Johnston 90565d015e Invalidate retry cache in both directions 2016-11-22 17:45:44 +00:00
Mark Haines bf81e38d36 Fix retry utils to check if the exception is a subclass of CME 2016-07-28 10:47:02 +01:00
Matthew Hodgson 6c28ac260c copyrights 2016-01-07 04:26:29 +00:00
Erik Johnston eacb068ac2 Retry dead servers a lot less often 2015-11-02 16:56:30 +00:00
Erik Johnston 9236136f3a Make work in both Maria and SQLite. Fix tests 2015-04-01 14:12:33 +01:00
Erik Johnston cc3d3babb0 Remove unused import 2015-02-18 12:01:41 +00:00
Erik Johnston 36e144091b Remove spurious comma. Remove temp run_on_reactor 2015-02-18 11:25:20 +00:00
Erik Johnston b17bd31da0 Temporarily add a run_on_reactor() call 2015-02-18 11:17:26 +00:00
Erik Johnston 859fbd4423 s/self._clock/self.clock/ 2015-02-18 10:39:14 +00:00
Erik Johnston 4fd176a41d More docs 2015-02-18 10:11:24 +00:00
Erik Johnston d77912ff44 Docs. 2015-02-18 10:09:54 +00:00
Erik Johnston 9371019133 Try to only back off if we think we failed to connect to the remote 2015-02-17 18:13:34 +00:00
Erik Johnston c8436b38a0 Only update destination_retry_timings if we have succeeded when retrying 2015-02-17 17:38:38 +00:00
Erik Johnston f91263b1e0 Remove spurious self 2015-02-17 17:37:51 +00:00
Erik Johnston 2b8f1a956c Add per server retry limiting.
Factor out the pre destination retry logic from TransactionQueue so it
can be reused in both get_pdu and crypto.keyring
2015-02-17 17:20:56 +00:00