Commit Graph

315 Commits

Author SHA1 Message Date
Richard van der Hoff 297bf2547e
Fix sync bug when accepting invites (#4956)
Hopefully this time we really will fix #4422.

We need to make sure that the cache on
`get_rooms_for_user_with_stream_ordering` is invalidated *before* the
SyncHandler is notified for the new events, and we can now do so reliably via
the `events` stream.
2019-04-02 12:42:39 +01:00
Richard van der Hoff 4b91c313a9 Combine the CurrentStateDeltaStream into the EventStream 2019-03-27 22:07:05 +00:00
Richard van der Hoff 1f6d6f918a Make EventStream rows have a type
... as a precursor to combining it with the CurrentStateDelta stream.
2019-03-27 22:07:05 +00:00
Richard van der Hoff 015b3622eb Skip building a ROW_TYPE when building updates
We're about to turn it straight into a JSON object anyway so building a
ROW_TYPE is a bit pointless, and reduces flexibility in the update_function.
2019-03-27 21:58:03 +00:00
Richard van der Hoff f570916a3e Add parse_row method to replication stream class
This will allow individual stream classes to override how a row is parsed.
2019-03-27 21:32:33 +00:00
Richard van der Hoff 71dcb275f1 move FederationStream out to its own file 2019-03-27 21:13:14 +00:00
Richard van der Hoff aa1e017864 move EventsStream out to its own file 2019-03-27 21:13:14 +00:00
Richard van der Hoff a5798de067 Move replication.tcp.streams into a package 2019-03-27 21:13:14 +00:00
Richard van der Hoff acaa18f7dd
Fix/improve some docstrings in the replication code. (#4949) 2019-03-27 21:12:36 +00:00
Richard van der Hoff 8cbbedaa2b
Fix ClientReplicationStreamProtocol.__str__ (#4929)
`__str__` depended on `self.addr`, which was absent from
ClientReplicationStreamProtocol, so attempting to call str on such an object
would raise an exception.

We can calculate the peer addr from the transport, so there is no need for addr
anyway.
2019-03-25 16:41:51 +00:00
Richard van der Hoff 9bde730ef8
Fix bug where read-receipts lost their timestamps (#4927)
Make sure that they are sent correctly over the replication stream.

Fixes: #4898
2019-03-25 16:38:05 +00:00
Richard van der Hoff cdb8036161
Add a config option for torture-testing worker replication. (#4902)
Setting this to 50 or so makes a bunch of sytests fail in worker mode.
2019-03-20 16:04:35 +00:00
Erik Johnston face0c5b3c Prefill client IPs cache on workers 2019-03-06 17:39:32 +00:00
Andrew Morgan 7b8a157b79
Merge pull request #4792 from matrix-org/anoa/replication_tokens
Support batch updates in the worker sender
2019-03-06 15:48:29 +00:00
Brendan Abolivier a4c3a361b7
Add rate-limiting on registration (#4735)
* Rate-limiting for registration

* Add unit test for registration rate limiting

* Add config parameters for rate limiting on auth endpoints

* Doc

* Fix doc of rate limiting function

Co-Authored-By: babolivier <contact@brendanabolivier.com>

* Incorporate review

* Fix config parsing

* Fix linting errors

* Set default config for auth rate limiting

* Fix tests

* Add changelog

* Advance reactor instead of mocked clock

* Move parameters to registration specific config and give them more sensible default values

* Remove unused config options

* Don't mock the rate limiter un MAU tests

* Rename _register_with_store into register_with_store

* Make CI happy

* Remove unused import

* Update sample config

* Fix ratelimiting test for py2

* Add non-guest test
2019-03-05 14:25:33 +00:00
Andrew Morgan b9f6163092 Simplify token replication logic 2019-03-05 13:58:30 +00:00
Erik Johnston a84b8d56c2 Fixup slave stores 2019-03-04 18:04:57 +00:00
Andrew Morgan fe7bd23a85 Clean up logic and add comments 2019-03-04 15:08:15 +00:00
Andrew Morgan 9f7cdf3da1 Clearer branching, fix missing list clear 2019-03-04 14:36:52 +00:00
Andrew Morgan 5f0c449dd5 Prevent replication wedging 2019-03-04 14:03:18 +00:00
Erik Johnston 1e315017d3 When presence is enabled don't send over replication 2019-02-27 13:53:46 +00:00
Erik Johnston 7590e9fa28
Merge pull request #4749 from matrix-org/erikj/replication_connection_backoff
Fix tightloop over connecting to replication server
2019-02-27 11:00:59 +00:00
Erik Johnston 6bb1c028f1 Limit cache invalidation replication line length (#4748) 2019-02-27 10:28:37 +00:00
Erik Johnston 6870fc496f Move connecting logic into ClientReplicationStreamProtocol 2019-02-27 10:23:51 +00:00
Erik Johnston 25814921f1 Increase the max delay between retry attempts
Otherwise if you have many workers they can easily take out master with
their connection attempts
2019-02-26 15:12:33 +00:00
Erik Johnston 313987187e Fix tightloop over connecting to replication server
If the client failed to process incoming commands during the initial set
up of the replication connection it would immediately disconnect and
reconnect, resulting in a tightloop.

This can happen, for example, when subscribing to a stream that has a
row that is too long in the backlog.

The fix here is to not consider the connection successfully set up until
the client has succesfully subscribed and caught up with the streams.
This ensures that the retry logic timers aren't reset until then,
meaning that if an error does happen during start up the client will
continue backing off before retrying again.
2019-02-26 15:05:41 +00:00
Erik Johnston 80467bbac3 Fix state cache invalidation on workers 2019-02-22 14:38:14 +00:00
Erik Johnston dbdc565dfd Fix registration on workers (#4682)
* Move RegistrationHandler init to HomeServer

* Move post registration actions to RegistrationHandler

* Add post regisration replication endpoint

* Newsfile
2019-02-20 18:47:31 +11:00
Erik Johnston a9b5ea6fc1 Batch cache invalidation over replication
Currently whenever the current state changes in a room invalidate a lot
of caches, which cause *a lot* of traffic over replication. Instead,
lets batch up all those invalidations and send a single poke down
the replication streams.

Hopefully this will reduce load on the master process by substantially
reducing traffic.
2019-02-18 17:53:31 +00:00
Erik Johnston af691e415c Move register_device into handler 2019-02-18 16:49:38 +00:00
Erik Johnston eb2b8523ae Split out registration to worker
This allows registration to be handled by a worker, though the actual
write to the database still happens on master.

Note: due to the in-memory session map all registration requests must be
handled by the same worker.
2019-02-18 12:12:57 +00:00
Erik Johnston a4f52a33fe Fix replication for room v3 (#4523)
* Fix replication for room v3

We were not correctly quoting the path fragments over http replication,
which meant that it exploded when the event IDs had a slash in them

* Newsfile
2019-01-30 14:19:52 +00:00
Erik Johnston b6b73a0bcf Fix receiving events from federation via a worker
This bug was introduced in PR #4470, commit 678a92cb56
2019-01-29 10:30:26 +00:00
Erik Johnston 678a92cb56 Replace missed usages of FrozenEvent 2019-01-25 10:32:30 +00:00
Erik Johnston be6a7e47fa
Revert "Require event format version to parse or create events" 2019-01-25 10:23:51 +00:00
Erik Johnston e8c9f15397 Replace missed usages of FrozenEvent 2019-01-24 11:14:07 +00:00
Erik Johnston a163b748a5 Don't truncate command name in metrics 2018-10-29 17:34:21 +00:00
Amber Brown c4b3698a80
Make the replication logger quieter (#4108) 2018-10-29 22:59:44 +11:00
Amber Brown 381d2cfdf0
Make workers work on Py3 (#4027) 2018-10-13 00:14:08 +11:00
Travis Ralston f1a7264663
Fix minor typo in exception 2018-09-13 11:51:12 -06:00
Amber Brown 7c27c4d51c
merge (#3576) 2018-09-14 03:11:11 +10:00
Erik Johnston 3e242dc149 Remove conn_id 2018-09-04 11:45:52 +01:00
Erik Johnston b13836da7f Remove conn_id from repl prometheus metrics
`conn_id` gets set to a random string, and so we end up filling up
prometheus with tonnes of data series, which is bad.
2018-09-03 17:22:49 +01:00
Erik Johnston 2aa7cc6a46
Merge pull request #3713 from matrix-org/erikj/fixup_fed_logging
Fix logging bug in EDU handling over replication
2018-08-20 10:51:45 +01:00
Erik Johnston 3b2dcfff78 Fix logging bug in EDU handling over replication 2018-08-17 11:11:06 +01:00
Richard van der Hoff 0e8d78f6aa Logcontexts for replication command handlers
Run the handlers for replication commands as background processes. This should
improve the visibility in our metrics, and reduce the number of "running db
transaction from sentinel context" warnings.

Ideally it means converting the things that fire off deferreds into the night
into things that actually return a Deferred when they are done. I've made a bit
of a stab at this, but it will probably be leaky.
2018-08-17 00:43:43 +01:00
Erik Johnston 488ffe6fdb Use federation handler function rather than duplicate
This involves renaming _persist_events to be a public function.
2018-08-15 14:17:18 +01:00
Erik Johnston 773db62a22 Rename slave TransactionStore to SlaveTransactionStore 2018-08-15 14:17:06 +01:00
Erik Johnston b179537f2a Move clean_room_for_join to master 2018-08-09 10:37:38 +01:00
Erik Johnston 72d1902bbe Fixup doc comments 2018-08-09 10:23:49 +01:00