synapse-old

Commit Graph

Author	SHA1	Message	Date
Erik Johnston	ba47fea528	Allow multiple workers to write to receipts stream. (#16432 ) Fixes #16417	2023-10-25 16:16:19 +01:00
Jason Little	ffbe9b7666	Remove duplicate call to wake a remote destination when using federation sending worker (#16515 )	2023-10-24 08:09:59 -04:00
Erik Johnston	8f35f8148e	Fix bug where a new writer advances their token too quickly (#16473 ) * Fix bug where a new writer advances their token too quickly When starting a new writer (e.g. for persisting events), the `MultiWriterIdGenerator` doesn't have a minimum token for it as there are no rows matching that new writer in the DB. This results in the first stream ID it acquired being announced as persisted before it actually finishes persisting, if another writer gets and persists a subsequent stream ID. This is due to the logic of setting the minimum persisted position to the minimum known position across all writers, and the new writer starts off not being considered. * Fix sending out POSITIONs when our token advances without update Broke in #14820 * For replication HTTP requests, only wait for minimal position	2023-10-23 16:57:30 +01:00
Patrick Cloke	49c9745b45	Avoid sending massive replication updates when purging a room. (#16510 )	2023-10-18 12:26:01 -04:00
Richard van der Hoff	109882230c	Clean up logging on event persister endpoints (#16488 )	2023-10-14 17:57:27 +01:00
Patrick Cloke	ae5b997cfa	Fix comments related to replication. (#16428 )	2023-10-06 07:25:44 -04:00
Patrick Cloke	4e302b30b6	Add __slots__ to replication commands. (#16429 ) To slightly reduce the amount of memory each command takes.	2023-10-05 07:38:55 -04:00
Erik Johnston	80ec81dcc5	Some refactors around receipts stream (#16426 )	2023-10-04 16:28:40 +01:00
Erik Johnston	20fb08ec80	Downgrade repl stream time out error to warning (#16401 ) This is because if a worker reaches ~100% CPU then everything starts lagging and we hit the log line a lot. When at error we invoke sentry and that has a lot of overhead, which then puts even more pressure on the worker.	2023-09-29 11:52:48 +00:00
Patrick Cloke	f84da3c32e	Add a cache around server ACL checking (#16360 ) * Pre-compiles the server ACLs onto an object per room and invalidates them when new events come in. * Converts the server ACL checking into Rust.	2023-09-26 11:57:50 -04:00
Erik Johnston	329597022e	Some minor performance fixes for task scheduler (#16313 )	2023-09-14 16:20:47 +01:00
Erik Johnston	ab13fb08bf	Improve logging of replication (#16309 )	2023-09-13 09:51:50 +00:00
Erik Johnston	1cd410a783	Recheck if remote device is cached before requesting it (#16252 ) This fixes a bug where we could get stuck re-requesting the device over replication again and again.	2023-09-07 12:45:43 +00:00
Patrick Cloke	55c20da4a3	Merge remote-tracking branch 'origin/release-v1.91' into release-v1.92	2023-09-06 11:25:28 -04:00
Quentin Gliech	1940d990a3	Revert MSC3861 introspection cache, admin impersonation and account lock (#16258 )	2023-09-06 15:19:51 +01:00
Erik Johnston	d35bed8369	Don't wake up destination transaction queue if they're not due for retry. (#16223 )	2023-09-04 17:14:09 +01:00
David Robertson	e9eb26e3af	Cache device resync requests over replication (#16241 )	2023-09-04 11:57:59 +01:00
Patrick Cloke	e9235d92f2	Track currently syncing users by device for presence (#16172 ) Refactoring to use both the user ID & the device ID when tracking the currently syncing users in the presence handler. This is done both locally and over replication. Note that the device ID is discarded but will be used in a future change.	2023-08-29 11:44:07 -04:00
Patrick Cloke	40901af5e0	Pass the device ID around in the presence handler (#16171 ) Refactoring to pass the device ID (in addition to the user ID) through the presence handler (specifically the `user_syncing`, `set_state`, and `bump_presence_active_time` methods and their replication versions).	2023-08-28 13:08:49 -04:00
Patrick Cloke	1bf143699c	Combine logic about not overriding BUSY presence. (#16170 ) Simplify some of the presence code by reducing duplicated code between worker & non-worker modes. The main change is to push some of the logic from `user_syncing` into `set_state`. This is done by passing whether the user is setting the presence via a `/sync` with a new `is_sync` flag to `set_state`. If this is `true` some additional logic is performed: * Don't override `busy` presence. * Update the `last_user_sync_ts`. * Never update the status message.	2023-08-28 11:03:23 -04:00
Mathieu Velten	501da8ecd8	Task scheduler: add replication notify for new task to launch ASAP (#16184 )	2023-08-28 14:03:51 +00:00
Erik Johnston	803f63df1c	Fix perf of `wait_for_stream_positions` (#16148 )	2023-08-22 15:11:22 +00:00
Shay	69048f7b48	Add an admin endpoint to allow authorizing server to signal token revocations (#16125 )	2023-08-22 14:15:34 +00:00
Patrick Cloke	ad3f43be9a	Run pyupgrade for python 3.7 & 3.8. (#16110 )	2023-08-15 08:11:20 -04:00
Erik Johnston	ae55cc1e6b	Add ability to wait for locks and add locks to purge history / room deletion (#15791 ) c.f. #13476	2023-07-31 10:58:03 +01:00
Shay	68b2611783	Clarify comment on key uploads over replication (#16016 )	2023-07-27 15:08:46 -07:00
Jason Little	c835befd10	Add Unix socket support for Redis connections (#15644 ) Adds a new configuration setting to connect to Redis via a Unix socket instead of over TCP. Disabled by default.	2023-05-26 15:28:39 -04:00
Jason Little	1df0221bda	Use a custom scheme & the worker name for replication requests. (#15578 ) All the information needed is already in the `instance_map`, so use that instead of passing the hostname / IP & port manually for each replication request. This consolidates logic for future improvements of using e.g. UNIX sockets for workers.	2023-05-23 09:05:30 -04:00
Patrick Cloke	375b0a8a11	Update code to refer to "workers". (#15606 ) A bunch of comments and variables are out of date and use obsolete terms.	2023-05-16 15:56:38 -04:00
Roel ter Maat	2611433b70	Add redis SSL configuration options (#15312 ) * Add SSL options to redis config * fix lint issues * Add documentation and changelog file * add missing . at the end of the changelog * Move client context factory to new file * Rename ssl to tls and fix typo * fix lint issues * Added when redis attributes were added	2023-05-11 13:02:51 +01:00
Jason Little	e4f545c452	Remove `worker_replication_` settings (#15491 ) Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master. * Fix typo in drive by. * Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place) * Several updates: 1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future. 2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map. 3. Clean up old comments/commented out code. 4. Adjust unit tests to match with new code. 5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template. * Initial Docs upload * Changelog * Missed some commented out code that can go now * Remove TODO comment that no longer holds true. * Fix links in docs * More docs * Remove debug logging * Apply suggestions from code review Co-authored-by: reivilibre <olivier@librepush.net> * Apply suggestions from code review Co-authored-by: reivilibre <olivier@librepush.net> * Update version to latest, include completeish before/after examples in upgrade notes. * Fix up and docs too --------- Co-authored-by: reivilibre <olivier@librepush.net>	2023-05-11 11:30:56 +01:00
Jason Little	d3bd03559b	HTTP Replication Client (#15470 ) Separate out an HTTP client for replication in preparation for also supporting using UNIX sockets. The major difference from the base class is that this does not use treq to handle HTTP requests.	2023-05-09 14:25:20 -04:00
Alok Kumar Singh	197fbb123b	Remove legacy code of single user device resync api (#15418 ) * Removed single-user resync usage and updated it to use multi-user counterpart Signed-off-by: Alok Kumar Singh alokaks601@gmail.com	2023-04-21 12:06:39 +01:00
Mathieu Velten	9228ae633f	Add some clarification to the doc/comments regarding TCP replication (#15354 )	2023-03-30 12:51:35 +02:00
David Robertson	1bc9985eb7	Have replication clients remove _INT_STREAM_POS (#15309 ) * Have replication clients remove _INT_STREAM_POS Suppose worker A makes an internal HTTP request to worker B. B may make changes that A later learns about over replication. We want A's request to block until it has seen those changes, mainly to ensure A's caches are invalidated promptly. This helps provide read-after-write consistency, eliminating entire categories of races and test flakes. To implement this, B includes a top-level field `_INT_STREAM_POS` in its response JSON. Roughly speaking, the field's value tells A what to wait for. But we weren't removing that internal field before A's request completed! Introduced in https://github.com/matrix-org/synapse/pull/14820. Fixes #15308. * Changelog	2023-03-22 12:53:55 +00:00
Patrick Cloke	afb216c202	Remove no-op send_command for Redis replication. (#15274 ) With Redis, commands do not need to be re-issued by the main process (they fan out to all processes at once) and thus it is no longer necessary to worry about them reflecting recursively forever.	2023-03-16 11:13:30 -04:00
Patrick Cloke	3bf973edc7	Remove unused class: DirectTcpReplicationClientFactory. (#15272 )	2023-03-15 15:42:20 -04:00
Dirk Klimpel	ecbe0ddbe7	Add support for knocking to workers. (#15133 )	2023-03-02 12:59:53 -05:00
H. Shay	b2fd03d075	Merge branch 'master' into develop	2023-02-28 10:14:20 -08:00
Erik Johnston	b2357a898c	Fix bug where 5s delays would occasionally happen. (#15150 ) This only affects deployments using workers.	2023-02-24 14:39:50 +00:00
dependabot[bot]	9bb2eac719	Bump black from 22.12.0 to 23.1.0 (#15103 )	2023-02-22 15:29:09 -05:00
reivilibre	addd12f16d	Tweak logging for when a worker waits for its view of a replication stream to catch up. (#15120 ) Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Improve logging messages for the 'wait for repl stream' read-after-write consistency feature * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> * Update synapse/replication/tcp/client.py Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> --------- Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>	2023-02-21 12:26:00 +00:00
Erik Johnston	c78c67c5a9	Fix bug in replication where response is cached (#15024 )	2023-02-08 16:41:55 +00:00
David Robertson	80d44060c9	Faster joins: omit partial rooms from eager syncs until the resync completes (#14870 ) * Allow `AbstractSet` in `StrCollection` Or else frozensets are excluded. This will be useful in an upcoming commit where I plan to change a function that accepts `List[str]` to accept `StrCollection` instead. * `rooms_to_exclude` -> `rooms_to_exclude_globally` I am about to make use of this exclusion mechanism to exclude rooms for a specific user and a specific sync. This rename helps to clarify the distinction between the global config and the rooms to exclude for a specific sync. * Better function names for internal sync methods * Track a list of excluded rooms on SyncResultBuilder I plan to feed a list of partially stated rooms for this sync to ignore * Exclude partial state rooms during eager sync using the mechanism established in the previous commit * Track un-partial-state stream in sync tokens So that we can work out which rooms have become fully-stated during a given sync period. * Fix mutation of `@cached` return value This was fouling up a complement test added alongside this PR. Excluding a room would mean the set of forgotten rooms in the cache would be extended. This means that room could be erroneously considered forgotten in the future. Introduced in #12310, Synapse 1.57.0. I don't think this had any user-visible side effects (until now). * SyncResultBuilder: track rooms to force as newly joined Similar plan as before. We've omitted rooms from certain sync responses; now we establish the mechanism to reintroduce them into future syncs. * Read new field, to present rooms as newly joined * Force un-partial-stated rooms to be newly-joined for eager incremental syncs only, provided they're still fully stated * Notify user stream listeners to wake up long polling syncs * Changelog * Typo fix Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Unnecessary list cast Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Rephrase comment Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Another comment Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Fixup merge(?) * Poke notifier when receiving un-partial-stated msg over replication * Fixup merge whoops Thanks MV :) Co-authored-by: Mathieu Velen <mathieuv@matrix.org> Co-authored-by: Mathieu Velten <mathieuv@matrix.org> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>	2023-01-23 15:44:39 +00:00
Sean Quah	2ec9c58496	Faster joins: Update room stats and the user directory on workers when finishing join (#14874 ) * Faster joins: Update room stats and user directory on workers when done When finishing a partial state join to a room, we update the current state of the room without persisting additional events. Workers receive notice of the current state update over replication, but neglect to wake the room stats and user directory updaters, which then get incidentally triggered the next time an event is persisted or an unrelated event persister sends out a stream position update. We wake the room stats and user directory updaters at the appropriate time in this commit. Part of #12814 and #12815. Signed-off-by: Sean Quah <seanq@matrix.org> * fixup comment Signed-off-by: Sean Quah <seanq@matrix.org>	2023-01-23 10:31:36 +00:00
reivilibre	22cc93afe3	Enable Faster Remote Room Joins against worker-mode Synapse. (#14752 ) * Enable Complement tests for Faster Remote Room Joins on worker-mode * (dangerous) Add an override to allow Complement to use FRRJ under workers * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> * Fix race where we didn't send out replication notification * MORE HACKS * Fix get_un_partial_stated_rooms_token to take instance_name * Fix bad merge * Remove warning * Correctly advance un_partial_stated_room_stream * Fix merge * Add another notify_replication * Fixups * Create a separate ReplicationNotifier * Fix test * Fix portdb * Create a separate ReplicationNotifier * Fix test * Fix portdb * Fix presence test * Newsfile * Apply suggestions from code review * Update changelog.d/14752.misc Co-authored-by: Erik Johnston <erik@matrix.org> * lint Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> Co-authored-by: Erik Johnston <erik@matrix.org>	2023-01-22 21:10:11 +00:00
Erik Johnston	0ec12a3753	Reduce max time we wait for stream positions (#14881 ) Now that we wait for stream positions whenever we do an HTTP replication hit, we need to be less brutal in the case where we do time out (as we have bugs around this).	2023-01-20 21:04:33 +00:00
Erik Johnston	cdf2707678	Fix bug in wait for stream position (#14872 ) This caused some requests to fail. This really only started causing issues due to #14856	2023-01-19 22:19:56 +00:00
Erik Johnston	9187fd940e	Wait for streams to catch up when processing HTTP replication. (#14820 ) This should hopefully mitigate a class of races where data gets out of sync due to an HTTP replication request racing with the replication streams.	2023-01-18 19:35:29 +00:00
Erik Johnston	316590d1ea	Fix bug in `wait_for_stream_position` (#14856 ) We were incorrectly checking if the local token had been advanced, rather than the token for the remote instance. In practice, I don't think this has caused any bugs due to where we use `wait_for_stream_position`, as critically we don't use it on instances that also write to the given streams (and so the local token will lag behind all remote tokens).	2023-01-17 09:58:22 +00:00
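
The message for commit 8f35f8148e describes the minimum persisted position as the minimum known position across all writers, and a new writer being left out of that minimum. The toy sketch below illustrates that reasoning only. It is not Synapse's `MultiWriterIdGenerator` code: the function name, writer names, and stream IDs are all hypothetical, and seeding the new writer with a floor is an assumed way to show the effect, not the fix the commit actually applied.

```python
from typing import Dict


def min_persisted_position(positions: Dict[str, int]) -> int:
    """Everything at or below the returned stream ID is treated as persisted.

    Hypothetical helper: takes the minimum over the writers we know about.
    """
    return min(positions.values())


# writer_b is new, has no rows in the DB, and is still persisting stream ID 11.
# writer_a has meanwhile acquired and finished persisting stream ID 12.
known_writers = {"writer_a": 12}

# The new writer is not considered, so 12 is announced as persisted
# even though stream ID 11 is still in flight.
buggy = min_persisted_position(known_writers)

# Assumed remedy for illustration: give the new writer a floor equal to the
# last position that was fully persisted before it started (10 here).
seeded_writers = {**known_writers, "writer_b": 10}
fixed = min_persisted_position(seeded_writers)

print(buggy, fixed)
```

With the new writer included, the announced position stays at 10 until it finishes persisting stream ID 11.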


695 Commits