synapse-old

Commit Graph

Author	SHA1	Message	Date
Shay	b7a7ff6ee3	Add initial power level event to batch of bulk persisted events when creating a new room. (#14228 )	2022-10-21 10:46:22 -07:00
Andrew Morgan	dc02d9f8c5	Avoid checking the event cache when backfilling events (#14164 )	2022-10-18 10:33:35 +01:00
Eric Eastwood	40bb37eb27	Stop getting missing `prev_events` after we already know their signature is invalid (#13816 ) While https://github.com/matrix-org/synapse/pull/13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place. Related to - https://github.com/matrix-org/synapse/issues/13622 - https://github.com/matrix-org/synapse/pull/13635 - https://github.com/matrix-org/synapse/issues/13676 Part of https://github.com/matrix-org/synapse/issues/13356 Follow-up to https://github.com/matrix-org/synapse/pull/13815 which tracks event signature failures. With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now. For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761 To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid. - `backfill` - `outgoing-federation-request` `/backfill` - `_check_sigs_and_hash_and_fetch` - `_check_sigs_and_hash_and_fetch_one` for each event received over backfill - ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)` - ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)` - `_process_pulled_events` - `_process_pulled_event` for each validated event - ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it - `_get_state_ids_after_missing_prev_event` - `outgoing-federation-request` `/state_ids` - ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again - ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check	2022-10-15 00:36:49 -05:00
Shay	b6baa46db0	Fix a bug where the joined hosts for a given event were not being properly cached (#14125 )	2022-10-12 11:01:00 -07:00
Shay	7b7478e8b6	Batch up notifications after event persistence (#14033 )	2022-10-05 10:12:48 -07:00
Eric Eastwood	2769ef4df1	Revert the general exception recording introduced in #13814 (#13969 ) * Maybe not catch all errors to avoid things in the nature-of CancelledError See https://github.com/matrix-org/synapse/pull/13815#discussion_r983384698 * Remove general exception tracking * Add changelog	2022-10-03 10:14:45 +01:00
Kateřina Churanová	6caa303083	fix: Push notifications for invite over federation (#13719 )	2022-09-28 12:31:53 +00:00
reivilibre	c06b2b7142	Faster Remote Room Joins: tell remote homeservers that we are unable to authorise them if they query a room which has partial state on our server. (#13823 )	2022-09-23 11:47:16 +01:00
Eric Eastwood	140af0cdb6	Record any exception when processing a pulled event (#13814 ) Part of https://github.com/matrix-org/synapse/issues/13700 and https://github.com/matrix-org/synapse/issues/13356 Follow-up to https://github.com/matrix-org/synapse/pull/13589	2022-09-15 14:40:49 -05:00
Eric Eastwood	957e3d74fc	Keep track when we try and fail to process a pulled event (#13589 ) We can follow-up this PR with: 1. Only try to backfill from an event if we haven't tried recently -> https://github.com/matrix-org/synapse/issues/13622 1. When we decide to backfill that event again, process it in the background so it doesn't block and make `/messages` slow when we know it will probably fail again -> https://github.com/matrix-org/synapse/issues/13623 1. Generally track failures everywhere we try and fail to pull an event over federation -> https://github.com/matrix-org/synapse/issues/13700 Fix https://github.com/matrix-org/synapse/issues/13621 Part of https://github.com/matrix-org/synapse/issues/13356 Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.qv7cj51sv9i5)	2022-09-14 13:57:50 -05:00
Eric Eastwood	0bf180cbb4	Comment about a better future where we can get the state diff between two events (#13586 ) Split off from https://github.com/matrix-org/synapse/pull/13561 Part of https://github.com/matrix-org/synapse/issues/13356 Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh)	2022-08-24 18:59:27 -05:00
Eric Eastwood	9385c41ba4	Fix Prometheus metrics being negative (mixed up start/end) (#13584 ) Fix: - https://github.com/matrix-org/synapse/pull/13535#discussion_r949582508 - https://github.com/matrix-org/synapse/pull/13533#discussion_r949577244	2022-08-23 08:47:30 +01:00
Eric Eastwood	06df5d4250	MSC2716v4 room version - remove namespace from MSC2716 event content fields (#13551 ) Complement PR: https://github.com/matrix-org/complement/pull/450 As suggested in https://github.com/matrix-org/matrix-spec-proposals/pull/2716#discussion_r941444525	2022-08-19 15:37:01 -05:00
Eric Eastwood	088bcb7ecb	Time how long it takes us to do backfill processing (#13535 )	2022-08-17 10:33:19 +01:00
Eric Eastwood	0a4efbc1dd	Instrument the federation/backfill part of `/messages` (#13489 ) Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace. Split out from https://github.com/matrix-org/synapse/pull/13440 Follow-up from https://github.com/matrix-org/synapse/pull/13368 Part of https://github.com/matrix-org/synapse/issues/13356	2022-08-16 12:39:40 -05:00
Eric Eastwood	92d21faf12	Instrument `/messages` for understandable traces in Jaeger (#13368 ) In Jaeger: - Before: huge list of uncategorized database calls - After: nice and collapsible into units of work	2022-08-03 10:57:38 -05:00
Patrick Cloke	f8e7a9418a	Fix missing import in `federation_event` handler. (#13431 ) #13404 removed an import of `Optional` which was still needed due to #13413 added more usages.	2022-08-01 14:14:29 +00:00
Sean Quah	224d792dd7	Refactor `_resolve_state_at_missing_prevs` to return an `EventContext` (#13404 ) Previously, `_resolve_state_at_missing_prevs` returned the resolved state before an event and a partial state flag. These were unwieldy to carry around would only ever be used to build an event context. Build the event context directly instead. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-08-01 13:53:56 +01:00
Richard van der Hoff	23768ccb4d	Faster joins: fix rejected events becoming un-rejected during resync (#13413 ) Make sure that we re-check the auth rules during state resync, otherwise rejected events get un-rejected.	2022-08-01 11:20:05 +01:00
Richard van der Hoff	ca3db044a3	Fix infinite loop in partial-state resync (#13353 ) Make sure that we only pull out events from the db once they have no prev-events with partial state.	2022-07-26 11:47:31 +00:00
Sean Quah	335ebb21cc	Faster room joins: avoid blocking when pulling events with missing prevs (#13355 ) Avoid blocking on full state in `_resolve_state_at_missing_prevs` and return a new flag indicating whether the resolved state is partial. Thread that flag around so that it makes it into the event context. Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>	2022-07-26 12:39:23 +01:00
Eric Eastwood	357561c1a2	Backfill remote event fetched by MSC3030 so we can paginate from it later (#13205 ) Depends on https://github.com/matrix-org/synapse/pull/13320 Complement tests: https://github.com/matrix-org/complement/pull/406 We could use the same method to backfill for `/context` as well in the future, see https://github.com/matrix-org/synapse/issues/3848	2022-07-22 16:00:11 -05:00
Sean Quah	158782c3ce	Skip soft fail checks for rooms with partial state (#13354 ) When a room has the partial state flag, we may not have an accurate `m.room.member` event for event senders in the room's current state, and so cannot perform soft fail checks correctly. Skip the soft fail check entirely in this case. As an alternative, we could block until we have full state, but that would prevent us from receiving incoming events over federation, which is undesirable. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-07-22 10:13:01 +01:00
Eric Eastwood	0f971ca68e	Update `get_pdu` to return the original, pristine `EventBase` (#13320 ) Update `get_pdu` to return the untouched, pristine `EventBase` as it was originally seen over federation (no metadata added). Previously, we returned the same `event` reference that we stored in the cache which downstream code modified in place and added metadata like setting it as an `outlier` and essentially poisoned our cache. Now we always return a copy of the `event` so the original can stay pristine in our cache and re-used for the next cache call. Split out from https://github.com/matrix-org/synapse/pull/13205 As discussed at: - https://github.com/matrix-org/synapse/pull/13205#discussion_r918365746 - https://github.com/matrix-org/synapse/pull/13205#discussion_r918366125 Related to https://github.com/matrix-org/synapse/issues/12584. This PR doesn't fix that issue because it hits [`get_event` which exists from the local database before it tries to `get_pdu`](`7864f33e28/synapse/federation/federation_client.py (L581-L594)`).	2022-07-20 15:58:51 -05:00
Sean Quah	172ce29b14	Fix spurious warning when fetching state after a missing prev event (#13258 )	2022-07-19 19:15:54 +01:00
David Robertson	b977867358	Rate limit joins per-room (#13276 )	2022-07-19 11:45:17 +00:00
Richard van der Hoff	fe15a865a5	Rip out auth-event reconciliation code (#12943 ) There is a corner in `_check_event_auth` (long known as "the weird corner") where, if we get an event with auth_events which don't match those we were expecting, we attempt to resolve the diffence between our state and the remote's with a state resolution. This isn't specced, and there's general agreement we shouldn't be doing it. However, it turns out that the faster-joins code was relying on it, so we need to introduce something similar (but rather simpler) for that.	2022-07-14 21:52:26 +00:00
Sean Quah	68db233f0c	Handle race between persisting an event and un-partial stating a room (#13100 ) Whenever we want to persist an event, we first compute an event context, which includes the state at the event and a flag indicating whether the state is partial. After a lot of processing, we finally try to store the event in the database, which can fail for partial state events when the containing room has been un-partial stated in the meantime. We detect the race as a foreign key constraint failure in the data store layer and turn it into a special `PartialStateConflictError` exception, which makes its way up to the method in which we computed the event context. To make things difficult, the exception needs to cross a replication request: `/fed_send_events` for events coming over federation and `/send_event` for events from clients. We transport the `PartialStateConflictError` as a `409 Conflict` over replication and turn `409`s back into `PartialStateConflictError`s on the worker making the request. All client events go through `EventCreationHandler.handle_new_client_event`, which is called in a lot of places. Instead of trying to update all the code which creates client events, we turn the `PartialStateConflictError` into a `429 Too Many Requests` in `EventCreationHandler.handle_new_client_event` and hope that clients take it as a hint to retry their request. On the federation event side, there are 7 places which compute event contexts. 4 of them use outlier event contexts: `FederationEventHandler._auth_and_persist_outliers_inner`, `FederationHandler.do_knock`, `FederationHandler.on_invite_request` and `FederationHandler.do_remotely_reject_invite`. These events won't have the partial state flag, so we do not need to do anything for then. The remaining 3 paths which create events are `FederationEventHandler.process_remote_join`, `FederationEventHandler.on_send_membership_event` and `FederationEventHandler._process_received_pdu`. We can't experience the race in `process_remote_join`, unless we're handling an additional join into a partial state room, which currently blocks, so we make no attempt to handle it correctly. `on_send_membership_event` is only called by `FederationServer._on_send_membership_event`, so we catch the `PartialStateConflictError` there and retry just once. `_process_received_pdu` is called by `on_receive_pdu` for incoming events and `_process_pulled_event` for backfill. The latter should never try to persist partial state events, so we ignore it. We catch the `PartialStateConflictError` in `on_receive_pdu` and retry just once. Refering to the graph of code paths in https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648 may make the above make more sense. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-07-05 16:12:52 +01:00
Richard van der Hoff	6da861ae69	`_process_received_pdu`: Improve exception handling (#13145 ) `_check_event_auth` is expected to raise `AuthError`s, so no need to log it again.	2022-07-01 10:52:10 +01:00
Sean Quah	9372f6f842	Fix logging context misuse when we fail to persist a federation event (#13089 ) When we fail to persist a federation event, we kick off a task to remove its push actions in the background, using the current logging context. Since we don't `await` that task, we may finish our logging context before the task finishes. There's no reason to not `await` the task, so let's do that. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-06-17 10:22:50 +01:00
Richard van der Hoff	8ecf6be1e1	Move some event auth checks out to a different method (#13065 ) * Add auth events to events used in tests * Move some event auth checks out to a different method Some of the event auth checks apply to an event's auth_events, rather than the state at the event - which means they can play no part in state resolution. Move them out to a separate method. * Rename check_auth_rules_for_event Now it only checks the state-dependent auth rules, it needs a better name.	2022-06-15 19:48:22 +01:00
Richard van der Hoff	f68b5e5773	Merge branch 'rav/simplify_event_auth_interface' into develop	2022-06-13 11:34:59 +01:00
Richard van der Hoff	0d9d36b15c	Remove `room_version` param from `check_auth_rules_for_event` Instead, use the `room_version` property of the event we're checking. The `room_version` was originally added as a parameter somewhere around #4482, but really it's been redundant since #6875 added a `room_version` field to `EventBase`.	2022-06-12 23:13:10 +01:00
Richard van der Hoff	68be42f6b6	Remove `room_version` param from `validate_event_for_room_version` Instead, use the `room_version` property of the event we're validating. The `room_version` was originally added as a parameter somewhere around #4482, but really it's been redundant since #6875 added a `room_version` field to `EventBase`.	2022-06-12 23:13:09 +01:00
Richard van der Hoff	7c6b2204d1	Faster joins: add issue links to the TODOs (#13004 ) ... to help us keep track of these things	2022-06-09 10:13:03 +00:00
Erik Johnston	e3163e2e11	Reduce the amount of state we pull from the DB (#12811 )	2022-06-06 09:24:12 +01:00
Sean Quah	2fba1076c5	Faster room joins: Try other destinations when resyncing the state of a partial-state room (#12812 ) Signed-off-by: Sean Quah <seanq@matrix.org>	2022-05-31 15:50:29 +01:00
Erik Johnston	1e453053cb	Rename storage classes (#12913 )	2022-05-31 12:17:50 +00:00
Erik Johnston	b83bc5fab5	Pull out less state when handling gaps mk2 (#12852 )	2022-05-26 09:48:12 +00:00
Erik Johnston	4660d9fdcf	Fix up `state_store` naming (#12871 )	2022-05-25 12:59:04 +01:00
Eric Eastwood	7c2a78bb3b	Marker events as state - MSC2716 (#12718 ) Sending marker events as state now so they are always able to be seen by homeservers (not lost in some timeline gap). Part of [MSC2716](https://github.com/matrix-org/matrix-spec-proposals/pull/2716) Complement tests: https://github.com/matrix-org/complement/pull/371 As initially discussed at https://github.com/matrix-org/matrix-spec-proposals/pull/2716#discussion_r782629097 and https://github.com/matrix-org/matrix-spec-proposals/pull/2716#discussion_r876684431 When someone joins a room, process all of the marker events we see in the current state. Marker events should be sent with a unique `state_key` so that they can all resolve in the current state to easily be discovered. Marker events as state - If we re-use the same `state_key` (like `""`), then we would have to fetch previous snapshots of state up through time to find all of the marker events. This way we can avoid all of that. This PR was originally doing this but then thought of the smarter way to tackle in an [out of band discussion with @erikjohnston](https://docs.google.com/document/d/1JJDuPfcPNX75fprdTWlxlaKjWOdbdJylbpZ03hzo638/edit#bookmark=id.sm92fqyq7vpp). - Also avoids state resolution conflicts where only one of the marker events win As a homeserver, when we see new marker state, we know there is new history imported somewhere back in time and should process it to fetch the insertion event where the historical messages are and set it as an insertion extremity. This way we know where to backfill more messages when someone asks for scrollback.	2022-05-23 20:43:37 -05:00
Shay	71e8afe34d	Update EventContext `get_current_event_ids` and `get_prev_event_ids` to accept state filters and update calls where possible (#12791 )	2022-05-20 09:54:12 +01:00
Patrick Cloke	a4c75918b3	Remove unneeded `ActionGenerator` class. (#12691 ) It simply passes through to `BulkPushRuleEvaluator`, which can be called directly instead.	2022-05-11 07:15:21 -04:00
Erik Johnston	c72d26c1e1	Refactor `EventContext` (#12689 ) Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing. The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching). One change that may have a noticeable effect is that we now no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since its just a case of reading a table from the DB (unlike fetching state at an event which is more heavyweight). For deployments with workers this cache isn't even used. Part of #12684	2022-05-10 19:43:13 +00:00
andrew do	01e625513a	remove constantly lib use and switch to enums. (#12624 )	2022-05-04 11:26:11 +00:00
Richard van der Hoff	f5668f0b4a	Await un-partial-stating after a partial-state join (#12399 ) When we join a room via the faster-joins mechanism, we end up with "partial state" at some points on the event DAG. Many parts of the codebase need to wait for the full state to load. So, we implement a mechanism to keep track of which events have partial state, and wait for them to be fully-populated.	2022-04-21 07:42:03 +01:00
Richard van der Hoff	320186319a	Resync state after partial-state join (#12394 ) We work through all the events with partial state, updating the state at each of them. Once it's done, we recalculate the state for the whole room, and then mark the room as having complete state.	2022-04-12 13:23:43 +00:00
Sean Quah	800ba87cc8	Refactor and convert `Linearizer` to async (#12357 ) Refactor and convert `Linearizer` to async. This makes a `Linearizer` cancellation bug easier to fix. Also refactor to use an async context manager, which eliminates an unlikely footgun where code that doesn't immediately use the context manager could forget to release the lock. Signed-off-by: Sean Quah <seanq@element.io>	2022-04-05 15:43:52 +01:00
Richard van der Hoff	9b43df1f7b	Optimise `_get_state_after_missing_prev_event`: use `/state` (#12040 ) If we're missing most of the events in the room state, then we may as well call the /state endpoint, instead of individually requesting each and every event.	2022-04-01 12:53:42 +01:00
Richard van der Hoff	9b67715bc3	Disable proactive sends for remote joins (#12330 ) Do not attempt to send remote joins out over federation. Normally, it will do nothing; occasionally, it will do the wrong thing.	2022-03-30 12:04:35 +01:00

1 2

94 Commits