Clean-up from adding the thread_id column, which was initially
null but backfilled with values. It is desirable to require it to now
be non-null.
In addition to altering this column to be non-null, we clean up
obsolete background jobs, indexes, and just-in-time updating
code.
This makes it so that we rely on the `device_id` to delete pushers on logout,
instead of relying on the `access_token_id`. This ensures we're not removing
pushers on token refresh, and prepares for a world without access token IDs
(also known as the OIDC).
This actually runs the `set_device_id_for_pushers` background update, which
was forgotten in #13831.
Note that for backwards compatibility it still deletes pushers based on the
`access_token` until the background update finishes.
* Add `event_stream_ordering` column to membership state tables
Specifically this adds the column to `current_state_events`,
`local_current_membership` and `room_memberships`. Each of these tables
is regularly joined with the `events` table to get the stream ordering
and denormalising this into each table will yield significant query
performance improvements once used.
* Make denormalised `event_stream_ordering` columns foreign keys
* Add comment in schema file explaining new denormalised columns
* Add triggers to enforce consistency of `event_stream_ordering` columns
* Re-order purge room tables to account for foreign keys
* Bump schema version to 75
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Remove special-case method for new memberships only, use more generic method
* Only collect profiles from state events in public rooms
* Add a table to track stale remote user profiles
* Add store methods to set and delete rows in this new table
* Mark remote profiles as stale when a member state event comes in to a private room
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Simplify by removing Optionality of `event_id`
* Replace names and avatars with None if they're set to dodgy things
I think this makes more sense anyway.
* Move schema delta to 74 (I missed the boat?)
* Turns out these can be None after all
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
This adds an `event_stream_ordering` column to `current_state_events`,
`local_current_membership` and `room_memberships`. Each of these tables
is regularly joined with the `events` table to get the stream ordering
and denormalising this into each table will yield significant query
performance improvements once used. Includes a background job to
populate these values from the `events` table.
Same idea as https://github.com/matrix-org/synapse/pull/13703.
Signed off by Nick @ Beeper (@fizzadar).
if a Synapse deployment upgraded (from < 1.62.0 to >= 1.70.0) then it
is possible for schema deltas to run before background updates causing
drift in the database schema due to:
1. A delta registered a background update to create an index.
2. A delta dropped the above index if it exists (but it yet exist won't since
the background job hasn't run).
3. The code assumed the index was dropped.
To fix this we:
1. Cancel the background update which could create the index.
2. Drop the index again.
3. Drop a related index which is dropped by the background update.
Due to the various fixes to the StreamChangeCache it is not
safe to trust the information in the user directory or room/user
stats tables. Rebuild them as background jobs.
In particular see da77720752 (#14639),
and 6a8310f3df (#14435).
Maybe also be related to fac8a38525
(#14592).
When a local device list change is added to
`device_lists_changes_in_room`, the `converted_to_destinations` flag is
set to `FALSE` and the `_handle_new_device_update_async` background
process is started. This background process looks for unconverted rows
in `device_lists_changes_in_room`, copies them to
`device_lists_outbound_pokes` and updates the flag.
To update the `converted_to_destinations` flag, the database performs a
`DELETE` and `INSERT` internally, which fragments the table. To avoid
this, track unconverted rows using a `(stream ID, room ID)` position
instead of the flag.
From now on, the `converted_to_destinations` column indicates rows that
need converting to outbound pokes, but does not indicate whether the
conversion has already taken place.
Closes#14037.
Signed-off-by: Sean Quah <seanq@matrix.org>
PostgreSQL may underestimate the number of distinct `room_id`s in
`event_search`, which can cause it to use table scans for queries for
multiple rooms.
Fix this by setting `n_distinct` on the column.
Resolves#14402.
Signed-off-by: Sean Quah <seanq@matrix.org>
Support a unified search query syntax which leverages more of the full-text
search of each database supported by Synapse.
Supports, with the same syntax across Postgresql 11+ and Sqlite:
- quoted "search terms"
- `AND`, `OR`, `-` (negation) operators
- Matching words based on their stem, e.g. searches for "dog" matches
documents containing "dogs".
This is achieved by
- If on postgresql 11+, pass the user input to `websearch_to_tsquery`
- If on sqlite, manually parse the query and transform it into the sqlite-specific
query syntax.
Note that postgresql 10, which is close to end-of-life, falls back to using
`phraseto_tsquery`, which only supports a subset of the features.
Multiple terms separated by a space are implicitly ANDed.
Note that:
1. There is no escaping of full-text syntax that might be supported by the database;
e.g. `NOT`, `NEAR`, `*` in sqlite. This runs the risk that people might discover this
as accidental functionality and depend on something we don't guarantee.
2. English text is assumed for stemming. To support other languages, either the target
language needs to be known at the time of indexing the message (via room metadata,
or otherwise), or a separate index for each language supported could be created.
Sqlite docs: https://www.sqlite.org/fts3.html#full_text_index_queries
Postgres docs: https://www.postgresql.org/docs/11/textsearch-controls.html
Implement the /threads endpoint from MSC3856.
This is currently unstable and behind an experimental configuration
flag.
It includes a background update to backfill data, results from
the /threads endpoint will be partial until that finishes.
Applies the proper logic for unthreaded and threaded receipts to either
apply to all events in the room or only events in the same thread, respectively.
When retrieving counts of notifications segment the results based on the
thread ID, but choose whether to return them as individual threads or as
a single summed field by letting the client opt-in via a sync flag.
The summarization code is also updated to be per thread, instead of per
room.
c.f. #12993 (comment), point 3
This stores all device list updates that we receive while partial joins are ongoing, and processes them once we have the full state.
Note: We don't actually process the device lists in the same ways as if we weren't partially joined. Instead of updating the device list remote cache, we simply notify local users that a change in the remote user's devices has happened. I think this is safe as if the local user requests the keys for the remote user and we don't have them we'll simply fetch them as normal.
Adds a `thread_id` column to the `event_push_actions`, `event_push_actions_staging`,
and `event_push_summary` tables. This will notifications to be segmented by the thread
in a future pull request. The `thread_id` column stores the root event ID or the special
value `"main"`.
The `thread_id` column for `event_push_actions` and `event_push_summary` is
backfilled with `"main"` for all existing rows. New entries into `event_push_actions`
and `event_push_actions_staging` will get the proper thread ID.
`receipts_linearized` and `receipts_graph` also gain a `thread_id` column, which is similar,
except `NULL` is a special value meaning the receipt is "unthreaded".
See MSC3771 and MSC3773 for where this data will be useful.
Partial indices have been supported since SQLite 3.8, but Synapse
now requires >= 3.27, so we can enable support for them.
This requires rebuilding previous indices which were partial on
PostgreSQL, but not on SQLite.
* Remove incorrect migration file from `state` logical DB
The table `ex_outlier_stream` is part of the `main` logical DB; it
should not have been created in the `state` logical DB. We remove this
migration now as a tidy-up.
Note: we cannot `DROP TABLE IF EXISTS ex_outlier_stream` in a new
migration, because some (most) instances of Synapse host both of these
logical DBs on the same DB cluster.
* Changelog
* Remove checks for membership column in current_state_events
* Add schema script to force through the
`current_state_events_membership` background job
Contributed by Nick @ Beeper (@fizzadar).
These columns were added back in Synapse 1.52, and have been populated for new
events since then. It's now (beyond) time to back-populate them for existing
events.
Fixes#11887 hopefully.
The core change here is that `event_push_summary` now holds a summary of counts up until a much more recent point, meaning that the range of rows we need to count in `event_push_actions` is much smaller.
This needs two major changes:
1. When we get a receipt we need to recalculate `event_push_summary` rather than just delete it
2. The logic for deleting `event_push_actions` is now divorced from calculating `event_push_summary`.
In future it would be good to calculate `event_push_summary` while we persist a new event (it should just be a case of adding one to the relevant rows in `event_push_summary`), as that will further simplify the get counts logic and remove the need for us to periodically update `event_push_summary` in a background job.
* Remove redundant references to `event_edges.room_id`
We don't need to care about the room_id here, because we are already checking
the event id.
* Clean up the event_edges table
We make a number of changes to `event_edges`:
* We give the `room_id` and `is_state` columns defaults (null and false
respectively) so that we can stop populating them.
* We drop any rows that have `is_state` set true - they should no longer
exist.
* We drop any rows that do not exist in `events` - these should not exist
either.
* We drop the old unique constraint on all the colums, which wasn't much use.
* We create a new unique index on `(event_id, prev_event_id)`.
* We add a foreign key constraint to `events`.
These happen rather differently depending on whether we are on Postgres or
SQLite. For SQLite, we just rebuild the whole table, copying only the rows we
want to keep. For Postgres, we try to do things in the background as much as
possible.
* Stop populating `event_edges.room_id` and `is_state`
We can just rely on the defaults.
This is a first step in dealing with #7721.
The idea is basically that rather than calculating the full set of users a device list update needs to be sent to up front, we instead simply record the rooms the user was in at the time of the change. This will allow a few things:
1. we can defer calculating the set of remote servers that need to be poked about the change; and
2. during `/sync` and `/keys/changes` we can avoid also avoid calculating users who share rooms with other users, and instead just look at the rooms that have changed.
However, care needs to be taken to correctly handle server downgrades. As such this PR writes to both `device_lists_changes_in_room` and the `device_lists_outbound_pokes` table synchronously. In a future release we can then bump the database schema compat version to `69` and then we can assume that the new `device_lists_changes_in_room` exists and is handled.
There is a temporary option to disable writing to `device_lists_outbound_pokes` synchronously, allowing us to test the new code path does work (and by implication upgrading to a future release and downgrading to this one will work correctly).
Note: Ideally we'd do the calculation of room to servers on a worker (e.g. the background worker), but currently only master can write to the `device_list_outbound_pokes` table.
Switching to a sequence means there's no need to track `last_txn` on the
AS state table to generate new TXN IDs. This also means that there is
no longer contention between the AS scheduler and AS handler on updates
to the `application_services_state` table, which will prevent serialization
errors during the complete AS txn transaction.
When we get a partial_state response from send_join, store information in the
database about it:
* store a record about the room as a whole having partial state, and stash the
list of member servers too.
* flag the join event itself as having partial state
* also, for any new events whose prev-events are partial-stated, note that
they will *also* be partial-stated.
We don't yet make any attempt to interpret this data, so API calls (and a bunch
of other things) are just going to get incorrect data.
Don't attempt to add non-string `value`s to `event_search` and add a
background update to clear out bad rows from `event_search` when
using sqlite.
Signed-off-by: Sean Quah <seanq@element.io>
Instead of only known relation types. This also reworks the background
update for thread relations to crawl events and search for any relation
type, not just threaded relations.
==============================
Bugfixes
--------
- Fix a bug introduced in 1.47.0rc1 which caused worker processes to not halt startup in the presence of outstanding database migrations. ([\#11346](https://github.com/matrix-org/synapse/issues/11346))
- Fix a bug introduced in 1.47.0rc1 which prevented the 'remove deleted devices from `device_inbox` column' background process from running when updating from a recent Synapse version. ([\#11303](https://github.com/matrix-org/synapse/issues/11303), [\#11353](https://github.com/matrix-org/synapse/issues/11353))
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAmGTw7wTHGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9FUdD/9EgimhmOUuwcbNQS+zJJo6qhbnxngA
uOIWGhuoBq9vyZjx3s1mwRO/sCuBc6KRs2JYzhntfHwV6FCgkC0QJXPkrYO7mkbg
L/QsET3QYrWwClMN1y/j2/rstOqGtgrtg6pPgd9LiQH+LjdM+vzAyrgxJRyhk9Sx
7184Y9QpEt/6ApqFIbXZewP+zH1z1QyVeZ1WiuHsqhjcrrcpv1t8Rq0fv4FxgIVP
Tmy3T2uw0FeVoQIL8BnNVmRGbLjlSgpDfiXZTwuXGS/uLtTe1tnHf0QsKEK7XZxY
yUsRKQVDy76GW9AJuWKFXa9Voy5gkjLWnvMwHvr+B1WcJi/5cjz9lP2A0Bsm6vhQ
ivrktlhlw7O3x0EvrL1r4z2gRY4GXxyZKOBLRsY4AAIMblNR/e94SmK0rLoKIQn8
Rp50VV2B4cf3WLxoqHBpJh7CDFisth1nXMnETU3y0VgAYMg2+cxqVDawTSTH9v7Q
QeGI10QCMmeFwGoL6lyVJ+qhyUqFeFOqTrNqzigsiB1qule+w1fEEl7cvI+Om/QM
Wjkvw6fpOvUGBN1OOARL5Qnm4rjxa0Ld2vw5vugkGiDTV8P3TMJP1JFQ44IX2GKP
yGJH92ac0qy6vQ2za1gL/w/U3miNblHUIvAmpkaIIlvovQJH9dU5dMsxnvi+QZU+
Auyqh9rwiWiizA==
=iNww
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE1508oLYUKainYFJakD7OEIo53t0FAmGT0mcACgkQkD7OEIo5
3t0sSA/+KjLlnYe3xO9Pbeeq7/E6Qs3VTMbwERcVVj0WZQU8Hik/EH+p0zsnG9jU
8t9UVs0xnu1r5ZMNsrMnfOU5SR9o7RM8UBCHnIYde8yFf1N6FI0YgGJ8nOP96gSG
dKZQOCKA9psQqd0pvhwCem4ITmBvVz2CrnUZYkKp9rHjUZeBYwE7cWmHL5W/WzT+
Sv0zafhvYi76PxMKdXu08MArMZ9JpCLbZnlzaoW/sY+xvnQBRfQviqYkj+qZhQjr
tPnJlQN5FS1pO1ZHd9o0mVcdbDBwLqeQlqC1toFuZXbed6e558KtCzvizM5VAfF8
peRGVFarbZpD/QcvoFljoH9qIECrcAYIN1HeE8aX7pIedh2AVROyINumdnbmsMJH
F4OyX/aLb3KbbecHJaVLQ2c/KyDnSnxr6Fs3HlEslD7DRdA0TS7XsI/BVER0pscp
j2Q5nLOthpV2d1SVekj4Ge/Hr++AuTnriHldEx9OlGI3/74ldrNL2c/L5gM3ue1W
qnPQfr5ehtIitqa/ROiexxSoWS5OC953UujTCwBgil9mAD9gC7mhoEFOAFVuMHfN
zBjsunMGBRYsGcw2umTPGPd7D3Gi1FXQrduN1xUoV8g8vmJld4GDjqJsitiaR3lv
NUsQ5JzttakDUwAJU7qijOo2Y/HtSs5E2nF66bMCSmwHvl9AkUc=
=c7gi
-----END PGP SIGNATURE-----
Merge tag 'v1.47.0rc3' into develop
Synapse 1.47.0rc3 (2021-11-16)
==============================
Bugfixes
--------
- Fix a bug introduced in 1.47.0rc1 which caused worker processes to not halt startup in the presence of outstanding database migrations. ([\#11346](https://github.com/matrix-org/synapse/issues/11346))
- Fix a bug introduced in 1.47.0rc1 which prevented the 'remove deleted devices from `device_inbox` column' background process from running when updating from a recent Synapse version. ([\#11303](https://github.com/matrix-org/synapse/issues/11303), [\#11353](https://github.com/matrix-org/synapse/issues/11353))
* remove unused tables room_stats_historical and user_stats_historical
* update changelog number
* Bump schema compat version comment
* make linter happy
* Update comment to give more info
Co-authored-by: reivilibre <oliverw@matrix.org>
Co-authored-by: reivilibre <oliverw@matrix.org>
Resolve and share `state_groups` for all historical events in batch. This also helps for showing the appropriate avatar/displayname in Element and will work whenever `/messages` has one of the historical messages as the first message in the batch.
This does have the flaw where if you just insert a single historical event somewhere, it probably won't resolve the state correctly from `/messages` or `/context` since it will grab a non historical event above or below with resolved state which never included the historical state back then. For the same reasions, this also does not work in Element between the transition from actual messages to historical messages. In the Gitter case, this isn't really a problem since all of the historical messages are in one big lump at the beginning of the room.
For a future iteration, might be good to look at `/messages` and `/context` to additionally add the `state` for any historical messages in that batch.
---
How are the `state_groups` shared? To illustrate the `state_group` sharing, see this example:
**Before** (new `state_group` for every event 😬, very inefficient):
```
# Tests from https://github.com/matrix-org/complement/pull/206
$ COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh TestBackfillingHistory/parallel/should_resolve_member_state_events_for_historical_events
create_new_client_event m.room.member event=$_JXfwUDIWS6xKGG4SmZXjSFrizhARM7QblhATVWWUcA state_group=None
create_new_client_event org.matrix.msc2716.insertion event=$1ZBfmBKEjg94d-vGYymKrVYeghwBOuGJ3wubU1-I9y0 state_group=9
create_new_client_event org.matrix.msc2716.insertion event=$Mq2JvRetTyclPuozRI682SAjYp3GqRuPc8_cH5-ezPY state_group=10
create_new_client_event m.room.message event=$MfmY4rBQkxrIp8jVwVMTJ4PKnxSigpG9E2cn7S0AtTo state_group=11
create_new_client_event m.room.message event=$uYOv6V8wiF7xHwOMt-60d1AoOIbqLgrDLz6ZIQDdWUI state_group=12
create_new_client_event m.room.message event=$PAbkJRMxb0bX4A6av463faiAhxkE3FEObM1xB4D0UG4 state_group=13
create_new_client_event org.matrix.msc2716.batch event=$Oy_S7AWN7rJQe_MYwGPEy6RtbYklrI-tAhmfiLrCaKI state_group=14
```
**After** (all events in batch sharing `state_group=10`) (the base insertion event has `state_group=8` which matches the `prev_event` we're inserting next to):
```
# Tests from https://github.com/matrix-org/complement/pull/206
$ COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh TestBackfillingHistory/parallel/should_resolve_member_state_events_for_historical_events
create_new_client_event m.room.member event=$PWomJ8PwENYEYuVNoG30gqtybuQQSZ55eldBUSs0i0U state_group=None
create_new_client_event org.matrix.msc2716.insertion event=$e_mCU7Eah9ABF6nQU7lu4E1RxIWccNF05AKaTT5m3lw state_group=9
create_new_client_event org.matrix.msc2716.insertion event=$ui7A3_GdXIcJq0C8GpyrF8X7B3DTjMd_WGCjogax7xU state_group=10
create_new_client_event m.room.message event=$EnTIM5rEGVezQJiYl62uFBl6kJ7B-sMxWqe2D_4FX1I state_group=10
create_new_client_event m.room.message event=$LGx5jGONnBPuNhAuZqHeEoXChd9ryVkuTZatGisOPjk state_group=10
create_new_client_event m.room.message event=$wW0zwoN50lbLu1KoKbybVMxLbKUj7GV_olozIc5i3M0 state_group=10
create_new_client_event org.matrix.msc2716.batch event=$5ZB6dtzqFBCEuMRgpkU201Qhx3WtXZGTz_YgldL6JrQ state_group=10
```
Before Synapse 1.31 (#9411), we relied on `outlier` being stored in the
`internal_metadata` column. We can now assume nobody will roll back their
deployment that far and drop the legacy support.
This avoids the overhead of searching through the various
configuration classes by directly referencing the class that
the attributes are in.
It also improves type hints since mypy can now resolve the
types of the configuration variables.