We were repeatedly looking up a config option in a loop (using the
unclassed config style), which is expensive enough that it can cause
high CPU usage.
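A minimal sketch of the fix pattern, using hypothetical option and handler names: resolve the option once before the loop instead of re-reading the config on every iteration.

```python
def process_events(events, config):
    # Hypothetical option: read it once, outside the loop, rather than
    # re-resolving the (unclassed, dict-style) config for every event.
    push_enabled = config.get("push", {}).get("enabled", True)

    for event in events:
        if push_enabled:
            handle_push(event)  # hypothetical per-event handler


def handle_push(event):
    ...
```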
An accidental mis-ordering of operations in #6739 allowed an incoming knock event over federation to be persisted before it was checked against any configured Third Party Access Rules modules.
This PR corrects that by performing the TPAR check *before* persisting the event.
This PR will run a new "Deploy release-specific documentation" job whenever a push to a branch matching `release-v*` occurs. The job creates (or adds to) a folder named `vX.Y` on the `gh-pages` branch, which allows us to build up `major.minor` releases of the docs as we release Synapse.
This is especially useful as a mechanism for keeping around documentation of old/removed features (for those running older versions of Synapse), without needing to clutter the latest copy of the docs.
After a [discussion](https://matrix.to/#/!XaqDhxuTIlvldquJaV:matrix.org/$rKmkBmQle8OwTlGcoyu0BkcWXdnHW3_oap8BMgclwIY?via=matrix.org&via=vector.modular.im&via=envs.net) in #synapse-dev, we wanted to use tags to trigger the documentation deployments, which I agreed with. However, I soon realised that the bash-foo required to turn a tag of `v1.2.3rc1` into `1.2` was a lot more complex than the branch's `release-v1.2`. So, I've gone with the latter for simplicity.
In the future we'll have some UI on the website to switch between versions, but for now you can simply change 'develop' to 'v1.2' in the URL.
This could cause a minor data leak if someone defined a non-restricted join rule
with an allow key or used a restricted join rule in an older room version, but this is
unlikely.
Additionally this starts adding unit tests to the spaces summary handler.
This PR adds a common configuration section for all modules (see docs). These modules are then loaded at startup by the homeserver. Modules register their hooks and web resources using the new `register_[...]_callbacks` and `register_web_resource` methods of the module API.
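A minimal sketch of what a module using this API might look like, assuming the `register_spam_checker_callbacks` family and `register_web_resource` described above (exact signatures may differ; see the module docs):

```python
class ExampleModule:
    def __init__(self, config: dict, api):
        self._api = api

        # Register hooks at startup; here a single spam-checker callback.
        api.register_spam_checker_callbacks(
            user_may_invite=self.user_may_invite,
        )

        # Expose an extra HTTP resource under a module-chosen path.
        # `MyResource` would be a Twisted resource provided by the module:
        # api.register_web_resource("/_example/hello", MyResource(config))

    async def user_may_invite(self, inviter: str, invitee: str, room_id: str) -> bool:
        # Allow all invites in this sketch.
        return True
```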
Fixes https://github.com/matrix-org/synapse/issues/10030.
We were providing a value in milliseconds where the API expects seconds.
The impact of this bug isn't too bad. The code is intended to count the number of remote servers that the homeserver can see and report that as a metric. This metric is supposed to run first 1 second after server startup, and then every 60s. Instead, it first ran 1,000 seconds after server startup (and then every 60s).
This fix allows the correct metrics to be collected immediately, as well as preventing a stray collection 1,000 seconds after startup.
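For illustration (using plain Twisted APIs rather than Synapse's wrappers, and a hypothetical metric function), the scheduling call takes seconds, so passing 1000 instead of 1 pushes the first run out by roughly 16 minutes:

```python
from twisted.internet import reactor

def count_known_servers():
    ...  # hypothetical: compute and report the metric

# reactor.callLater takes *seconds*:
reactor.callLater(1, count_known_servers)       # intended: first run after 1 second
# reactor.callLater(1000, count_known_servers)  # the bug: ~16 minutes after startup
```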
"Dangerous actions" means deactivating an account, modifying an account
password, or adding a 3PID.
Other actions (deleting devices, uploading keys) can re-use the same UI
auth session if `ui_auth.session_timeout` is configured.
This doc is short but a useful guide to what the request log lines mean.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
Co-authored-by: Daniele Sluijters <daenney@users.noreply.github.com>
* Trace event persistence
When we persist a batch of events, set the parent opentracing span to that
of the request, so that we can trace all the way in.
* changelog
* When we force tracing, set a baggage item
... so that we can check again later.
* Link in both directions between persist_events spans
* Room version 7 for knocking.
* Stable prefixes and endpoints (both client and federation) for knocking.
* Removes the experimental configuration flag.
Add 'federation_ip_range_whitelist'. This allows backwards compatibility if 'federation_ip_range_blacklist' is set; otherwise 'ip_range_whitelist' will be used for federation servers.
Signed-off-by: Michael Kutzner <1mikure@gmail.com>
This is the first of two PRs which seek to address #8518. This first PR lays the groundwork by extending ResponseCache; a second PR (#10158) will update the SyncHandler to actually use it, and fix the bug.
The idea here is that we allow the callback given to ResponseCache.wrap to decide whether its result should be cached or not. We do that by (optionally) passing a ResponseCacheContext into it, which it can modify.
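A self-contained sketch of the idea (this is not the real `ResponseCache`, and the actual Synapse signatures may differ): the wrapped callback receives a context object and can opt out of having its result cached.

```python
import asyncio

class ResponseCacheContext:
    def __init__(self):
        self.should_cache = True

class MiniResponseCache:
    def __init__(self):
        self._cache = {}

    async def wrap(self, key, callback, *args):
        if key in self._cache:
            return self._cache[key]
        context = ResponseCacheContext()
        result = await callback(*args, cache_context=context)
        if context.should_cache:
            self._cache[key] = result
        return result

async def main():
    cache = MiniResponseCache()

    async def handler(user_id, cache_context):
        response = {"user": user_id, "events": []}
        if not response["events"]:
            # Nothing useful to return: don't cache, so the next request recomputes.
            cache_context.should_cache = False
        return response

    print(await cache.wrap(("sync", "@alice:example.com"), handler, "@alice:example.com"))

asyncio.run(main())
```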
Merge tag 'v1.36.0rc2' into develop
Synapse 1.36.0rc2 (2021-06-11)
==============================
Bugfixes
--------
- Fix a bug which caused presence updates to stop working some time after a restart, when using a presence writer worker. Broke in v1.33.0. ([\#10149](https://github.com/matrix-org/synapse/issues/10149))
- Fix a bug when using federation sender worker where it would send out more presence updates than necessary, leading to high resource usage. Broke in v1.33.0. ([\#10163](https://github.com/matrix-org/synapse/issues/10163))
- Fix a bug where Synapse could send the same presence update to a remote twice. ([\#10165](https://github.com/matrix-org/synapse/issues/10165))
This is essentially an implementation of the proposal made at https://hackmd.io/@richvdh/BJYXQMQHO, though the details have ended up looking slightly different.
This implements similar behavior to sytest, where a matching branch is used
if one exists. This is useful when needing to modify both application code
and tests at the same time. The following rules are used to find a matching
complement branch (a Python sketch of this fallback follows the list):
1. Search for the branch name of the pull request. (E.g. feature/foo.)
2. Search for the base branch of the pull request. (E.g. develop or release-vX.Y.)
3. Search for the reference branch of the commit. (E.g. master or release-vX.Y.)
4. Fallback to 'master', the default complement branch name.
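A small Python sketch of the fallback order above (the real implementation lives in the CI scripts; names here are illustrative):

```python
def pick_complement_ref(existing_branches, pr_branch=None, base_branch=None, commit_ref=None):
    """Return the first candidate branch that exists in the complement repo."""
    for candidate in (pr_branch, base_branch, commit_ref):
        if candidate and candidate in existing_branches:
            return candidate
    return "master"  # the default complement branch

# A PR from feature/foo against develop, with no matching complement branches:
print(pick_complement_ref({"main", "master"}, pr_branch="feature/foo", base_branch="develop"))
# -> "master"
```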
Spawned from missing messages we were seeing on `matrix.org` from a
federated Gitter-bridged room, https://gitlab.com/gitterHQ/webapp/-/issues/2770.
The underlying issue in Synapse is tracked by https://github.com/matrix-org/synapse/issues/10066
where the message and join event race and the message is `soft_failed` before the
`join` event reaches the remote federated server.
Fewer soft_failed events = better, and usually this should only trigger for events
where people are doing bad things and trying to fuzz and fake everything.
This PR updates the build tags that we perform Complement runs with to match our [buildkite pipeline](618b3e90bc/synapse/pipeline.yml (L570)), as well as adding `msc2403` (as it will be required once #9359 is merged). Build tags are what we use to determine which tests to run in Complement (really they determine which test files are compiled into the final binary).
I haven't put in a comment about updating the buildkite side here, as we've decided to migrate fully to GitHub Actions anyhow.
With the prior format, 1.33.0 / 1.33.1 / 1.33.2 got separate branches:
release-v1.33.0
release-v1.33.1
release-v1.33.2
Under the new model, all three would share a common branch:
release-v1.33
As before, RCs and actual releases exist as tags on these branches.
This better reflects our support model, e.g., that the "1.33" series had
a formal release followed by two patches / updates.
Signed-off-by: Dan Callahan <danc@element.io>
Fixes #1834.
`get_new_events_for_appservice` internally calls `get_events_as_list`, which will filter out any rejected events. If all returned events are filtered out, `_notify_interested_services` will return without updating the last handled stream position. If there are 100 consecutive such events, processing will halt altogether.
Breaking the loop is now done by checking whether we're up-to-date with `current_max` in the loop condition, instead of relying on an empty `events` list.
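An illustrative sketch of the new loop shape (names are approximate, not the exact Synapse code; the store is assumed to always advance the returned token):

```python
async def notify_interested_services(store, handle_event, last_token: int, current_max: int) -> int:
    upper_bound = last_token
    while upper_bound < current_max:
        upper_bound, events = await store.get_new_events_for_appservice(upper_bound, limit=100)
        # `events` may be empty if every event in this page was rejected and
        # filtered out, but we still advance `upper_bound` and keep going.
        for event in events:
            await handle_event(event)
    return upper_bound
```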
Signed-off-by: Willem Mulder <14mRh4X0r@gmail.com>
If backfilling is slow then the client may time out and retry, causing
Synapse to start a new `/backfill` before the existing backfill has
finished, duplicating work.
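One way to avoid the duplicated work (a sketch of the general pattern, not necessarily the exact change made here) is to serialise backfill per room, so a retried request waits for the in-flight backfill rather than starting a second one:

```python
import asyncio
from collections import defaultdict

_backfill_locks = defaultdict(asyncio.Lock)

async def handle_backfill_request(room_id: str, do_backfill):
    # A retry for the same room queues up behind the in-flight backfill
    # instead of kicking off a duplicate one.
    async with _backfill_locks[room_id]:
        return await do_backfill(room_id)
```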
This adds quite a lot of OpenTracing decoration for database activity. Specifically it adds tracing at four different levels:
* emit a span for each "interaction" - ie, the top level database function that we tend to call "transaction", but isn't really, because it can end up as multiple transactions.
* emit a span while we hold a database connection open
* emit a span for each database transaction - an actual transaction this time.
* emit a span for each database query.
I'm aware this might be quite a lot of overhead, but even just running it on a local Synapse it looks really interesting, and I hope the overhead can be offset just by turning down the sampling frequency and finding other ways of tracing requests of interest (eg, the `force_tracing_for_users` setting).
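Roughly, the nesting looks like this (a simplified sketch using the plain `opentracing` API rather than Synapse's wrappers; span names are illustrative, and in reality a connection span can cover multiple transactions):

```python
import opentracing

tracer = opentracing.tracer  # no-op tracer unless a real one is installed

def run_interaction(conn, desc, func, *args):
    with tracer.start_active_span(f"db.{desc}"):            # the "interaction"
        with tracer.start_active_span("db.connection"):      # while we hold the connection
            with tracer.start_active_span("db.txn"):         # an actual database transaction
                cursor = conn.cursor()
                with tracer.start_active_span("db.query"):   # each individual query
                    return func(cursor, *args)
```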
The existing tracing reports an error each time there is a timeout, which isn't
really representative.
Additionally, we log things about the way `wait_for_events` works
(eg, the result of the callback) to the *parent* span, which is confusing.
So that they render nicely in mdbook (see #10086), and so that we no longer have a mix of structured text languages in our documentation (excluding files outside of `docs/`).
Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call have_seen_events for the same set of (50K or so) auth_events, each of which would take many minutes to complete, even though it's only an index scan.
* Make `invalidate` and `invalidate_many` do the same thing
... so that we can do either over the invalidation replication stream, and also
because they always confused me a bit.
* Kill off `invalidate_many`
* changelog
`keylen` seems to be a thing that is frequently incorrectly set, and we don't really need it.
The only time it was used was to figure out if we had removed a subtree in `del_multi`, which we can do better by changing `TreeCache.pop` to return a different type (`TreeCacheNode`).
Commits should be independently reviewable.
* Fix /upload 500'ing when presented with a very large image
Catch DecompressionBombError and re-raise as a ThumbnailError
* Set PIL's MAX_IMAGE_PIXELS to match homeserver.yaml
so that it bombs out quicker and loads less into memory
in the case of super large images
* Add changelog entry for 10029
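A sketch of the two changes, with a hypothetical stand-in for the media repo's thumbnailing error type and an illustrative pixel limit (the real change ties the limit to homeserver.yaml):

```python
from PIL import Image

class ThumbnailError(Exception):
    """Stand-in for Synapse's thumbnailing error type."""

# Cap how many pixels PIL will decompress, so oversized images fail fast
# instead of being loaded into memory.
Image.MAX_IMAGE_PIXELS = 32_000_000

def open_image(path: str) -> Image.Image:
    try:
        image = Image.open(path)
        image.load()
        return image
    except Image.DecompressionBombError as e:
        # Surface a thumbnailing error rather than an unhandled 500.
        raise ThumbnailError(f"image too large: {e}") from e
```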
https://github.com/matrix-org/synapse/issues/9962 uncovered that we accidentally removed all but one of the presence updates that we store in the database when persisting multiple updates. This could cause users' presence state to be stale.
The bug was fixed in #10014, and this PR just adds a test that failed on the old code, and was used to initially verify the bug.
The test attempts to insert some presence into the database in a batch using `PresenceStore.update_presence`, and then simply pulls it out again.
Fixes: https://github.com/matrix-org/synapse/issues/9962
This is a fix for the above problem.
I fixed it by swapping the order of inserting new records and deleting old ones: deletes now happen before inserts, so we no longer delete freshly-inserted database records.
Signed-off-by: Marek Matys <themarcq@gmail.com>
Also add support for giving a callback to generate the JSON object to
verify. This should reduce memory usage, as we no longer have the event
in memory in dict form (which has a large memory footprint) for extended
periods of time.
Instead of parsing the full response to `/send_join` into Python objects (which can be huge for large rooms) and *then* parsing that into events, we instead use ijson to stream parse the response directly into `EventBase` objects.
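For illustration, this is roughly what stream-parsing one of the event arrays looks like with ijson (a simplified sketch, not the actual Synapse parsing code):

```python
import io
import ijson

def iter_state_events(response_body):
    # Walk each element of the top-level "state" array as it arrives,
    # without ever materialising the whole response as a Python dict.
    for event_json in ijson.items(response_body, "state.item"):
        yield event_json  # Synapse would build an EventBase from this

body = io.BytesIO(b'{"state": [{"type": "m.room.create"}, {"type": "m.room.member"}]}')
for event in iter_state_events(body):
    print(event["type"])
```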
To be more consistent with similar code. The check now automatically
raises an AuthError instead of passing back a boolean. It also absorbs
some shared logic between callers.
- use a tuple rather than a list for the iterable that is passed into the
wrapped function, for performance
- test that we can pass an iterable and that keys are correctly deduped.
It's not obvious that instances of SQLBaseStore each need their own
instances of random.SystemRandom(); let's just use random directly.
Introduced by 52839886d6
Signed-off-by: Dan Callahan <danc@element.io>
We can get away with just catching UnicodeError here.
⋮
+-- ValueError
|    +-- UnicodeError
|         +-- UnicodeDecodeError
|         +-- UnicodeEncodeError
|         +-- UnicodeTranslateError
⋮
https://docs.python.org/3/library/exceptions.html#exception-hierarchy
Signed-off-by: Dan Callahan <danc@element.io>
Functionally identical, but more obviously cryptographically secure.
...Explicit is better than implicit?
Avoids needing to know that SystemRandom() implies a CSPRNG, and
complies with the big scary red box on the documentation for random:
> Warning:
> The pseudo-random generators of this module should not be used for
> security purposes. For security or cryptographic uses, see the
> secrets module.
https://docs.python.org/3/library/random.html
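For illustration, the change amounts to something like this (the call site here is hypothetical):

```python
import secrets
import string

# Before: correct, but only if you know SystemRandom() is a CSPRNG.
# import random
# token = "".join(random.SystemRandom().choice(string.ascii_letters) for _ in range(32))

# After: explicitly from the secrets module.
token = "".join(secrets.choice(string.ascii_letters) for _ in range(32))
print(token)
```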
Signed-off-by: Dan Callahan <danc@element.io>
This should help ensure that equivalent results are achieved between
homeservers querying for the summary of a space.
This implements modified MSC1772 rules, according to MSC2946.
The difference is that the origin_server_ts of the m.room.create event
is not used as a tie-breaker since this might not be known if the
homeserver is not part of the room.
Per changes in MSC2946, the C-S and S-S APIs for spaces summary
should use GET requests.
Until this is stable, the POST endpoints still exist.
This does not switch federation requests to use the GET version yet
since it is newly added and already deployed servers might not support
it. When switching to the stable endpoint we should switch to GET
requests.
MSC1772 specifies the m.room.create event should be sent as part
of the invite_state. This was done optionally behind an experimental
flag, but is now done by default due to MSC1772 being approved.
Now that cross signing exists there is much less of a need for other people to look at devices and verify them individually. This PR adds a config option to allow you to prevent device display names from being shared with other servers.
Signed-off-by: Aaron Raimist <aaron@raim.ist>
We were pulling the full auth chain for the room out of the DB each time
we backfilled, which can be *huge* for large rooms and is totally
unnecessary.
The hope here is that by moving all the schema files into synapse/storage/schema, it gets a bit easier for newcomers to navigate.
It certainly got easier for me to write a helpful README. There's more to do on that front, but I'll follow up with other PRs for that.
This is an update based on changes to MSC2946. The origin_server_ts
of the m.room.create event is copied into the creation_ts field for each
room returned from the spaces summary.
Synapse can be quite memory intensive, and unless care is taken to tune
the GC thresholds it can end up thrashing, causing noticeable performance
problems for large servers. We fix this by limiting how often we GC a
given generation, regardless of current counts/thresholds.
This does not help with the reverse problem where the thresholds are set
too high, but that should only happen in situations where they've been
manually configured.
Adds a `gc_min_seconds_between` config option to override the defaults.
Fixes #9890.
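A minimal sketch of the idea (not the actual Synapse implementation; the per-generation minimums here are illustrative defaults of the kind `gc_min_seconds_between` would override):

```python
import gc
import time

MIN_SECONDS_BETWEEN = (1.0, 10.0, 30.0)  # illustrative per-generation minimums
_last_collection = [0.0, 0.0, 0.0]

def maybe_collect(generation: int) -> None:
    """Collect `generation` only if enough time has passed since we last did."""
    now = time.monotonic()
    if now - _last_collection[generation] < MIN_SECONDS_BETWEEN[generation]:
        return
    _last_collection[generation] = now
    gc.collect(generation)
```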
* Add a 5 second healthcheck startup delay and reduce the check interval to 15s,
to reduce waiting time for docker-aware edge routers bringing an
instance online
This leaves out all optional keys from /sync. This should be fine for all clients already tested against Conduit, but it may break some clients; as such, we should check that at least most of them don't break horribly, and maybe back out some of the individual changes. (We can probably always leave out groups, for example, while the others may cause more issues.)
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
Support deleting a room through a DELETE request and mark the
previous endpoint as deprecated in the documentation.
Signed-off-by: Thibault Ferrante <thibault.ferrante@pm.me>
This fixes a regression where the logging context for runWithConnection
was reported as runWithConnection instead of the connection name,
e.g. "POST-XYZ".
I went through and removed a bunch of cruft that was lying around for compatibility with old Python versions. This PR will also prevent Synapse from starting unless you're running Python 3.6+.
This ensures that something like an auth error (403) will be
returned to the requester instead of trying more
servers, which would likely result in the same error, and then
passing back a generic 400 error.
First of all, a fixup to `FakeChannel` which is needed to make it work with the default HTTP channel implementation.
Secondly, it looks like we no longer need `_PushHTTPChannel`, because as of #8013, the producer that gets attached to the `HTTPChannel` is now an `IPushProducer`. This is good, because it means we can remove a whole load of test-specific boilerplate which causes variation between tests and production.
Applied a (slightly modified) patch from https://github.com/matrix-org/synapse/issues/9574.
As far as I understand this would allow the cookie set during the OIDC flow to work on deployments using public baseurls that do not sit at the URL path root.
When receiving a /send_join request for a room with join rules set to 'restricted',
check if the user is a member of the spaces defined in the 'allow' key of the join rules.
This only applies to an experimental room version, as defined in MSC3083.
This attempts to be a direct port of https://github.com/matrix-org/synapse-dinsic/pull/74 to mainline. There was some fiddling required to deal with the changes that have been made to mainline since (mainly dealing with the split of `RegistrationWorkerStore` from `RegistrationStore`, and the changes made to `self.make_request` in test code).
This basically speeds up federation by "squeezing" each individual dual database call (to destinations and destination_rooms), which previously happened for every event, into one call for an entire batch (100 max).
Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
Part of #9744
Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as Python 3 now automatically reads source code as UTF-8.
Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
This PR adds a Dockerfile and some supporting files to the `docker/` directory. The Dockerfile's intention is to spin up a container with:
* A Synapse main process.
* Any desired worker processes, defined by a `SYNAPSE_WORKERS` environment variable supplied at runtime.
* A redis for worker communication.
* A nginx for routing traffic.
* A supervisord to start all worker processes and monitor them if any go down.
Note that **this is not currently intended to be used in production**. If you'd like to use Synapse workers with Docker, instead make use of the official image, with one worker per container. The purpose of this dockerfile is currently to allow testing Synapse in worker mode with the [Complement](https://github.com/matrix-org/complement/) test suite.
`configure_workers_and_start.py` is where most of the magic happens in this PR. It reads from environment variables (documented in the file) and creates all necessary config files for the processes. It is the entrypoint of the Dockerfile, and thus is run any time the docker container is spun up, recreating all config files in case you want to use a different set of workers. One can specify which workers they'd like to use by setting the `SYNAPSE_WORKERS` environment variable (as a comma-separated list of arbitrary worker names) or by setting it to `*` for all worker processes. We will be using the latter in CI.
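For illustration, the environment variable handling boils down to something like the following (a simplified sketch; the worker type names are illustrative, and the real script goes on to generate full worker configs):

```python
import os

ALL_WORKER_TYPES = ["generic_worker", "federation_sender", "pusher"]  # illustrative

raw = os.environ.get("SYNAPSE_WORKERS", "")
if raw.strip() == "*":
    workers = list(ALL_WORKER_TYPES)  # "give me every worker type"
else:
    workers = [w.strip() for w in raw.split(",") if w.strip()]

print(workers)
```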
Huge thanks to @MatMaul for helping get this all working 🎉 This PR is paired with its equivalent on the Complement side: https://github.com/matrix-org/complement/pull/62.
Note, for the purpose of testing this PR before it's merged: You'll need to (re)build the base Synapse docker image for everything to work (`matrixdotorg/synapse:latest`). Then build the worker-based docker image on top (`matrixdotorg/synapse:workers`).
Context is in https://github.com/matrix-org/synapse/issues/9764#issuecomment-818615894.
I struggled to find a more official link for this. The problem occurs when using WSL1 instead of WSL2, which some Windows platforms (at least Server 2019) still don't have. Docker have updated their documentation to paint a much happier picture now given WSL2's support.
The last sentence here can probably be removed once WSL1 is no longer around... though that will likely not be for a very long time.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
This change ensures that the appservice registration behaviour follows the spec. We decided to do this for Dendrite, so it made sense to also make a PR for synapse to correct the behaviour.
Related: #8334
Deprecated in: #9429 - Synapse 1.28.0 (2021-02-25)
`GET /_synapse/admin/v1/users/<user_id>` has no
- unit tests
- documentation
A v2 API is available (#5925 - 12/2019, v1.7.0).
The API is misleading: it expects a `user_id` but returns a list of all users.
Signed-off-by: Dirk Klimpel <dirk@klimpel.org>
We pull all destinations requiring catchup from the DB in batches.
However, if all those destinations get filtered out (due to the
federation sender being sharded), then the `last_processed` destination
doesn't get updated, and we keep requesting the same set repeatedly.
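An illustrative sketch of the fix (names are approximate, not the exact Synapse code): advance `last_processed` from the unfiltered batch, so a batch whose destinations all belong to other sharded senders still makes progress.

```python
async def catch_up_loop(store, send_catch_up, is_mine, batch_size=25):
    last_processed = None
    while True:
        destinations = await store.get_catch_up_destinations(
            after=last_processed, limit=batch_size
        )
        if not destinations:
            break

        # Advance the cursor *before* filtering, so we never re-fetch a batch
        # that this sender instance is not responsible for.
        last_processed = destinations[-1]

        for destination in destinations:
            if not is_mine(destination):
                continue
            await send_catch_up(destination)
```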