Commit Graph

805 Commits

Author SHA1 Message Date
Patrick Cloke 9d8a3234ba
Respond with proper error responses on unknown paths. (#14621)
Returns a proper 404 with an errcode of M_RECOGNIZED for
unknown endpoints per MSC3743.
2022-12-08 11:37:05 -05:00
Patrick Cloke da77720752
Check the stream position before checking if the cache is empty. (#14639)
An empty cache does not mean the entity has no changed, if
it is earlier than the earliest known stream position return that
the entity *has* changed since the cache cannot accurately
answer that query.
2022-12-08 11:35:49 -05:00
Erik Johnston cee9445884
Better return type for `get_all_entities_changed` (#14604)
Help callers from using the return value incorrectly by ensuring
that callers explicitly check if there was a cache hit or not.
2022-12-05 15:19:14 -05:00
Patrick Cloke 6a8310f3df
Compare to the earliest known stream pos in the stream change cache. (#14435)
The internal methods of the StreamChangeCache were inconsistently
treating the earliest known stream position as valid. It is now treated as
invalid, meaning the cache cannot determine if an entity at the earliest
known stream position has changed or not.
2022-12-05 09:00:59 -05:00
Patrick Cloke 1799a54a54
Batch fetch bundled annotations (#14491)
Avoid an n+1 query problem and fetch the bundled aggregations for
m.annotation relations in a single query instead of a query per event.

This applies similar logic for as was previously done for edits in
8b309adb43 (#11660) and threads
in b65acead42 (#11752).
2022-11-22 07:26:11 -05:00
Patrick Cloke d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
Patrick Cloke 13ca8bb2fc
Remove duplicated code to evict entries. (#14410)
This code was factored out to a method, but also left in-place.

Calling this twice in a row makes no sense: the first call will reduce
the size appropriately, but the loop will immediately exit since the
cache size was already reduced.
2022-11-10 15:33:34 -05:00
Eric Eastwood 40fa8294e3
Refactor MSC3030 `/timestamp_to_event` to move away from our snowflake pull from `destination` pattern (#14096)
1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper.
 2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu` , backfill, and the generic `_try_destination_list` which is used for many places we use this pattern.
 3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
2022-10-26 16:10:55 -05:00
Quentin Gliech 8756d5c87e
Save login tokens in database (#13844)
* Save login tokens in database

Signed-off-by: Quentin Gliech <quenting@element.io>

* Add upgrade notes

* Track login token reuse in a Prometheus metric

Signed-off-by: Quentin Gliech <quenting@element.io>
2022-10-26 11:45:41 +01:00
Nick Mills-Barrett c9dffd5b33
Remove unused `@lru_cache` decorator (#13595)
* Remove unused `@lru_cache` decorator

Spotted this working on something else.

Co-authored-by: David Robertson <davidr@element.io>
2022-10-25 11:39:25 +01:00
dependabot[bot] 0b7830e457
Bump flake8-bugbear from 21.3.2 to 22.9.23 (#14042)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
2022-10-19 19:38:24 +00:00
Abdullah Osama a9934d48c1
Making parse_server_name more consistent (#14007)
Fixes #12122
2022-10-11 12:42:11 +00:00
David Robertson cb20b885cb
Always close _all_ `ijson` coroutines, even if doing so raises Exceptions (#14065) 2022-10-06 18:17:50 +00:00
David Robertson 6f0c3e669d
Don't require `setuptools_rust` at runtime (#13952) 2022-09-29 20:16:08 +00:00
Eric Eastwood 29269d9d3f
Fix `have_seen_event` cache not being invalidated (#13863)
Fix https://github.com/matrix-org/synapse/issues/13856
Fix https://github.com/matrix-org/synapse/issues/13865

> Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](https://github.com/matrix-org/complement/pull/214#discussion_r741678240). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to https://github.com/matrix-org/synapse/issues/13625) 
> 
> But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](https://github.com/matrix-org/synapse/milestone/11).
> 
> *-- https://github.com/matrix-org/synapse/issues/13856*


### The problem

`_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).

Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.

Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
2022-09-27 15:55:43 -05:00
Mathieu Velten 6bd8763804
Add cache invalidation across workers to module API (#13667)
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
2022-09-21 15:32:01 +02:00
reivilibre cf65433de2
Fix a memory leak when running the unit tests. (#13798) 2022-09-14 15:29:05 +00:00
Erik Johnston ebfeac7c5d
Check if Rust lib needs rebuilding. (#13759)
This protects against the common mistake of failing to remember to rebuild Rust code after making changes.
2022-09-12 10:03:42 +00:00
reivilibre cf11919ddd
Fix cache metrics not being updated when not using the legacy exposition module. (#13717) 2022-09-08 15:30:48 +01:00
reivilibre b455c2a5ec
Update Grafana dashboard to not use legacy metric names. (#13714) 2022-09-06 12:21:21 +01:00
reivilibre 7bc110a19e
Generalise the `@cancellable` annotation so it can be used on functions other than just servlet methods. (#13662) 2022-08-31 11:16:05 +00:00
David Robertson 4249082eed
Merge branch 'release-v1.66' into develop 2022-08-30 15:31:51 +01:00
Eric Eastwood 1eea73b413
Fix rate limit metrics registering twice and misreporting (#13649)
* Fix rate limit metrics registering twice and misreporting

Fix https://github.com/matrix-org/synapse/issues/13641

* Fix lints

* Add changelog

* Document `metrics_name=None`.
2022-08-30 12:08:29 +01:00
reivilibre be4250c7a8
Add experimental configuration option to allow disabling legacy Prometheus metric names. (#13540)
Co-authored-by: David Robertson <davidr@element.io>
2022-08-24 11:35:54 +00:00
Erik Johnston f7ddfe17a3
Speed up `@cachedList` (#13591)
This speeds things up by ~2x.

The vast majority of the time is now spent in `LruCache` moving things around the linked lists.

We do this via two things:
1. Don't create a deferred per-key during bulk set operations in `DeferredCache`. Instead, only create them if a subsequent caller asks for the key.
2. Add a bulk lookup API to `DeferredCache` rather than use a loop.
2022-08-23 14:53:27 +00:00
Nick Mills-Barrett 5e7847dc92
Cache user IDs instead of profile objects (#13573)
The profile objects are never used and increase cache size significantly.
2022-08-23 09:49:59 +00:00
Sean Quah b251cff819
Fix incorrect juggling of logging contexts in `_PerHostRatelimiter` (#13554)
Signed-off-by: Sean Quah <seanq@matrix.org>

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
2022-08-18 16:26:26 +01:00
Eric Eastwood d64653d062
Track number of hosts affected by the rate limiter (#13541)
Track number of hosts affected by the rate limiter so we can differentiate one really noisy homeserver from a general ratelimit tuning problem across the federation.

Follow-up to https://github.com/matrix-org/synapse/pull/13534

Part of https://github.com/matrix-org/synapse/issues/13356
2022-08-18 10:05:07 -05:00
Eric Eastwood 49d04e43df
Add metrics to track how the rate limiter is affecting requests (sleep/reject) (#13534)
Related to https://github.com/matrix-org/synapse/pull/13499

Part of https://github.com/matrix-org/synapse/issues/13356
2022-08-17 16:10:07 -05:00
Eric Eastwood c6ee9c0ee4
Add metrics to track rate limiter queue timing (#13544) 2022-08-17 10:38:05 +01:00
Eric Eastwood 344a2f767c
Instrument `FederationStateIdsServlet` - `/state_ids` (#13499)
Instrument FederationStateIdsServlet - `/state_ids` so it's easier to follow what's going on in Jaeger when viewing a trace.
2022-08-15 19:41:23 +01:00
Nick Mills-Barrett 41320a0554
Optimise async get event lookups (#13435)
Still maintains local in memory lookup optimisation, but does any external
lookup as part of the deferred that prevents duplicate lookups for the same
event at once. This makes the assumption that fetching from an external
cache is a non-zero load operation.
2022-08-04 15:49:55 +01:00
Dirk Klimpel d6e94ad9d9
Rename `RateLimitConfig` to `RatelimitSettings` (#13442) 2022-08-03 10:40:20 +01:00
Erik Johnston 0b87eb8e0c
Make DictionaryCache have better expiry properties (#13292) 2022-07-21 17:13:44 +01:00
Nick Mills-Barrett cc21a431f3
Async get event cache prep (#13242)
Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.

Signed off by Nick @ Beeper (@Fizzadar)
2022-07-15 09:30:46 +00:00
David Robertson 6ba732fefe
Type `tests.utils` (#13028)
* Cast to postgres types when handling postgres db

* Remove unused method

* Easy annotations

* Annotate create_room

* Use `ParamSpec` to annotate looping_call

* Annotate `default_config`

* Track `now` as a float

`time_ms` returns an int like the proper Synapse `Clock`

* Introduce a `Timer` dataclass

* Introduce a Looper type

* Suppress checking of a mock

* tests.utils is typed

* Changelog

* Whoops, import ParamSpec from typing_extensions

* ditch the psycopg2 casts
2022-07-05 15:13:47 +01:00
Quentin Gliech fe1daad672
Move the "email unsubscribe" resource, refactor the macaroon generator & simplify the access token verification logic. (#12986)
This simplifies the access token verification logic by removing the `rights`
parameter which was only ever used for the unsubscribe link in email
notifications. The latter has been moved under the `/_synapse` namespace,
since it is not a standard API.

This also makes the email verification link more secure, by embedding the
app_id and pushkey in the macaroon and verifying it. This prevents the user
from tampering the query parameters of that unsubscribe link.

Macaroon generation is refactored:

- Centralised all macaroon generation and verification logic to the
  `MacaroonGenerator`
- Moved to `synapse.utils`
- Changed the constructor to require only a `Clock`, hostname, and a secret key
  (instead of a full `Homeserver`).
- Added tests for all methods.
2022-06-14 09:12:08 -04:00
David Robertson f30bcbd84a
Fix Synapse git info missing in version strings (#12973) 2022-06-07 15:24:11 +01:00
Patrick Cloke 759f9c09e1
Fix caching behavior for relations push rules. (#12859)
By always returning all requested values from the function
wrapped by cachedList. Otherwise implicit None values get
added into the cache, which are unexpected.
2022-05-25 07:49:54 -04:00
Sean Quah 2be5a2b07b
Fix `RetryDestinationLimiter` re-starting finished log contexts (#12803)
Signed-off-by: Sean Quah <seanq@matrix.org>
2022-05-19 20:17:10 +01:00
David Robertson d4713d3e33
Discard null-containing strings before updating the user directory (#12762) 2022-05-18 11:28:14 +01:00
Shay cde8af9a49
Add config flags to allow for cache auto-tuning (#12701) 2022-05-13 12:32:39 -07:00
Erik Johnston 8dd3e0e084
Immediately retry any requests that have backed off when a server comes back online. (#12500)
Otherwise it can take up to a minute for any in-flight `/send` requests to be retried.
2022-05-10 10:39:54 +01:00
David Robertson fa0eab9c8e
Use `ParamSpec` in a few places (#12667) 2022-05-09 10:27:39 +00:00
Erik Johnston 4337d33a73
Prevent memory leak from reoccurring when presence is disabled. (#12656) 2022-05-06 16:41:57 +00:00
David Robertson 6463244375
Remove unused `# type: ignore`s (#12531)
Over time we've begun to use newer versions of mypy, typeshed, stub
packages---and of course we've improved our own annotations. This makes
some type ignore comments no longer necessary. I have removed them.

There was one exception: a module that imports `select.epoll`. The
ignore is redundant on Linux, but I've kept it ignored for those of us
who work on the source tree using not-Linux. (#11771)

I'm more interested in the config line which enforces this. I want
unused ignores to be reported, because I think it's useful feedback when
annotating to know when you've fixed a problem you had to previously
ignore.

* Installing extras before typechecking

Lacking an easy way to install all extras generically, let's bite the bullet and
make install the hand-maintained `all` extra before typechecking.

Now that https://github.com/matrix-org/backend-meta/pull/6 is merged to
the release/v1 branch.
2022-04-27 14:03:44 +01:00
Patrick Cloke 8a23bde823
Consistently use collections.abc.Mapping to check frozendict. (#12564) 2022-04-27 09:00:07 -04:00
Sean Quah a50fb411b3
Update `delay_cancellation` to accept any awaitable (#12468)
This will mainly be useful when dealing with module callbacks, which are
all typed as returning `Awaitable`s instead of coroutines or
`Deferred`s.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-22 18:20:06 +01:00
Sean Quah 79e7c2c426
Fix edge case where a `Linearizer` could get stuck (#12358)
Just after a task acquires a contended `Linearizer` lock, it sleeps.
If the task is cancelled during this sleep, we need to release the lock.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 17:19:16 +01:00
Sean Quah 800ba87cc8
Refactor and convert `Linearizer` to async (#12357)
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.

Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 15:43:52 +01:00