Erik Johnston
51f7eaf908
Add ability to run replication protocol over redis. ( #7040 )
...
This is configured via the `redis` config options.
2020-04-22 13:07:41 +01:00
Richard van der Hoff
0f8f02bc39
On catchup, process each row with its own stream id ( #7286 )
...
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.
This code was introduced in #7024, and I hope this fixes #7206.
2020-04-20 11:43:29 +01:00
Richard van der Hoff
67ff7b8ba0
Improve type checking in `replication.tcp.Stream` ( #7291 )
...
The general idea here is to get rid of the `type: ignore` annotations on all of the `current_token` and `update_function` assignments, which would have caught #7290.
After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP.
I've also introduced a couple of new types, to take out some duplication.
2020-04-17 14:49:55 +01:00
Richard van der Hoff
d7d42387f5
Fix 'generator object is not subscriptable' error ( #7290 )
...
Some of the query functions return generators rather than lists, so we can't
index into the result. Happily we already have a copy of the results.
(I think this was introduced in #7024.)
2020-04-16 14:37:06 +01:00
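The error named in the fix above is easy to reproduce in isolation. The following is a minimal sketch, not the Synapse code: `query_rows` is a hypothetical stand-in for a query function that yields rows lazily, and the remedy shown (indexing into a materialised copy) mirrors the "we already have a copy of the results" remark only in spirit.

```python
def query_rows():
    """Hypothetical query function that yields rows lazily instead of returning a list."""
    yield (1, "a")
    yield (2, "b")


# Indexing a generator directly raises TypeError.
rows = query_rows()
try:
    rows[-1]
    subscript_error = None
except TypeError as e:
    subscript_error = str(e)

# Materialise the rows once, then index into the list copy.
rows_list = list(query_rows())
last_row = rows_list[-1]
```

Keeping a list copy also avoids consuming the generator twice, since a generator can only be iterated once.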
Richard van der Hoff
e13c6c7a96
Handle one-word replication commands correctly
...
`REPLICATE` is now a valid command, and it's nice if you can issue it from the
console without remembering to call it `REPLICATE ` with a trailing space.
2020-04-07 17:43:46 +01:00
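One way to make a one-word command parse the same with or without a trailing space is to split on the first space with `str.partition`, which always returns three parts. This is a sketch under assumptions: `split_command` is a hypothetical helper, not the function used in the commit above, and the `RDATA` payload is made up for illustration.

```python
from typing import Tuple


def split_command(line: str) -> Tuple[str, str]:
    """Split a replication line into (command name, remaining data).

    str.partition never raises when the separator is missing, so a
    one-word line simply yields an empty data string.
    """
    name, _, rest = line.partition(" ")
    return name, rest


bare = split_command("REPLICATE")
padded = split_command("REPLICATE ")
with_data = split_command("RDATA events 5 []")
```

By contrast, a two-name unpack such as `name, rest = line.split(" ", 1)` raises `ValueError` when the line contains no space.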
Richard van der Hoff
c3e4b4edb2
Fix warnings about not calling superclass constructor
...
Separate `SimpleCommand` from `Command`, so that things which don't want to use
the `data` property don't have to, and thus fix the warnings PyCharm was giving
me about not calling `__init__` in the base class.
2020-04-07 17:40:22 +01:00
Richard van der Hoff
6a519a0ca0
Remove vestigial references to SYNC replication command
...
We've ripped pretty much all of this out: let's remove the remains.
2020-04-07 17:40:07 +01:00
Erik Johnston
ce72355d7f
Fix race in replication ( #7226 )
...
Fixes a race between handling `POSITION` and `RDATA` commands. We do this by simply linearizing handling of them.
2020-04-07 11:01:04 +01:00
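Linearizing handlers means that a second command is not processed until the first has completely finished, even if the first one yields while waiting on I/O. The sketch below illustrates the idea with an `asyncio.Lock`; this is an assumption made for a self-contained example, and `LinearizedHandler` is a hypothetical class, not the mechanism used in the commit above.

```python
import asyncio
from typing import List


class LinearizedHandler:
    """Hypothetical handler that processes commands strictly one at a time."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.log: List[str] = []

    async def handle(self, command: str, work_seconds: float) -> None:
        # Only one command may be inside this block at any moment.
        async with self._lock:
            self.log.append(f"start {command}")
            await asyncio.sleep(work_seconds)  # simulated slow work
            self.log.append(f"end {command}")


async def main() -> List[str]:
    handler = LinearizedHandler()
    # POSITION arrives first and is slow; RDATA arrives second and is fast.
    await asyncio.gather(
        handler.handle("POSITION", 0.02),
        handler.handle("RDATA", 0.0),
    )
    return handler.log


log = asyncio.run(main())
```

Without the lock, the fast `RDATA` handler would start and finish while `POSITION` was still sleeping, which is the kind of interleaving a race depends on.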
Erik Johnston
82498ee901
Move server command handling out of TCP protocol ( #7187 )
...
This completes the merging of server and client command processing.
2020-04-07 10:51:07 +01:00
Erik Johnston
5016b162fc
Move client command handling out of TCP protocol ( #7185 )
...
The aim here is to move the command handling out of the TCP protocol classes and to merge the client and server command handling (so that we can reuse them for the redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`; a future PR will move the server paths too.
2020-04-06 09:58:42 +01:00
Erik Johnston
dfa0782254
Remove connections per replication stream metric. ( #7195 )
...
This broke in a recent PR (#7024) and is no longer useful due to all
replication clients implicitly subscribing to all streams, so let's
just remove it.
2020-04-01 10:40:46 +01:00
Erik Johnston
4f21c33be3
Remove usage of "conn_id" for presence. ( #7128 )
...
* Remove `conn_id` usage for UserSyncCommand.
Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one-to-one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replication commands.
This really only affects UserSyncCommand.
* Add CLEAR_USER_SYNCS command that is sent on shutdown.
This should help with the case where a synchrotron gets restarted
gracefully, rather than relying on the 5 minute timeout.
2020-03-30 16:37:24 +01:00
Erik Johnston
4cff617df1
Move catchup of replication streams to worker. ( #7024 )
...
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
2020-03-25 14:54:01 +00:00
Richard van der Hoff
a564b92d37
Convert `*StreamRow` classes to inner classes ( #7116 )
...
This just helps keep the rows closer to their streams, so that it's easier to
see what the format of each stream is.
2020-03-23 13:59:11 +00:00
Richard van der Hoff
b3cee0ce67
Fix processing of `groups` stream, and use symbolic names for streams ( #7117 )
...
`groups` != `receipts`
Introduced in #6964
2020-03-23 11:39:36 +00:00
Erik Johnston
fdb1344716
Remove concept of a non-limited stream. ( #7011 )
2020-03-20 14:40:47 +00:00
Erik Johnston
9ce4e344a8
Change device list replication to match new semantics.
...
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
2020-02-28 11:25:34 +00:00
Erik Johnston
1f773eec91
Port PresenceHandler to async/await ( #6991 )
2020-02-26 15:33:26 +00:00
Erik Johnston
0bd8cf435e
Increase MAX_EVENTS_BEHIND for replication clients
2020-02-21 09:04:33 +00:00
Erik Johnston
c3d4ad8afd
Fix sending server up commands from workers ( #6811 )
...
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2020-01-30 16:42:11 +00:00
Erik Johnston
d5275fc55f
Propagate cache invalidates from workers to other workers. ( #6748 )
...
Currently, if a worker invalidates a cache, the invalidation is streamed to master, which does not then forward it to the other workers.
2020-01-27 13:47:50 +00:00
Erik Johnston
5d7a6ad223
Allow streaming cache invalidate all to workers. ( #6749 )
2020-01-22 10:37:00 +00:00
Erik Johnston
a8a50f5b57
Wake up transaction queue when remote server comes back online ( #6706 )
...
This will be used to retry outbound transactions to a remote server if
we think it might have come back up.
2020-01-17 10:27:19 +00:00
Erik Johnston
48c3a96886
Port synapse.replication.tcp to async/await ( #6666 )
...
* Port synapse.replication.tcp to async/await
* Newsfile
* Correctly document type of on_<FOO> functions as async
* Don't be overenthusiastic with the asyncing....
2020-01-16 09:16:12 +00:00
Erik Johnston
e8b68a4e4b
Fixup synapse.replication to pass mypy checks ( #6667 )
2020-01-14 14:08:06 +00:00
Richard van der Hoff
6964ea095b
Reduce the reconnect time when replication fails. ( #6617 )
2020-01-03 14:19:09 +00:00
Andrew Morgan
cd96b4586f
lint
2019-11-08 15:45:45 +00:00
Andrew Morgan
c4bdf2d785
Remove content from being sent for account data rdata stream
2019-11-08 15:44:02 +00:00
Richard van der Hoff
cc6243b4c0
document the REPLICATE command a bit better ( #6305 )
...
since I found myself wondering how it works
2019-11-04 12:40:18 +00:00
Hubert Chathi
9c94b48bf1
Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify
2019-10-31 12:32:07 -04:00
Andrew Morgan
54fef094b3
Remove usage of deprecated logger.warn method from codebase ( #6271 )
...
Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
2019-10-31 10:23:24 +00:00
Hubert Chathi
998f7fe7d4
make user signatures a separate stream
2019-10-30 17:22:52 -04:00
Andrew Morgan
4548d1f87e
Remove unnecessary parentheses around return statements ( #5931 )
...
Python will return a tuple whether there are parentheses around the returned values or not.
I'm just sick of my editor complaining about this all over the place :)
2019-08-30 16:28:26 +01:00
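The claim in the commit above can be checked directly: in Python it is the comma, not the parentheses, that builds the tuple. A minimal sketch with two hypothetical functions:

```python
def with_parens():
    # Parentheses around the returned values.
    return (1, "a")


def without_parens():
    # No parentheses: the comma alone creates the tuple.
    return 1, "a"


same_value = with_parens() == without_parens()
result_type = type(without_parens())
```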
Amber Brown
4806651744
Replace returnValue with return ( #5736 )
2019-07-23 23:00:55 +10:00
Amber Brown
463b072b12
Move logging utilities out of the side drawer of util/ and into logging/ ( #5606 )
2019-07-04 00:07:04 +10:00
Amber Brown
32e7c9e7f2
Run Black. ( #5482 )
2019-06-20 19:32:02 +10:00
Erik Johnston
b5c62c6b26
Fix relations in worker mode
2019-05-16 10:38:13 +01:00
Richard van der Hoff
4b91c313a9
Combine the CurrentStateDeltaStream into the EventStream
2019-03-27 22:07:05 +00:00
Richard van der Hoff
1f6d6f918a
Make EventStream rows have a type
...
... as a precursor to combining it with the CurrentStateDelta stream.
2019-03-27 22:07:05 +00:00
Richard van der Hoff
015b3622eb
Skip building a ROW_TYPE when building updates
...
We're about to turn it straight into a JSON object anyway so building a
ROW_TYPE is a bit pointless, and reduces flexibility in the update_function.
2019-03-27 21:58:03 +00:00
Richard van der Hoff
f570916a3e
Add parse_row method to replication stream class
...
This will allow individual stream classes to override how a row is parsed.
2019-03-27 21:32:33 +00:00
Richard van der Hoff
71dcb275f1
move FederationStream out to its own file
2019-03-27 21:13:14 +00:00
Richard van der Hoff
aa1e017864
move EventsStream out to its own file
2019-03-27 21:13:14 +00:00
Richard van der Hoff
a5798de067
Move replication.tcp.streams into a package
2019-03-27 21:13:14 +00:00
Richard van der Hoff
acaa18f7dd
Fix/improve some docstrings in the replication code. ( #4949 )
2019-03-27 21:12:36 +00:00
Richard van der Hoff
8cbbedaa2b
Fix ClientReplicationStreamProtocol.__str__ ( #4929 )
...
`__str__` depended on `self.addr`, which was absent from
ClientReplicationStreamProtocol, so attempting to call str on such an object
would raise an exception.
We can calculate the peer addr from the transport, so there is no need for addr
anyway.
2019-03-25 16:41:51 +00:00
Richard van der Hoff
9bde730ef8
Fix bug where read-receipts lost their timestamps ( #4927 )
...
Make sure that they are sent correctly over the replication stream.
Fixes: #4898
2019-03-25 16:38:05 +00:00
Richard van der Hoff
cdb8036161
Add a config option for torture-testing worker replication. ( #4902 )
...
Setting this to 50 or so makes a bunch of sytests fail in worker mode.
2019-03-20 16:04:35 +00:00
Andrew Morgan
b9f6163092
Simplify token replication logic
2019-03-05 13:58:30 +00:00
Andrew Morgan
fe7bd23a85
Clean up logic and add comments
2019-03-04 15:08:15 +00:00
Andrew Morgan
9f7cdf3da1
Clearer branching, fix missing list clear
2019-03-04 14:36:52 +00:00
Andrew Morgan
5f0c449dd5
Prevent replication wedging
2019-03-04 14:03:18 +00:00
Erik Johnston
7590e9fa28
Merge pull request #4749 from matrix-org/erikj/replication_connection_backoff
...
Fix tightloop over connecting to replication server
2019-02-27 11:00:59 +00:00
Erik Johnston
6bb1c028f1
Limit cache invalidation replication line length ( #4748 )
2019-02-27 10:28:37 +00:00
Erik Johnston
6870fc496f
Move connecting logic into ClientReplicationStreamProtocol
2019-02-27 10:23:51 +00:00
Erik Johnston
25814921f1
Increase the max delay between retry attempts
...
Otherwise if you have many workers they can easily take out master with
their connection attempts
2019-02-26 15:12:33 +00:00
Erik Johnston
313987187e
Fix tightloop over connecting to replication server
...
If the client failed to process incoming commands during the initial set
up of the replication connection it would immediately disconnect and
reconnect, resulting in a tightloop.
This can happen, for example, when subscribing to a stream that has a
row that is too long in the backlog.
The fix here is to not consider the connection successfully set up until
the client has successfully subscribed and caught up with the streams.
This ensures that the retry logic timers aren't reset until then,
meaning that if an error does happen during start up the client will
continue backing off before retrying again.
2019-02-26 15:05:41 +00:00
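The behaviour described in the fix above can be sketched as a small backoff helper whose delay is reset only once the client has caught up, not when the TCP connection is merely established. Everything here is an assumption for illustration: the class name `ReconnectBackoff`, its method names, and the delay values are hypothetical and are not taken from the Synapse code.

```python
class ReconnectBackoff:
    """Hypothetical exponential backoff that resets only after catch-up."""

    def __init__(self, initial_delay: float = 1.0, max_delay: float = 30.0,
                 factor: float = 2.0) -> None:
        self.initial_delay = initial_delay
        self.max_delay = max_delay
        self.factor = factor
        self.delay = initial_delay

    def on_connection_failed(self) -> float:
        """Return how long to wait before retrying, then grow the delay."""
        wait = self.delay
        self.delay = min(self.delay * self.factor, self.max_delay)
        return wait

    def on_caught_up(self) -> None:
        """Reset the delay; called only after subscribing and catching up."""
        self.delay = self.initial_delay

    # Deliberately no reset when the socket merely connects: a connection
    # that dies during setup must keep backing off.


backoff = ReconnectBackoff()
# Three failures in a row, each during the initial setup of the connection.
waits = [backoff.on_connection_failed() for _ in range(3)]
# The client finally catches up, so the next failure starts from scratch.
backoff.on_caught_up()
wait_after_catch_up = backoff.on_connection_failed()
```

If the reset happened on every successful connect instead, a failure during setup would always retry after the initial delay, which is the tight loop the commit describes.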
Erik Johnston
a163b748a5
Don't truncate command name in metrics
2018-10-29 17:34:21 +00:00
Amber Brown
c4b3698a80
Make the replication logger quieter ( #4108 )
2018-10-29 22:59:44 +11:00
Travis Ralston
f1a7264663
Fix minor typo in exception
2018-09-13 11:51:12 -06:00
Erik Johnston
3e242dc149
Remove conn_id
2018-09-04 11:45:52 +01:00
Erik Johnston
b13836da7f
Remove conn_id from repl prometheus metrics
...
`conn_id` gets set to a random string, and so we end up filling up
prometheus with tonnes of data series, which is bad.
2018-09-03 17:22:49 +01:00
Richard van der Hoff
0e8d78f6aa
Logcontexts for replication command handlers
...
Run the handlers for replication commands as background processes. This should
improve the visibility in our metrics, and reduce the number of "running db
transaction from sentinel context" warnings.
Ideally it means converting the things that fire off deferreds into the night
into things that actually return a Deferred when they are done. I've made a bit
of a stab at this, but it will probably be leaky.
2018-08-17 00:43:43 +01:00
Richard van der Hoff
f59be4eb0e
Fix unit tests
...
on_notifier_poke no longer runs synchronously, so we have to do a different hack
to make sure that the replication data has been sent. Let's actually listen for
its arrival.
2018-07-25 10:30:36 +01:00
Richard van der Hoff
371da42ae4
Wrap a number of things that run in the background
...
This will reduce the number of "Starting db connection from sentinel context"
warnings, and will help with our metrics.
2018-07-25 09:41:12 +01:00
Amber Brown
49af402019
run isort
2018-07-09 16:09:20 +10:00
Amber Brown
6350bf925e
Attempt to be more performant on PyPy ( #3462 )
2018-06-28 14:49:57 +01:00
Amber Brown
07cad26d65
Remove all global reactor imports & pass it around explicitly ( #3424 )
2018-06-25 14:08:28 +01:00
Amber Brown
99b77aa829
Fix tcp protocol metrics naming ( #3410 )
2018-06-21 09:39:27 +01:00
Richard van der Hoff
b7e7fd2d0e
Fix replication metrics
...
fix bug introduced in #3256
2018-06-04 16:23:05 +01:00
Amber Brown
754826a830
Merge remote-tracking branch 'origin/develop' into 3218-official-prom
2018-05-28 18:57:23 +10:00
Amber Brown
1f69693347
Merge pull request #3244 from NotAFile/py3-six-4
...
replace some iteritems with six
2018-05-24 13:04:07 -05:00
Amber Brown
b6063631c3
more cleanup
2018-05-22 17:36:20 -05:00
Amber Brown
228f1f584e
fix the test failures
2018-05-22 15:02:38 -05:00
Amber Brown
8f5a688d42
cleanups, self-registration
2018-05-22 10:56:03 -05:00
Amber Brown
a8990fa2ec
Merge remote-tracking branch 'origin/develop' into 3218-official-prom
2018-05-22 10:50:26 -05:00
Richard van der Hoff
9ea219c514
Send users a server notice about consent
...
When a user first syncs, we will send them a server notice asking them to
consent to the privacy policy if they have not already done so.
2018-05-22 11:54:51 +01:00
Amber Brown
fcc525b0b7
rest of the changes
2018-05-21 19:48:57 -05:00
Amber Brown
df9f72d9e5
replacing portions
2018-05-21 19:47:37 -05:00
Adrian Tschira
933bf2dd35
replace some iteritems with six
...
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-05-19 17:59:26 +02:00
Adrian Tschira
57b58e2174
make imports local
...
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-28 13:41:41 +02:00
Richard van der Hoff
3ee4ad09eb
Fix json encoding bug in replication
...
json encoders have an encode method, not a dumps method.
2018-04-03 15:09:48 +01:00
Richard van der Hoff
05630758f2
Use static JSONEncoders
...
using json.dumps with custom options requires us to create a new JSONEncoder on
each call. It's more efficient to create one upfront and reuse it.
2018-03-29 23:13:33 +01:00
Erik Johnston
9aa5a0af51
Explicitly use simplejson
2018-03-20 09:58:13 +00:00
Erik Johnston
610accbb7f
Fix replication after switch to simplejson
...
Turns out that simplejson serialises namedtuples as dictionaries rather
than tuples by default.
2018-03-19 16:12:48 +00:00
Erik Johnston
fa72803490
Merge branch 'master' of github.com:matrix-org/synapse into develop
2018-03-19 11:41:01 +00:00
Erik Johnston
926ba76e23
Replace ujson with simplejson
2018-03-15 23:43:31 +00:00
Richard van der Hoff
5c3c32f16f
Metrics for number of RDATA commands received
...
I found myself wishing we had this.
2018-01-15 17:45:55 +00:00
Richard van der Hoff
0edf085b68
Fix some logcontext leaks in replication resource
...
The @measure_func annotations rely on the wrapped function respecting the
logcontext rules. Add the necessary yields to make this work.
2017-11-23 23:19:43 +00:00
Richard van der Hoff
eaaabc6c4f
replace 'except:' with 'except Exception:'
...
what could possibly go wrong
2017-10-23 15:52:32 +01:00
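The practical difference between the two forms in the commit above is which exceptions they catch: a bare `except:` catches everything derived from `BaseException`, including `KeyboardInterrupt` and `SystemExit`, while `except Exception:` lets those propagate. A minimal sketch with two hypothetical helpers:

```python
def swallow_bare(exc: BaseException) -> str:
    try:
        raise exc
    except:  # noqa: E722 - bare except, shown only for comparison
        return "caught"


def swallow_exception(exc: BaseException) -> str:
    try:
        raise exc
    except Exception:
        return "caught"


# Ordinary errors are caught by both forms.
ordinary = (swallow_bare(ValueError("x")), swallow_exception(ValueError("x")))

# KeyboardInterrupt is swallowed by the bare form only.
bare_result = swallow_bare(KeyboardInterrupt())
try:
    swallow_exception(KeyboardInterrupt())
    propagated = False
except KeyboardInterrupt:
    propagated = True
```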
hera
f807f7f804
log when we get an exception handling replication updates
2017-10-12 11:51:24 +01:00
Erik Johnston
2cc998fed8
Fix replication. And notify
2017-07-20 17:13:18 +01:00
Erik Johnston
925b3638ff
Reduce log levels in tcp replication
2017-07-11 10:04:21 +01:00
Erik Johnston
27f26e48b7
Serialize user ip command as json
2017-06-27 16:25:38 +01:00
Erik Johnston
78cefd78d6
Make workers report to master for user ip updates
2017-06-27 14:58:10 +01:00
Erik Johnston
6aa5bc8635
Initial worker impl
2017-06-16 11:47:11 +01:00
Erik Johnston
2cac7623a5
Add missing notifier
2017-06-09 11:24:41 +01:00
Erik Johnston
2e6f5a4910
Typo
2017-04-10 16:17:40 +01:00
Erik Johnston
efcb6db688
Merge pull request #2109 from matrix-org/erikj/send_queue_fix
...
Fix up federation SendQueue and document types
2017-04-10 13:09:25 +01:00
Erik Johnston
0364d23210
Up replication ping timeout
2017-04-10 11:32:05 +01:00
Erik Johnston
ab904caf33
Comments
2017-04-10 10:02:17 +01:00
Erik Johnston
98ce212093
Merge pull request #2103 from matrix-org/erikj/no-double-encode
...
Don't double encode replication data
2017-04-07 09:39:52 +01:00
Erik Johnston
ad544c803a
Document types of the replication streams
2017-04-06 13:28:52 +01:00
Erik Johnston
69b3fd485d
Fix incorrect type when using InvalidateCacheCommand
2017-04-06 09:36:38 +01:00
Erik Johnston
fcc803b2bf
Add log lines
2017-04-05 17:13:44 +01:00
Erik Johnston
3f213d908d
Rearrange metrics
2017-04-05 14:15:09 +01:00
Erik Johnston
1ca0e78ca1
Fix typo
2017-04-05 13:43:39 +01:00
Erik Johnston
b43d3267e2
Fixup some metrics for tcp repl
2017-04-05 13:34:54 +01:00
Erik Johnston
a5c401bd12
Merge pull request #2097 from matrix-org/erikj/repl_tcp_client
...
Move to using TCP replication
2017-04-05 09:36:21 +01:00
Erik Johnston
a76886726b
Merge pull request #2098 from matrix-org/erikj/repl_tcp_fix
...
Advance replication streams even if nothing is listening
2017-04-04 15:40:51 +01:00
Erik Johnston
4264ceb31c
Fiddle tcp replication logging
2017-04-04 14:14:03 +01:00
Erik Johnston
023ee197be
Advance replication streams even if nothing is listening
...
Otherwise the streams don't advance and steadily fall behind, so when a
worker does connect either a) they'll be streamed lots of old updates or
b) the connection will fail as the streams are too far behind.
2017-04-04 13:19:26 +01:00
Erik Johnston
52bfa604e1
Add basic replication client handler and factory
2017-04-03 15:34:13 +01:00
Erik Johnston
0a6a966e2b
Always advance stream tokens
2017-04-03 15:22:56 +01:00
Erik Johnston
1df7c28661
Use callbacks to notify tcp replication rather than deferreds
2017-03-31 15:42:51 +01:00
Erik Johnston
36d2b66f90
Add a timestamp to USER_SYNC command
...
This timestamp is used to indicate when the user last sync'd
2017-03-31 15:42:22 +01:00
Erik Johnston
bfcf016714
Fix up docs
2017-03-31 11:19:24 +01:00
Erik Johnston
4d7fc7f977
Add server side resource for tcp replication
2017-03-30 13:24:45 +01:00
Erik Johnston
7450693435
Initial TCP protocol implementation
...
This defines the low level TCP replication protocol
2017-03-30 12:54:46 +01:00
Erik Johnston
8da6f0be48
Define the various streams we will replicate
2017-03-30 12:54:46 +01:00