The actual documentation for setting up workers is not really easy to follow:

* https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers
* https://github.com/matrix-org/synapse/blob/master/docs/workers.md
This is how I am (trying) to change my setup to use workers.

**WARNING: SHOULD BE REVIEWED! WIP! This actually breaks my setup in a lot of strange ways.**

**Look at the [issues](#Issues) below first.**
I assume you already have a working Synapse configuration; I'm not putting whole config files here.

# Background
* Server is running in a VMware VM with 16 CPUs and 32GB RAM (half of it for PostgreSQL).
* DB is 14GB big
* nginx is used as a reverse proxy
* The Synapse homeserver process is hammering away at 100-120% CPU all day long, but never uses more of the CPUs
* My nginx graphs show an average of 140 requests/s during working hours
* I'm using the Debian packages from matrix.org and starting Synapse with systemd
# Which workers are meaningful?

## analysing old logs

First, I wanted to check which endpoints are requested the most in my installation. I grepped the endpoints of every worker as described in https://github.com/matrix-org/synapse/blob/master/docs/workers.md in 24 hours of my nginx access log:
### synapse.app.synchrotron

`grep -E '(/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)' | wc -l`
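To make the counting concrete, here is the synchrotron regex run over two sample log lines (the lines are fabricated, just for illustration); only the `/sync` request is counted:

```shell
# sample access-log lines (fabricated); the synchrotron regex should
# count the /sync request but not the /versions one
printf '%s\n' \
  'GET /_matrix/client/r0/sync?timeout=30000 HTTP/1.1 200' \
  'GET /_matrix/client/versions HTTP/1.1 200' \
  | grep -E '(/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)' \
  | wc -l
# prints 1
```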
### synapse.app.federation_reader

`grep -E '(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/send/|/_matrix/federation/v1/get_groups_publicised|/_matrix/key/v2/query|/_matrix/federation/v1/groups/)'`
### synapse.app.media_repository

`zgrep -E '(/_matrix/media/|/_synapse/admin/v1/purge_media_cache|/_synapse/admin/v1/room/.*/media.*|/_synapse/admin/v1/user/.*/media.*|/_synapse/admin/v1/media/.*|/_synapse/admin/v1/quarantine_media/.*)'`
### synapse.app.client_reader

`grep -E '(/_matrix/client/(api/v1|r0|unstable)/publicRooms|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/joined_members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/context/.*|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state|/_matrix/client/(api/v1|r0|unstable)/login|/_matrix/client/(api/v1|r0|unstable)/account/3pid|/_matrix/client/(api/v1|r0|unstable)/keys/query|/_matrix/client/(api/v1|r0|unstable)/keys/changes|/_matrix/client/versions|/_matrix/client/(api/v1|r0|unstable)/voip/turnServer|/_matrix/client/(api/v1|r0|unstable)/joined_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups/|/_matrix/client/(api/v1|r0|unstable)/pushrules/.*|/_matrix/client/(api/v1|r0|unstable)/groups/.*|/_matrix/client/(r0|unstable)/register|/_matrix/client/(r0|unstable)/auth/.*/fallback/web)'`
**Note**: I didn't include `/_matrix/client/(api/v1|r0|unstable)/rooms/.*/messages`

* 175576 requests without /messages
* 9998816 requests with /messages (not sure why)
### synapse.app.user_dir

`grep -E '/_matrix/client/(api/v1|r0|unstable)/user_directory/search'`
### synapse.app.frontend_proxy

`grep -E '/_matrix/client/(api/v1|r0|unstable)/keys/upload'`

### synapse.app.event_creator

`grep -E '(/_matrix/client/(api/v1|r0|unstable)/rooms/.*/send|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state/|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)|/_matrix/client/(api/v1|r0|unstable)/join/|/_matrix/client/(api/v1|r0|unstable)/profile/)'`
## results

| worker’s endpoints | requests/day | percent |
| ------------------ | ------------ | ------- |
| synchrotron        | 9017921      | 90.19%  |
| federation_reader  | 321413       | 3.21%   |
| media_repository   | 115749       | 1.16%   |
| client_reader      | 175576       | 1.76%   |
| user_dir           | 1341         | 0.01%   |
| frontend_proxy     | 6936         | 0.07%   |
| event_creator      | 26876        | 0.27%   |
| total              | 9665812      | 96.67%  |
| total requests     | 9998816      | 100.00% |
| others             | 333004       | 3.33%   |

So the synchrotron would make the most sense for me (since I think my setup is standard, I guess it's almost always like this).
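The percent column is simply each endpoint count divided by the total number of requests; for example, the synchrotron row:

```shell
# synchrotron share of all requests: 9017921 / 9998816
awk 'BEGIN { printf "%.2f%%\n", 9017921 / 9998816 * 100 }'
# prints 90.19%
```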
# Setting up synchrotron worker(s)

**WARNING**: I broke parts of my setup repeatedly while trying to do this on a live server.

## homeserver.yaml

Just add this to the existing `listeners` part of the config:
```
listeners:
  # The TCP replication port
  - port: 9092
    bind_address: '127.0.0.1'
    type: replication
  # The HTTP replication port
  - port: 9093
    bind_address: '127.0.0.1'
    type: http
    resources:
      - names: [replication]
```
Also add this to `homeserver.yaml`:

    worker_app: synapse.app.homeserver
    daemonize: false

Restart your Synapse to check it's still working:

`# systemctl restart matrix-synapse`
## workers configuration

**Note**: if you work as root, take care to chown the config files to the matrix-synapse user after creating them.

I used the systemd instructions from https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers, but I changed them to be able to start multiple synchrotron workers.

`mkdir /etc/matrix-synapse/workers`

### /etc/matrix-synapse/workers/synchrotron-1.yaml
```
worker_app: synapse.app.synchrotron

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8083
    resources:
      - names:
          - client

worker_daemonize: False
worker_pid_file: /var/run/synchrotron1.pid
worker_log_config: /etc/matrix-synapse/synchrotron1-log.yaml
send_federation: False
```
If you want to run multiple synchrotrons, create additional configs like this: `sed -e 's/synchrotron1/synchrotron2/g' -e 's/8083/8084/' /etc/matrix-synapse/workers/synchrotron-1.yaml > /etc/matrix-synapse/workers/synchrotron-2.yaml`

Don't forget to create a log config file as well for each worker.
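The two `sed` expressions only rewrite the worker name and the listener port; a quick check on two sample lines (piped in, nothing is written):

```shell
# show what the substitutions do on sample config lines
printf '%s\n' \
  'worker_pid_file: /var/run/synchrotron1.pid' \
  'port: 8083' \
  | sed -e 's/synchrotron1/synchrotron2/g' -e 's/8083/8084/'
# prints:
# worker_pid_file: /var/run/synchrotron2.pid
# port: 8084
```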
### /etc/matrix-synapse/synchrotron1-log.yaml

This config makes the worker produce the logfile /var/log/matrix-synapse/synchrotron1.log
It could probably be trimmed down...
```
version: 1

formatters:
  precise:
    format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s- %(message)s'

filters:
  context:
    (): synapse.util.logcontext.LoggingContextFilter
    request: ""

handlers:
  file:
    class: logging.handlers.RotatingFileHandler
    formatter: precise
    filename: /var/log/matrix-synapse/synchrotron1.log
    maxBytes: 104857600
    backupCount: 10
    filters: [context]
    encoding: utf8
    level: DEBUG
  console:
    class: logging.StreamHandler
    formatter: precise
    level: WARN

loggers:
  synapse:
    level: WARN

  synapse.storage.SQL:
    level: INFO

  synapse.app.synchrotron:
    level: DEBUG

root:
  level: WARN
  handlers: [file, console]
```
## Starting the worker

I tried to start the worker with synctl, but I had to change the config to include /etc/matrix-synapse/conf.d/* because it wasn't reading those files. Since I use systemd to start Synapse in production, it's better to set the workers up to start with systemd directly.

## systemd

I followed this:
https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers

And created an extra templated systemd service to be able to run multiple synchrotrons.

### /etc/systemd/system/matrix-synapse-worker-synchrotron@.service
```
[Unit]
Description=Synapse Matrix Worker
After=matrix-synapse.service
BindsTo=matrix-synapse.service

[Service]
Type=notify
NotifyAccess=main
User=matrix-synapse
WorkingDirectory=/var/lib/matrix-synapse
EnvironmentFile=/etc/default/matrix-synapse
ExecStart=/opt/venvs/matrix-synapse/bin/python -m synapse.app.synchrotron --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/synchrotron-%i.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
SyslogIdentifier=matrix-synapse-synchrotron-%i

[Install]
WantedBy=matrix-synapse.service
```
* Reload the systemd config: `systemctl daemon-reload`
* Start synchrotron1: `systemctl start matrix-synapse-worker-synchrotron@1.service`
* Check the logs: `journalctl -xe -f -u matrix-synapse-worker-synchrotron@1.service`

If this worked, you should now have an extra python process for synchrotron1. But it doesn't handle any traffic yet.
## Nginx config

### Some extras

Add this to your default_server somewhere in `server { }`:
```
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    allow ::1;
    deny all;
}
```
You can then get an idea of the requests you get with:

```
$ curl http://127.0.0.1/nginx_status
Active connections: 270
server accepts handled requests
 172758 172758 3500311
Reading: 0 Writing: 126 Waiting: 144
```
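If you want just one number out of that status page (e.g. for a quick graph), awk can pick it out of the stub_status output. Here the sample line is fed in via printf for illustration; in practice you would pipe `curl -s http://127.0.0.1/nginx_status` instead:

```shell
# pull the active-connection count out of stub_status output
# (sample line fed via printf; pipe curl's output in real use)
printf 'Active connections: 270\n' | awk '/Active connections/ { print $3 }'
# prints 270
```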
### upstream synchrotrons

First, I set up a pool for the synchrotrons (look at the ports configured in the workers). This way, I could *theoretically* scale out when there is too much load. I also added a log format to be able to trace in nginx which worker is handling which request (stolen from somewhere I don't remember):

Place this in your nginx config (I put it in my vhost config outside of `server {}`):
```
log_format backend '$remote_addr - $remote_user - [$time_local] $upstream_addr: $request $status URT:$upstream_response_time request_time $request_time';

upstream synchrotron {
    # ip_hash; # this might help in some cases, not in mine
    # server 127.0.0.1:8008; # main synapse process, to roll back when it goes wrong (reacted strangely)
    server 127.0.0.1:8083; # synchrotron1
    # server 127.0.0.1:8084; # synchrotron2
    # server 127.0.0.1:8085; # synchrotron3
}
```
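nginx balances the pool round-robin by default; if you ever wanted one synchrotron to take a bigger share of the traffic, a `weight` can be set per server. A sketch (not something I run myself):

```
upstream synchrotron {
    server 127.0.0.1:8083 weight=2;  # synchrotron1 gets twice as many requests
    server 127.0.0.1:8084;           # synchrotron2
}
```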
Then, you can change the default log format of your vhost:

```
server {
    #[...]
    access_log /var/log/nginx/matrix-access.log backend;
    #[...]
}
```
## reverse proxy the endpoints

In my `server {}` section I set up multiple locations (to avoid one very big regexp):
```
location ~ ^/_matrix/client/(v2_alpha|r0)/sync$ {
    proxy_pass http://synchrotron$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
location ~ ^/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync$ {
    proxy_pass http://synchrotron$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
location ~ ^/_matrix/client/(api/v1|r0)/initialSync$ {
    proxy_pass http://synchrotron$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
location ~ ^/_matrix/client/(api/v1|v2_alpha|r0)/events$ {
    proxy_pass http://synchrotron$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
Reload the nginx config, and your synchrotron worker should start to get traffic.

## federation_reader

### workers/federation_reader.yaml

synapse.app.federation_reader listens on port 8011:
```
worker_app: synapse.app.federation_reader

worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8011
    resources:
      - names: [federation]

worker_pid_file: "/var/run/app.federation_reader.pid"
worker_daemonize: False
worker_log_config: /etc/matrix-synapse/federation-reader-log.yaml
send_federation: False
```
Here I separated the `^/_matrix/federation/v1/send/` endpoint into its own location, since it's documented that this endpoint cannot be handled by multiple workers:
```
location ~ ^/_matrix/federation/v1/send/ {
    proxy_pass http://127.0.0.1:8011$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
# and a big regex for the rest
location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/) {
    proxy_pass http://127.0.0.1:8011$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
## other workers

I also tried `media_repository` and `event_creator`, but they were not working as expected. For reference, the configs:

### event_creator.yaml
```
worker_app: synapse.app.event_creator

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8102
    resources:
      - names:
          - client

worker_daemonize: False
worker_pid_file: /var/run/event_creator.pid
worker_log_config: /etc/matrix-synapse/event_creator-log.yaml
send_federation: False
```
### media_repository.yaml

```
worker_app: synapse.app.media_repository

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8101
    resources:
      - names:
          - media

worker_daemonize: False
worker_pid_file: /var/run/media_repository.pid
worker_log_config: /etc/matrix-synapse/media_repository-log.yaml
send_federation: False
```
### in nginx

```
# event_creator
location ~ ^/_matrix/client/(api/v1|r0|unstable)(/rooms/.*/send|/rooms/.*/state/|/rooms/.*/(join|invite|leave|ban|unban|kick)$|/join/|/profile/) {
    proxy_pass http://127.0.0.1:8102$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

# media_repository : XXX Breaks thumbnails
location ~ (^/_matrix/media/|^/_synapse/admin/v1/purge_media_cache$|^/_synapse/admin/v1/room/.*/media.*$|^/_synapse/admin/v1/user/.*/media.*$|^/_synapse/admin/v1/media/.*$|^/_synapse/admin/v1/quarantine_media/.*$) {
    proxy_pass http://127.0.0.1:8101$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
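These location regexes are easy to get wrong; since they are ordinary extended regexes, you can sanity-check them against sample paths with `grep -E` before reloading nginx (the sample paths below are made up):

```shell
# the event_creator regex should match a /send path but not /sync
regex='^/_matrix/client/(api/v1|r0|unstable)(/rooms/.*/send|/rooms/.*/state/|/rooms/.*/(join|invite|leave|ban|unban|kick)$|/join/|/profile/)'
printf '%s\n' \
  '/_matrix/client/r0/rooms/!room:example.org/send/m.room.message/1' \
  '/_matrix/client/r0/sync' \
  | grep -E "$regex"
# prints only the /send path
```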
# Issues

These are the issues I have met so far (they might also have been related to some big federated rooms):

* CPU usage was getting high on all synchrotron workers, more than with a single synapse process
* a lot of clients were disconnecting all the time
* some old notifications were popping up on desktop and mobile all the time
* media_repository was breaking thumbnails
* `send_federation: False` is needed in all worker configs except federation_sender (see #7130)