airblag 2020-03-27 15:15:12 +01:00

The current documentation for setting up workers is not really easy to follow:
* https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers
* https://github.com/matrix-org/synapse/blob/master/docs/workers.md
This is how I tried to change my setup to use workers.
**WARNING: SHOULD BE REVIEWED! WIP! This actually breaks my setup in a lot of strange ways**
**Look at the [issues](#issues) below first**
I assume you already have a working Synapse configuration, so I'm not putting whole config files here.
# Background
* The server runs in a VMware VM with 16 CPUs and 32 GB RAM (half of it for PostgreSQL).
* The DB is 14 GB big.
* nginx is used as a reverse proxy.
* The Synapse homeserver process is hammering at 100-120% CPU all day long, but never uses more than one of the CPUs.
* My nginx graphs show an average of 140 requests/s during working hours.
* I'm using the Debian packages from matrix.org and starting Synapse with systemd.
# Which workers are meaningful?
## analysing old logs
First, I wanted to check which endpoints are requested the most in my installation. I grepped the endpoints of every worker as described in https://github.com/matrix-org/synapse/blob/master/docs/workers.md in my nginx access log for 24 hours:
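The per-worker greps are listed below; the counting itself can be wrapped in a tiny helper. This is just my own convenience sketch (`count_endpoint` is not an official tool), assuming a standard nginx access log:

```shell
# count_endpoint <regex> <logfile...>
# Counts access-log lines matching a worker's endpoint regex.
# Use zcat/zgrep variants instead if your rotated logs are gzipped.
count_endpoint() {
  regex=$1; shift
  cat -- "$@" | grep -cE "$regex"
}
```

For example: `count_endpoint '/_matrix/client/(v2_alpha|r0)/sync' /var/log/nginx/access.log`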
### synapse.app.synchrotron
`grep -E '(/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)' |wc -l`
### synapse.app.federation_reader
`grep -E '(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/send/|/_matrix/federation/v1/get_groups_publicised|/_matrix/key/v2/query|/_matrix/federation/v1/groups/)'`
### synapse.app.media_repository
`grep -E '(/_matrix/media/|/_synapse/admin/v1/purge_media_cache|/_synapse/admin/v1/room/.*/media.*|/_synapse/admin/v1/user/.*/media.*|/_synapse/admin/v1/media/.*|/_synapse/admin/v1/quarantine_media/.*)'`
### synapse.app.client_reader
`grep -E '(/_matrix/client/(api/v1|r0|unstable)/publicRooms|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/joined_members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/context/.*|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state|/_matrix/client/(api/v1|r0|unstable)/login|/_matrix/client/(api/v1|r0|unstable)/account/3pid|/_matrix/client/(api/v1|r0|unstable)/keys/query|/_matrix/client/(api/v1|r0|unstable)/keys/changes|/_matrix/client/versions|/_matrix/client/(api/v1|r0|unstable)/voip/turnServer|/_matrix/client/(api/v1|r0|unstable)/joined_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups/|/_matrix/client/(api/v1|r0|unstable)/pushrules/.*|/_matrix/client/(api/v1|r0|unstable)/groups/.*|/_matrix/client/(r0|unstable)/register|/_matrix/client/(r0|unstable)/auth/.*/fallback/web)'`
**Note**: I didn't include `/_matrix/client/(api/v1|r0|unstable)/rooms/.*/messages`:
* 175576 requests without `/messages`
* 9998816 requests with `/messages` (not sure why; that would be every single request)
### synapse.app.user_dir
`grep -E '/_matrix/client/(api/v1|r0|unstable)/user_directory/search'`
### synapse.app.frontend_proxy
`grep -E '/_matrix/client/(api/v1|r0|unstable)/keys/upload'`
### synapse.app.event_creator
`grep -E '(/_matrix/client/(api/v1|r0|unstable)/rooms/.*/send|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state/|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)|/_matrix/client/(api/v1|r0|unstable)/join/|/_matrix/client/(api/v1|r0|unstable)/profile/)'`
## results
| worker endpoints   | requests/day | percent |
| ------------------ | ------------ | ------- |
| synchrotron | 9017921 | 90.19% |
| federation_reader | 321413 | 3.21% |
| media_repository | 115749 | 1.16% |
| client_reader | 175576 | 1.76% |
| user_dir | 1341 | 0.01% |
| frontend_proxy | 6936 | 0.07% |
| event_creator | 26876 | 0.27% |
| total | 9665812 | 96.67% |
| total requests | 9998816 | 100.00% |
| others | 333004 | 3.33% |
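The percent column is just each worker's count divided by the 9998816 total; a throwaway helper of my own (not from the Synapse docs) reproduces it:

```shell
# pct <count> <total>: share of total requests, as in the table above
pct() { awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f%%\n", 100 * n / t }'; }
```

`pct 9017921 9998816` prints `90.19%`, matching the synchrotron row.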
So the synchrotron makes the most sense for me (since I think my setup is standard, I guess it's almost always like this).
# Setting up synchrotron worker(s)
**WARNING**: I repeatedly broke parts of my setup while trying to do this on a live server.
## homeserver.yaml
Just add this to the existing listeners part of the config:
```
listeners:
  # The TCP replication port
  - port: 9092
    bind_address: '127.0.0.1'
    type: replication

  # The HTTP replication port
  - port: 9093
    bind_address: '127.0.0.1'
    type: http
    resources:
      - names: [replication]
```
Also add this to `homeserver.yaml`:
```
worker_app: synapse.app.homeserver
daemonize: false
```
Restart your Synapse to check it's still working:
`# systemctl restart matrix-synapse`
## workers configuration
**Note**: if you work as root, take care to chown the config files to the matrix-synapse user after creating them.
I used the systemd instructions from https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers, but changed them to be able to start multiple synchrotron workers.
`mkdir /etc/matrix-synapse/workers`
### /etc/matrix-synapse/workers/synchrotron-1.yaml
```
worker_app: synapse.app.synchrotron

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8083
    resources:
      - names:
          - client

worker_daemonize: False
worker_pid_file: /var/run/synchrotron1.pid
worker_log_config: /etc/matrix-synapse/synchrotron1-log.yaml
send_federation: False
```
If you want to run multiple synchrotrons, create further configs like this: `sed -e 's/synchrotron1/synchrotron2/g' -e 's/8083/8084/' /etc/matrix-synapse/workers/synchrotron-1.yaml > /etc/matrix-synapse/workers/synchrotron-2.yaml`
Don't forget to create log config files as well for each worker.
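The same idea as a reusable function, so each further synchrotron is one call. This is my own sketch (`clone_synchrotron` and the `WORKERS_DIR` override are my naming, assuming the file layout above):

```shell
# clone_synchrotron <n> <port>: derive workers/synchrotron-<n>.yaml from
# synchrotron-1.yaml, renaming synchrotron1 -> synchrotron<n> and
# port 8083 -> <port>
clone_synchrotron() {
  dir=${WORKERS_DIR:-/etc/matrix-synapse/workers}
  sed -e "s/synchrotron1/synchrotron$1/g" \
      -e "s/8083/$2/" \
      "$dir/synchrotron-1.yaml" > "$dir/synchrotron-$1.yaml"
}
```

For example: `for i in 2 3 4; do clone_synchrotron "$i" $((8082 + i)); done` (and still create the matching log configs).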
### /etc/matrix-synapse/synchrotron1-log.yaml
This process should produce the logfile `/var/log/matrix-synapse/synchrotron1.log`. The config could probably be reduced...
```
version: 1

formatters:
  precise:
    format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s- %(message)s'

filters:
  context:
    (): synapse.util.logcontext.LoggingContextFilter
    request: ""

handlers:
  file:
    class: logging.handlers.RotatingFileHandler
    formatter: precise
    filename: /var/log/matrix-synapse/synchrotron1.log
    maxBytes: 104857600
    backupCount: 10
    filters: [context]
    encoding: utf8
    level: DEBUG
  console:
    class: logging.StreamHandler
    formatter: precise
    level: WARN

loggers:
  synapse:
    level: WARN
  synapse.storage.SQL:
    level: INFO
  synapse.app.synchrotron:
    level: DEBUG

root:
  level: WARN
  handlers: [file, console]
```
## Starting the worker
I tried to start the worker with synctl, but I had to change the config to include `/etc/matrix-synapse/conf.d/*` because it wasn't reading those files. Since I use systemd to start Synapse in production, it's better to set up the workers to start with systemd directly.
## systemd
I followed this:
https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers
and created an extra systemd template service to be able to have multiple synchrotrons.
### /etc/systemd/system/matrix-synapse-worker-synchrotron\@.service
```
[Unit]
Description=Synapse Matrix Worker
After=matrix-synapse.service
BindsTo=matrix-synapse.service
[Service]
Type=notify
NotifyAccess=main
User=matrix-synapse
WorkingDirectory=/var/lib/matrix-synapse
EnvironmentFile=/etc/default/matrix-synapse
ExecStart=/opt/venvs/matrix-synapse/bin/python -m synapse.app.synchrotron --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/synchrotron-%i.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
SyslogIdentifier=matrix-synapse-synchrotron-%i
[Install]
WantedBy=matrix-synapse.service
```
* Reload the systemd config: `systemctl daemon-reload`
* Start synchrotron1: `systemctl start matrix-synapse-worker-synchrotron@1.service`
* Check the logs: `journalctl -xe -f -u matrix-synapse-worker-synchrotron@1.service`
If this worked, you should now have an extra python process for synchrotron1. But it doesn't handle any traffic yet.
## Nginx config
### Some extras
Add this to your default_server, somewhere in `server { }`:
```
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    allow ::1;
    deny all;
}
```
You can then get some idea of the requests you get with:
```
$ curl http://127.0.0.1/nginx_status
Active connections: 270
server accepts handled requests
172758 172758 3500311
Reading: 0 Writing: 126 Waiting: 144
```
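Since the third number (total requests) in stub_status is a cumulative counter, sampling it twice gives a rough requests/s figure. A sketch of my own (helper names are mine; it assumes the `/nginx_status` location above):

```shell
# nginx_requests: read stub_status output on stdin and print the cumulative
# request counter (third field of the third line)
nginx_requests() { awk 'NR == 3 { print $3 }'; }

# req_rate <seconds>: average requests/s between two samples
req_rate() {
  a=$(curl -s http://127.0.0.1/nginx_status | nginx_requests)
  sleep "$1"
  b=$(curl -s http://127.0.0.1/nginx_status | nginx_requests)
  echo $(( (b - a) / $1 ))
}
```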
### upstream synchrotrons
First, I set up an upstream pool for the synchrotrons (look at the ports configured in the workers). This way, I could *theoretically* scale out when there is too much load. I also added a log format to be able to trace in nginx which worker is handling which request (stolen from somewhere I don't remember):
Place this in your nginx config (I put it in my vhost config, outside of `server {}`):
```
log_format backend '$remote_addr - $remote_user - [$time_local] $upstream_addr: $request $status URT:$upstream_response_time request_time $request_time';

upstream synchrotron {
    # ip_hash; # this might help in some cases, not in mine
    # server 127.0.0.1:8008; # main synapse process, to roll back when it goes wrong (reacted strangely)
    server 127.0.0.1:8083; # synchrotron1
    # server 127.0.0.1:8084; # synchrotron2
    # server 127.0.0.1:8085; # synchrotron3
}
```
Then, you can change the default log format of your vhost:
```
server {
    #[...]
    access_log /var/log/nginx/matrix-access.log backend;
    #[...]
}
```
## reverse proxy the endpoints
In my `server {}` section I set up multiple locations (to avoid a very big regexp):
```
location ~ ^/_matrix/client/(v2_alpha|r0)/sync$ {
    proxy_pass http://synchrotron$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

location ~ ^/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync$ {
    proxy_pass http://synchrotron$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

location ~ ^/_matrix/client/(api/v1|r0)/initialSync$ {
    proxy_pass http://synchrotron$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

location ~ ^/_matrix/client/(api/v1|v2_alpha|r0)/events$ {
    proxy_pass http://synchrotron$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
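Before reloading nginx, the four location regexes can be sanity-checked against sample request paths with plain `grep`. The combined pattern and the `is_sync_path` name are my own; the pattern is meant to be equivalent to the four locations above:

```shell
# is_sync_path <path>: succeeds if the path would be routed to the
# synchrotron upstream by the four location blocks above
is_sync_path() {
  printf '%s\n' "$1" | grep -qE \
    '^/_matrix/client/((v2_alpha|r0)/sync|(api/v1|r0)/(rooms/[^/]+/)?initialSync|(api/v1|v2_alpha|r0)/events)$'
}
```

For example, `is_sync_path /_matrix/client/r0/sync` succeeds, while `is_sync_path /_matrix/client/r0/messages` does not (that request stays on the main process).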
Reload the nginx config (`nginx -t` to check it first, then `systemctl reload nginx`), and your synchrotron worker should start to get traffic.
## federation_reader
### workers/federation_reader.yaml
synapse.app.federation_reader listens on port 8011:
```
worker_app: synapse.app.federation_reader

worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8011
    resources:
      - names: [federation]

worker_pid_file: "/var/run/app.federation_reader.pid"
worker_daemonize: False
worker_log_config: /etc/matrix-synapse/federation-reader-log.yaml
send_federation: False
```
Here I separated the `^/_matrix/federation/v1/send/` endpoint into its own location, since it's documented that this one cannot be handled by multiple workers:
```
location ~ ^/_matrix/federation/v1/send/ {
    proxy_pass http://127.0.0.1:8011$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

# and a big regex for the rest
location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/) {
    proxy_pass http://127.0.0.1:8011$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
## other workers
I also tried `media_repository` and `event_creator`, but they were not working as expected. For reference, the configs:
### event_creator.yaml
```
worker_app: synapse.app.event_creator

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8102
    resources:
      - names:
          - client

worker_daemonize: False
worker_pid_file: /var/run/event_creator.pid
worker_log_config: /etc/matrix-synapse/event_creator-log.yaml
send_federation: False
```
### media_repository.yaml
```
worker_app: synapse.app.media_repository

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8101
    resources:
      - names:
          - media

worker_daemonize: False
worker_pid_file: /var/run/media_repository.pid
worker_log_config: /etc/matrix-synapse/media_repository-log.yaml
send_federation: False
```
### in nginx
```
# event_creator
location ~ ^/_matrix/client/(api/v1|r0|unstable)(/rooms/.*/send|/rooms/.*/state/|/rooms/.*/(join|invite|leave|ban|unban|kick)$|/join/|/profile/) {
    proxy_pass http://127.0.0.1:8102$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}

# media_repository : XXX Breaks thumbnails
location ~ (^/_matrix/media/|^/_synapse/admin/v1/purge_media_cache$|^/_synapse/admin/v1/room/.*/media.*$|^/_synapse/admin/v1/user/.*/media.*$|^/_synapse/admin/v1/media/.*$|^/_synapse/admin/v1/quarantine_media/.*$) {
    proxy_pass http://127.0.0.1:8101$uri;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
}
```
# Issues
These are the issues I have met so far (they might also have been related to some big federated rooms):
* CPU usage got high on all synchrotron workers, higher in total than with a single synapse process
* a lot of clients were disconnecting all the time
* some old notifications were popping up on desktop and mobile all the time
* media_repository was breaking thumbnails
* `send_federation: False` is needed in all worker configs except the federation_sender (see #7130)