diff --git a/Workers-setup-with-nginx.md b/Workers-setup-with-nginx.md
index fe4d5cb..341b0ef 100644
--- a/Workers-setup-with-nginx.md
+++ b/Workers-setup-with-nginx.md
@@ -2,11 +2,14 @@
 
 The actual documentation for setting up workers is not really easy to follow :
 
-https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers
-https://github.com/matrix-org/synapse/blob/master/docs/workers.md
+* https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers
+* https://github.com/matrix-org/synapse/blob/master/docs/workers.md
 
-This is how I (try) to setup the workers.
-**WARNING : SHOULD BE REVIEWED ! WIP ! Actually this breaks my setup in a lot of strange places**
+This is how I (try to) change my setup to use workers.
+**WARNING : SHOULD BE REVIEWED ! WIP ! Actually this breaks my setup in a lot of strange ways**
+**Look at the [issues](#Issues) below first**
+
+I assume you already have a working synapse configuration, so I'm not putting whole config files here.
 
 # Background
 
@@ -14,10 +17,407 @@ This is how I (try) to setup the workers.
 * Server is running in a VMware with 16 CPU and 32GB RAM (half of it for postgreSQL).
 * DB is 14GB big
 * nginx is used as a reverse proxy
-* Synapse homeserver process was hammering with 100-120%CPU all day long
+* The synapse homeserver process is hammering at 100-120% CPU all day long, but never uses more of the available CPUs.
+* my nginx graph shows an average of 140 requests/s during working hours
+* I'm using the Debian packages from matrix.org and starting synapse with systemd
 
 # Which workers are meaningful ?
 
-First, I wanted to check what endpoints are asked the most in my installation :
+## analysing old logs
 
-
+First, I wanted to check which endpoints are requested most in my installation. I grepped my nginx access log over 24 hours for the endpoints of every worker, as listed in https://github.com/matrix-org/synapse/blob/master/docs/workers.md:
+### synapse.app.synchrotron
+`grep -E '(/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)' | wc -l`
+
+### synapse.app.federation_reader
+
+`grep -E '(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/send/|/_matrix/federation/v1/get_groups_publicised|/_matrix/key/v2/query|/_matrix/federation/v1/groups/)'`
+
+### synapse.app.media_repository
+
+`grep -E '(/_matrix/media/|/_synapse/admin/v1/purge_media_cache|/_synapse/admin/v1/room/.*/media.*|/_synapse/admin/v1/user/.*/media.*|/_synapse/admin/v1/media/.*|/_synapse/admin/v1/quarantine_media/.*)'`
+
+### synapse.app.client_reader
+
+`grep -E '(/_matrix/client/(api/v1|r0|unstable)/publicRooms|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/joined_members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/context/.*|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state|/_matrix/client/(api/v1|r0|unstable)/login|/_matrix/client/(api/v1|r0|unstable)/account/3pid|/_matrix/client/(api/v1|r0|unstable)/keys/query|/_matrix/client/(api/v1|r0|unstable)/keys/changes|/_matrix/client/versions|/_matrix/client/(api/v1|r0|unstable)/voip/turnServer|/_matrix/client/(api/v1|r0|unstable)/joined_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups/|/_matrix/client/(api/v1|r0|unstable)/pushrules/.*|/_matrix/client/(api/v1|r0|unstable)/groups/.*|/_matrix/client/(r0|unstable)/register|/_matrix/client/(r0|unstable)/auth/.*/fallback/web)'`
+
+**Note**: I didn't include `/_matrix/client/(api/v1|r0|unstable)/rooms/.*/messages` in the client_reader count:
+* 175576 requests without /messages
+* 9998816 requests with /messages (the total number of requests in the log, not sure why)
+
+### synapse.app.user_dir
+
+`grep -E '/_matrix/client/(api/v1|r0|unstable)/user_directory/search'`
+
+### synapse.app.frontend_proxy
+`grep -E '/_matrix/client/(api/v1|r0|unstable)/keys/upload'`
+
+### synapse.app.event_creator
+
+`grep -E '(/_matrix/client/(api/v1|r0|unstable)/rooms/.*/send|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state/|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)|/_matrix/client/(api/v1|r0|unstable)/join/|/_matrix/client/(api/v1|r0|unstable)/profile/)'`
+
+## results
+
+| worker endpoints   | requests/day | percent |
+| ------------------ | ------------ | ------- |
+| synchrotron        | 9017921      | 90.19%  |
+| federation_reader  | 321413       | 3.21%   |
+| media_repository   | 115749       | 1.16%   |
+| client_reader      | 175576       | 1.76%   |
+| user_dir           | 1341         | 0.01%   |
+| frontend_proxy     | 6936         | 0.07%   |
+| event_creator      | 26876        | 0.27%   |
+| total              | 9665812      | 96.67%  |
+| total requests     | 9998816      | 100.00% |
+| others             | 333004       | 3.33%   |
+
+So a synchrotron worker makes the most sense for me (my setup is fairly standard, so I guess it's like this almost everywhere).
+
+# Setting up synchrotron worker(s)
+
+**WARNING**: I broke parts of my setup quite badly while trying this on a live server.
+
+## homeserver.yaml
+Add this to the existing `listeners` section of the config:
+```
+listeners:
+  # The TCP replication port
+  - port: 9092
+    bind_address: '127.0.0.1'
+    type: replication
+  # The HTTP replication port
+  - port: 9093
+    bind_address: '127.0.0.1'
+    type: http
+    resources:
+      - names: [replication]
+```
+Also add this to `homeserver.yaml`:
+```
+worker_app: synapse.app.homeserver
+daemonize: false
+```
+
+Restart synapse to check it still works:
+
+`# systemctl restart matrix-synapse`
+
+## workers configuration
+
+**Note**: if you work as root, hand the config files over to the matrix-synapse user (`chown`) after creating them.
+
+I used the systemd instructions from https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers, but changed them to be able to start multiple synchrotron workers.
+
+`mkdir /etc/matrix-synapse/workers`
+
+### /etc/matrix-synapse/workers/synchrotron-1.yaml
+
+```
+worker_app: synapse.app.synchrotron
+
+# The replication listener on the synapse to talk to.
+worker_replication_host: 127.0.0.1
+worker_replication_port: 9092
+worker_replication_http_port: 9093
+
+worker_listeners:
+  - type: http
+    port: 8083
+    resources:
+      - names:
+          - client
+
+worker_daemonize: False
+worker_pid_file: /var/run/synchrotron1.pid
+worker_log_config: /etc/matrix-synapse/synchrotron1-log.yaml
+send_federation: False
+```
+
+If you want to run multiple synchrotrons, create another config like this: `sed -e 's/synchrotron1/synchrotron2/g' -e 's/8083/8084/' /etc/matrix-synapse/workers/synchrotron-1.yaml > /etc/matrix-synapse/workers/synchrotron-2.yaml`
+
+Don't forget to create a log config file for each worker as well.
+
+### /etc/matrix-synapse/synchrotron1-log.yaml
+
+With this config, the worker should write its log to /var/log/matrix-synapse/synchrotron1.log.
+It could probably be trimmed down.
+```
+version: 1
+
+formatters:
+  precise:
+    format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s- %(message)s'
+
+filters:
+  context:
+    (): synapse.util.logcontext.LoggingContextFilter
+    request: ""
+
+handlers:
+  file:
+    class: logging.handlers.RotatingFileHandler
+    formatter: precise
+    filename: /var/log/matrix-synapse/synchrotron1.log
+    maxBytes: 104857600
+    backupCount: 10
+    filters: [context]
+    encoding: utf8
+    level: DEBUG
+  console:
+    class: logging.StreamHandler
+    formatter: precise
+    level: WARN
+
+loggers:
+  synapse:
+    level: WARN
+
+  synapse.storage.SQL:
+    level: INFO
+
+  synapse.app.synchrotron:
+    level: DEBUG
+root:
+  level: WARN
+  handlers: [file, console]
+```
+
+## Starting the worker
+
+I tried to start the worker with synctl, but I had to change the config to include `/etc/matrix-synapse/conf.d/*` because synctl wasn't reading those files. Since I use systemd to start synapse in production anyway, it's better to start the workers with systemd directly.
+
+## systemd
+
+I followed this:
+https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers
+
+and created an extra systemd template unit to be able to run multiple synchrotrons.
+
+### /etc/systemd/system/matrix-synapse-worker-synchrotron\@.service
+```
+[Unit]
+Description=Synapse Matrix Worker
+After=matrix-synapse.service
+BindsTo=matrix-synapse.service
+
+[Service]
+Type=notify
+NotifyAccess=main
+User=matrix-synapse
+WorkingDirectory=/var/lib/matrix-synapse
+EnvironmentFile=/etc/default/matrix-synapse
+ExecStart=/opt/venvs/matrix-synapse/bin/python -m synapse.app.synchrotron --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/synchrotron-%i.yaml
+ExecReload=/bin/kill -HUP $MAINPID
+Restart=always
+RestartSec=3
+SyslogIdentifier=matrix-synapse-synchrotron-%i
+
+[Install]
+WantedBy=matrix-synapse.service
+```
+
+* Reload the systemd config: `systemctl daemon-reload`
+* Start synchrotron1: `systemctl start matrix-synapse-worker-synchrotron@1.service`
+* Check the logs: `journalctl -xe -f -u matrix-synapse-worker-synchrotron@1.service`
+
+If this worked, you should now have an extra python process for synchrotron1, but it doesn't handle any traffic yet.
+
+## Nginx config
+
+### Some extras
+
+Add this somewhere in the `server { }` block of your default_server:
+```
+    location /nginx_status {
+        stub_status on;
+        access_log off;
+        allow 127.0.0.1;
+        allow ::1;
+        deny all;
+    }
+```
+You can then get an idea of the traffic with:
+```
+$ curl http://127.0.0.1/nginx_status
+Active connections: 270
+server accepts handled requests
+ 172758 172758 3500311
+Reading: 0 Writing: 126 Waiting: 144
+```
+
+### upstream synchrotrons
+
+First, I set up a pool for the synchrotrons (use the ports configured in the workers). This way, I could *theoretically* scale out when there is too much load. I also added a log format to trace in nginx which worker handles which request (stolen from somewhere I don't remember):
+
+Place this in your nginx config (I put it in my vhost config, outside of `server {}`):
+
+```
+log_format backend '$remote_addr - $remote_user - [$time_local] $upstream_addr: $request $status URT:$upstream_response_time request_time $request_time';
+
+upstream synchrotron {
+#    ip_hash;                # this might help in some cases, not in mine
+#    server 127.0.0.1:8008;  # main synapse process, to roll back when it goes wrong (reacted strangely)
+    server 127.0.0.1:8083;   # synchrotron1
+#    server 127.0.0.1:8084;  # synchrotron2
+#    server 127.0.0.1:8085;  # synchrotron3
+}
+```
+
+Then switch the access log of your vhost to this format:
+
+```
+server {
+#[...]
+    access_log /var/log/nginx/matrix-access.log backend;
+#[...]
+}
+```
+
+## reverse proxy the endpoints
+In my `server {}` section I set up multiple locations (to avoid one very big regexp). `proxy_pass` gets no URI part, so nginx forwards the original request URI unchanged; appending `$uri` would drop the query string (for example the `since` token of `/sync`) and pass the decoded path:
+
+```
+    location ~ ^/_matrix/client/(v2_alpha|r0)/sync$ {
+        proxy_pass http://synchrotron;
+        proxy_set_header X-Forwarded-For $remote_addr;
+        proxy_set_header Host $host;
+    }
+    location ~ ^/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync$ {
+        proxy_pass http://synchrotron;
+        proxy_set_header X-Forwarded-For $remote_addr;
+        proxy_set_header Host $host;
+    }
+    location ~ ^/_matrix/client/(api/v1|r0)/initialSync$ {
+        proxy_pass http://synchrotron;
+        proxy_set_header X-Forwarded-For $remote_addr;
+        proxy_set_header Host $host;
+    }
+    location ~ ^/_matrix/client/(api/v1|v2_alpha|r0)/events$ {
+        proxy_pass http://synchrotron;
+        proxy_set_header X-Forwarded-For $remote_addr;
+        proxy_set_header Host $host;
+    }
+```
+
+Reload the nginx config, and your synchrotron worker should start getting traffic.
+
+
+## federation_reader
+
+### workers/federation_reader.yaml
+
+synapse.app.federation_reader listens on port 8011:
+
+```
+worker_app: synapse.app.federation_reader
+
+worker_replication_host: 127.0.0.1
+worker_replication_port: 9092
+worker_replication_http_port: 9093
+
+worker_listeners:
+  - type: http
+    port: 8011
+    resources:
+      - names: [federation]
+
+
+worker_pid_file: "/var/run/app.federation_reader.pid"
+worker_daemonize: False
+worker_log_config: /etc/matrix-synapse/federation-reader-log.yaml
+send_federation: False
+
+```
+
+Here I separated the `^/_matrix/federation/v1/send/` endpoint, since the documentation says it must be handled by a single instance:
+
+```
+    location ~ ^/_matrix/federation/v1/send/ {
+        proxy_pass http://127.0.0.1:8011;
+        proxy_set_header X-Forwarded-For $remote_addr;
+        proxy_set_header Host $host;
+    }
+# and a big regex for the rest
+    location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/) {
+        proxy_pass http://127.0.0.1:8011;
+        proxy_set_header X-Forwarded-For $remote_addr;
+        proxy_set_header Host $host;
+    }
+```
+
+## other workers
+
+I also tried `media_repository` and `event_creator`, but they did not work as expected. Here are the configs I used:
+
+### event_creator.yaml
+```
+worker_app: synapse.app.event_creator
+
+# The replication listener on the synapse to talk to.
+worker_replication_host: 127.0.0.1
+worker_replication_port: 9092
+worker_replication_http_port: 9093
+
+worker_listeners:
+  - type: http
+    port: 8102
+    resources:
+      - names:
+          - client
+
+worker_daemonize: False
+worker_pid_file: /var/run/event_creator.pid
+worker_log_config: /etc/matrix-synapse/event_creator-log.yaml
+send_federation: False
+```
+### media_repository.yaml
+
+```
+worker_app: synapse.app.media_repository
+
+# The replication listener on the synapse to talk to.
+worker_replication_host: 127.0.0.1
+worker_replication_port: 9092
+worker_replication_http_port: 9093
+
+worker_listeners:
+  - type: http
+    port: 8101
+    resources:
+      - names:
+          - media
+
+worker_daemonize: False
+worker_pid_file: /var/run/media_repository.pid
+worker_log_config: /etc/matrix-synapse/media_repository-log.yaml
+send_federation: False
+```
+
+### in nginx
+
+```
+# event_creator
+    location ~ ^/_matrix/client/(api/v1|r0|unstable)(/rooms/.*/send|/rooms/.*/state/|/rooms/.*/(join|invite|leave|ban|unban|kick)$|/join/|/profile/) {
+        proxy_pass http://127.0.0.1:8102;
+        proxy_set_header X-Forwarded-For $remote_addr;
+        proxy_set_header Host $host;
+    }
+
+# media_repository : XXX Breaks thumbnails
+    location ~ (^/_matrix/media/|^/_synapse/admin/v1/purge_media_cache$|^/_synapse/admin/v1/room/.*/media.*$|^/_synapse/admin/v1/user/.*/media.*$|^/_synapse/admin/v1/media/.*$|^/_synapse/admin/v1/quarantine_media/.*$) {
+        proxy_pass http://127.0.0.1:8101;
+        proxy_set_header X-Forwarded-For $remote_addr;
+        proxy_set_header Host $host;
+    }
+```
+
+# Issues
+
+These are the issues I have hit so far (some may also be related to a few big federated rooms):
+
+* CPU usage got high on all synchrotron workers, higher than with a single synapse process
+* a lot of clients kept disconnecting
+* old notifications kept popping up on desktop and mobile
+* media_repository was breaking thumbnails
+* `send_federation: False` is needed in all worker configs except federation_sender (see #7130)
\ No newline at end of file
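The per-worker counts in the *results* table above came from running each grep by hand. The `backend` log format is what shows which upstream answered a request. A small shell helper can do both from one access log. This is only a sketch under assumptions:

* `count_by_worker` and `count_by_upstream` are hypothetical names.
* Only the synchrotron pattern is copied in full from the greps above. The federation and media patterns are shortened approximations, so paste the full expressions for real numbers.
* The upstream count expects lines written with exactly that `backend` format.

```shell
#!/bin/sh
# Sketch only: function names and the shortened federation/media patterns are
# assumptions; the synchrotron pattern is the full one from the greps above.

# share COUNT TOTAL: percentage with two decimals (0.00 when TOTAL is 0)
share() {
    awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f", (t > 0 ? 100 * c / t : 0) }'
}

# count_by_worker LOGFILE: one "worker requests percent%" line per category,
# then a "total" line. Each pattern is counted on its own, like separate
# greps, so one request can show up in two rows.
count_by_worker() {
    log=$1
    total=$(( $(wc -l < "$log") ))   # $(( )) strips the padding some wc add
    while read -r name re; do
        # grep -c prints 0 but exits 1 when nothing matches, hence "|| true"
        n=$(grep -cE "$re" "$log" || true)
        echo "$name $n $(share "$n" "$total")%"
    done <<'EOF'
synchrotron (/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)
federation_reader /_matrix/(federation/v[12]|key/v2)/
media_repository /_matrix/media/
EOF
    echo "total $total $(share "$total" "$total")%"
}

# count_by_upstream LOGFILE: "requests upstream_addr" per upstream, busiest
# first. Expects the `backend` log_format:
# "... [$time_local] $upstream_addr: $request ..."
count_by_upstream() {
    awk '{
        line = $0
        j = index(line, "] ")        # end of [$time_local]
        if (j == 0) next
        line = substr(line, j + 2)
        i = index(line, ": ")        # $upstream_addr is followed by ": "
        if (i > 0) hits[substr(line, 1, i - 1)]++
    }
    END { for (a in hits) print hits[a], a }' "$1" | sort -k1,1nr -k2
}
```

To use it, source the file (for example `. ./worker-stats.sh`, a made-up name). Then run `count_by_worker /var/log/nginx/matrix-access.log` or `count_by_upstream /var/log/nginx/matrix-access.log`. Decompress rotated `.gz` logs first, for example with `zcat`. Counting each pattern separately keeps the numbers comparable with the table, which was built the same way.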