matrix-public-archive/server
Eric Eastwood ddfe94beab
OpenTelemetry tracing so we can see spans where the app is taking time (#27)
Add OpenTelemetry tracing so we can see the spans where the app is spending time.
For the user, we specifically show the spans for slow external API HTTP requests
(so we know when the Matrix API is being slow).

Enable tracing:

 - `npm run start -- --tracing`
 - `npm run start-dev -- --tracing`

What this PR changes:

 - Adds OpenTelemetry tracing with some of the automatic instrumentation (includes HTTP and express)
    - We ignore traces for serving static assets (just noise)
 - Adds `X-Trace-Id` to the response headers
 - Adds `window.tracingSpansForRequest` which includes the external HTTP API requests made during the request
 - Adds a fancy 504 timeout page that includes trace details and lists the slow HTTP requests
 - Adds `jaegerTracesEndpoint` configuration to export tracing spans to Jaeger
 - Related to https://github.com/matrix-org/matrix-public-archive/issues/26
2022-07-14 11:08:50 -05:00
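
The `X-Trace-Id` header and the static-asset filtering described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `createTraceIdMiddleware`, `getTraceId`, and `isStaticAssetRequest` are hypothetical names, and the list of file extensions is an assumption. In the real app the trace ID would presumably come from the active OpenTelemetry span (for example via `trace.getSpan(context.active())?.spanContext().traceId` from `@opentelemetry/api`); here it is injected so the sketch runs without any dependencies.

```javascript
// Hypothetical Express-style middleware factory. `getTraceId` is injected so
// this sketch does not depend on OpenTelemetry being installed.
function createTraceIdMiddleware(getTraceId) {
  return function traceIdMiddleware(req, res, next) {
    const traceId = getTraceId();
    // Only set the header when there is an active trace for this request
    if (traceId) {
      res.setHeader('X-Trace-Id', traceId);
    }
    next();
  };
}

// Hypothetical predicate for skipping static asset requests (just noise in
// the traces). The extension list is an assumption for illustration.
function isStaticAssetRequest(url) {
  const pathname = url.split('?')[0];
  return /\.(?:js|css|map|png|svg|ico|woff2?)$/i.test(pathname);
}
```

A predicate like `isStaticAssetRequest` would typically be handed to the HTTP instrumentation's request-ignoring option (for example `ignoreIncomingRequestHook`, assuming the installed version supports it) so that no spans are created for those requests at all.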
hydrogen-render Make sure we finish sending the HTML payload before we exit the process (#38) 2022-07-06 19:24:29 -05:00
lib OpenTelemetry tracing so we can see spans where the app is taking time (#27) 2022-07-14 11:08:50 -05:00
routes OpenTelemetry tracing so we can see spans where the app is taking time (#27) 2022-07-14 11:08:50 -05:00
tracing OpenTelemetry tracing so we can see spans where the app is taking time (#27) 2022-07-14 11:08:50 -05:00
README.md OpenTelemetry tracing so we can see spans where the app is taking time (#27) 2022-07-14 11:08:50 -05:00
fetch-events-in-range.js Remove unneeded `include_redundant_members` from `/messages` `filter` and test that member state is still visible (#29) 2022-06-29 06:56:13 -05:00
fetch-room-data.js E2E test but still failing because fetching from start of day before test events happened 2022-02-23 21:25:05 -06:00
server.js OpenTelemetry tracing so we can see spans where the app is taking time (#27) 2022-07-14 11:08:50 -05:00
start-dev.js OpenTelemetry tracing so we can see spans where the app is taking time (#27) 2022-07-14 11:08:50 -05:00

README.md

Tracing

Run the app with OpenTelemetry tracing enabled:

npm run start -- --tracing
# or
npm run start-dev -- --tracing

Manually:

node --require './server/tracing.js' server/server.js
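
The `--require` flag matters because OpenTelemetry's automatic instrumentation patches modules such as `http` and `express` as they are loaded, so the tracing setup has to run before the server module is required. As a rough sketch of how a start script might translate the `--tracing` flag into the manual command above (`buildNodeArgs` is a hypothetical helper, not necessarily what `start-dev.js` does):

```javascript
// Hypothetical helper: turn the CLI arguments forwarded by
// `npm run start -- --tracing` into the arguments passed to `node`.
function buildNodeArgs(cliArgs) {
  const nodeArgs = [];
  if (cliArgs.includes('--tracing')) {
    // Preload the tracing setup before the server module is loaded
    nodeArgs.push('--require', './server/tracing.js');
  }
  nodeArgs.push('server/server.js');
  return nodeArgs;
}
```

With this sketch, `buildNodeArgs(['--tracing'])` yields the same arguments as the manual command, and `buildNodeArgs([])` yields just `server/server.js`.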

A trace is made up of many spans. Each span carries the `traceId` of the trace it belongs to.
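
To make the trace/span relationship concrete, here is a small sketch that regroups a flat list of spans into traces by `traceId`, then picks out the slow ones. The span shape (`spanId`, `parentSpanId`, `name`, `durationMs`) is a simplified assumption for illustration, not the exact OpenTelemetry span format, and both function names are hypothetical.

```javascript
// Group a flat list of spans into traces keyed by their shared traceId.
function groupSpansByTraceId(spans) {
  const traces = new Map();
  for (const span of spans) {
    if (!traces.has(span.traceId)) {
      traces.set(span.traceId, []);
    }
    traces.get(span.traceId).push(span);
  }
  return traces;
}

// Return the spans at or above the threshold, slowest first.
function findSlowSpans(spans, thresholdMs) {
  return spans
    .filter((span) => span.durationMs >= thresholdMs)
    .sort((a, b) => b.durationMs - a.durationMs);
}

// Made-up example data: two spans in one trace, one span in another
const exampleSpans = [
  { traceId: 'aaa', spanId: '1', parentSpanId: null, name: 'GET /:roomId', durationMs: 5200 },
  { traceId: 'aaa', spanId: '2', parentSpanId: '1', name: 'HTTPS GET /messages', durationMs: 4900 },
  { traceId: 'bbb', spanId: '3', parentSpanId: null, name: 'GET /:roomId', durationMs: 120 },
];
const exampleTraces = groupSpansByTraceId(exampleSpans);
console.log(exampleTraces.get('aaa').length); // 2
```

Filtering a single trace with `findSlowSpans(exampleTraces.get('aaa'), 1000)` is conceptually how a list of slow external HTTP requests for one page request could be produced.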

Viewing traces in Jaeger

via https://www.jaegertracing.io/docs/1.35/getting-started/

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.35

| Port | Protocol | Component | Function |
| --- | --- | --- | --- |
| 6831 | UDP | agent | accept `jaeger.thrift` over Thrift-compact protocol (used by most SDKs) |
| 6832 | UDP | agent | accept `jaeger.thrift` over Thrift-binary protocol (used by Node.js SDK) |
| 5775 | UDP | agent | (deprecated) accept `zipkin.thrift` over compact Thrift protocol (used by legacy clients only) |
| 5778 | HTTP | agent | serve configs (sampling, etc.) |
| 16686 | HTTP | query | serve frontend |
| 4317 | HTTP | collector | accept OpenTelemetry Protocol (OTLP) over gRPC, if enabled |
| 4318 | HTTP | collector | accept OpenTelemetry Protocol (OTLP) over HTTP, if enabled |
| 14268 | HTTP | collector | accept `jaeger.thrift` directly from clients |
| 14250 | HTTP | collector | accept `model.proto` |
| 9411 | HTTP | collector | Zipkin compatible endpoint (optional) |
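
For a local setup, the two ports that matter most are 14268 (where spans are sent) and 16686 (where traces are viewed). The sketch below builds both URLs. It assumes the all-in-one container from the command above is running on `localhost`, that the collector accepts `jaeger.thrift` at the `/api/traces` path, and that the Jaeger UI shows a single trace at `/trace/<traceId>`. Both helper names are hypothetical; the first value is the kind of URL one would expect to put in the `jaegerTracesEndpoint` configuration.

```javascript
// Collector endpoint for sending spans directly from the app (port 14268).
function jaegerCollectorTracesEndpoint(host = 'localhost', port = 14268) {
  return new URL('/api/traces', `http://${host}:${port}`).toString();
}

// Jaeger UI link for a trace ID, e.g. one copied from an `X-Trace-Id` header.
function jaegerUiTraceUrl(traceId, host = 'localhost', port = 16686) {
  return new URL(`/trace/${encodeURIComponent(traceId)}`, `http://${host}:${port}`).toString();
}

console.log(jaegerCollectorTracesEndpoint()); // http://localhost:14268/api/traces
```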

With Service Performance Monitoring (SPM)