matrix-public-archive/server
Eric Eastwood f6bd581f77
Better `child_process` error handling v2 - timeouts and actually fail process for error in scope (#62)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/51

Better `child_process` error handling for the scenarios marked with 👉 below.

Also make sure we handle all of these scenarios:

 1. Child process fork script throws an `uncaughtException` or `unhandledRejection`
    - These are captured, serialized back to the parent, stored in `childErrors`, and exposed if we never get a successfully rendered HTML response.
 2. Child process fails to start up
    - Render process is rejected in the `child.on('error', ...` callback
 3. 👉 Child process times out and is aborted
    - Render process is rejected in the `child.on('error', ...` callback and any `childErrors` encountered are logged
 4. 👉 Child process fork script throws an error in the scope of `process.on('message', async (renderOptions) => {`
    - Child exits with code 1 and we reject the render process with the error
 5. Child process exits with code 1 (error)
    - Render process is rejected with any `childError` info
 6. Child process exits with code 0 (success) but never sends back any HTML
    - We have a `returnedData` data check and any child errors encountered are logged
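The scenarios above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `renderInChild`, the `mode` values, the inline child script, and the message shapes are all hypothetical. Only the `childErrors` and `returnedData` names come from the description above. It assumes a Node.js version where `fork` accepts an `AbortSignal` via the `signal` option.

```javascript
'use strict';
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical child script. Depending on `mode` it replies with HTML,
// throws inside the message handler, exits silently, or hangs.
const CHILD_SOURCE = `
process.on('message', async ({ mode }) => {
  try {
    if (mode === 'throw') throw new Error('boom in message handler');
    if (mode === 'silent') return process.exit(0);
    if (mode === 'hang') return setInterval(() => {}, 1000);
    process.send({ html: '<p>ok</p>' }, () => process.exit(0));
  } catch (err) {
    // Scenario 4: report the error to the parent, then exit with code 1.
    process.send({ error: { message: err.message } }, () => process.exit(1));
  }
});
`;

// Write the child script to a temp file so this sketch needs no other files.
const scriptPath = path.join(
  fs.mkdtempSync(path.join(os.tmpdir(), 'child-sketch-')),
  'child.js'
);
fs.writeFileSync(scriptPath, CHILD_SOURCE);

function renderInChild(mode, timeoutMs = 2000) {
  return new Promise((resolve, reject) => {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    const childErrors = [];
    let returnedData;

    const child = fork(scriptPath, [], { signal: controller.signal });

    child.on('message', (msg) => {
      if (msg.error) childErrors.push(msg.error);
      else returnedData = msg.html;
    });

    // Scenarios 2 and 3: the child failed to start, or it was aborted
    // after the timeout (the signal makes Node emit 'error' on the child).
    child.on('error', (err) => {
      clearTimeout(timer);
      const details = JSON.stringify(childErrors);
      reject(new Error(`Child failed or was aborted: ${err.message}; childErrors=${details}`));
    });

    // 'close' fires after the process has ended and its channels are closed.
    child.on('close', (code) => {
      clearTimeout(timer);
      const details = JSON.stringify(childErrors);
      if (code !== 0) {
        // Scenarios 4 and 5: non-zero exit code.
        reject(new Error(`Child exited with code ${code}; childErrors=${details}`));
      } else if (returnedData === undefined) {
        // Scenario 6: clean exit, but no HTML was ever sent back.
        reject(new Error(`Child exited without returning HTML; childErrors=${details}`));
      } else {
        resolve(returnedData);
      }
    });

    child.send({ mode });
  });
}
```

Calling `renderInChild('ok')` resolves with the HTML string, while `renderInChild('hang', 300)` rejects once the timeout aborts the child. After an abort the `close` handler still runs, but its `reject` call is a no-op because the promise has already settled.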
2022-09-02 18:49:45 -05:00
hydrogen-render Better `child_process` error handling v2 - timeouts and actually fail process for error in scope (#62) 2022-09-02 18:49:45 -05:00
lib Better `child_process` error handling v2 - timeouts and actually fail process for error in scope (#62) 2022-09-02 18:49:45 -05:00
routes Make the archive responsive (#53) 2022-08-30 18:47:03 -05:00
tracing Manually instrument some archive logic (#44) 2022-08-29 14:13:13 -05:00
README.md Add available Jaeger port (#48) 2022-08-29 14:08:15 -05:00
ensure-room-joined.js Add test for joining a new federated room (#31) 2022-08-29 18:56:31 -05:00
fetch-events-in-range.js Add test for joining a new federated room (#31) 2022-08-29 18:56:31 -05:00
fetch-room-data.js Manually instrument some archive logic (#44) 2022-08-29 14:13:13 -05:00
server.js Enable tracing by config so we can enable from argv, env variable, or config file (#41) 2022-07-14 11:26:53 -05:00
start-dev.js OpenTelemetry tracing so we can see spans where the app is taking time (#27) 2022-07-14 11:08:50 -05:00

README.md

Tracing

Run the app with OpenTelemetry tracing enabled:

npm run start -- --tracing
# or
npm run start-dev -- --tracing

Manually:

node --require './server/tracing.js' server/server.js

A trace is made up of many spans. Each span carries the `traceId` of the trace it belongs to.
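As a rough illustration of that relationship, here is a toy model in plain Node.js. It is not the OpenTelemetry API: `startTrace`, `startChildSpan`, and the span names are hypothetical and exist only to show several spans sharing one `traceId`. The ID sizes (16 bytes for a trace ID, 8 bytes for a span ID) follow the W3C Trace Context format.

```javascript
'use strict';
const { randomBytes } = require('crypto');

// Toy model only, not the OpenTelemetry API.
function createSpan(name, traceId, parentSpanId) {
  return {
    name,
    traceId, // shared by every span in the same trace
    spanId: randomBytes(8).toString('hex'), // unique per span
    parentSpanId, // undefined for the root span
  };
}

// Start a new trace by creating its root span with a fresh traceId.
function startTrace(name) {
  return createSpan(name, randomBytes(16).toString('hex'), undefined);
}

// A child span inherits the parent's traceId and points back at the parent.
function startChildSpan(parent, name) {
  return createSpan(name, parent.traceId, parent.spanId);
}

// Hypothetical span names for one request.
const root = startTrace('handle-request');
const fetchSpan = startChildSpan(root, 'fetch-events');
const renderSpan = startChildSpan(root, 'render-html');

for (const span of [root, fetchSpan, renderSpan]) {
  console.log(`${span.name} traceId=${span.traceId} spanId=${span.spanId}`);
}
```

All three lines print the same `traceId` with different `spanId` values, which is how a viewer such as Jaeger groups spans into a single trace.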

Viewing traces in Jaeger

via https://www.jaegertracing.io/docs/1.35/getting-started/

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 5775:5775/udp \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.35

| Port | Protocol | Component | Function |
| --- | --- | --- | --- |
| 6831 | UDP | agent | accept jaeger.thrift over Thrift-compact protocol (used by most SDKs) |
| 6832 | UDP | agent | accept jaeger.thrift over Thrift-binary protocol (used by Node.js SDK) |
| 5775 | UDP | agent | (deprecated) accept zipkin.thrift over compact Thrift protocol (used by legacy clients only) |
| 5778 | HTTP | agent | serve configs (sampling, etc.) |
| 16686 | HTTP | query | serve frontend |
| 4317 | HTTP | collector | accept OpenTelemetry Protocol (OTLP) over gRPC, if enabled |
| 4318 | HTTP | collector | accept OpenTelemetry Protocol (OTLP) over HTTP, if enabled |
| 14268 | HTTP | collector | accept jaeger.thrift directly from clients |
| 14250 | HTTP | collector | accept model.proto |
| 9411 | HTTP | collector | Zipkin compatible endpoint (optional) |

With Service Performance Monitoring (SPM)