Commit Graph

44 Commits

Author SHA1 Message Date
Eric Eastwood 1dd63212c0
Add reason why the archive bot is joining the room (#262)
Using the join `reason` added in [MSC2367](https://github.com/matrix-org/matrix-spec-proposals/pull/2367). Unfortunately, this PR doesn't have much effect because it doesn't look like many clients support it yet (Element doesn't support it for example).

Part of https://github.com/matrix-org/matrix-public-archive/issues/257
2023-06-09 16:05:20 -05:00
Eric Eastwood 16323df054
Add image metadata for URL previews (#224)
- Default to a nice `[matrix]` banner
    -  There is room for improvement here when the Matrix Public Archive gets it's own logo (https://github.com/matrix-org/matrix-public-archive/issues/94) and maybe says "Matrix Public Archive" somewhere in the banner.
    - This is good enough for now (and certainly better than downstream previews using the first image on the page).
 - For rooms, it will use the room avatar

Part of https://github.com/matrix-org/matrix-public-archive/issues/202

Image is sized to 1200x630 to match conventions of `og:image`.

Crafted the banner image by modifying the header on the room directory homepage and taking a node screenshot. Page zoom @ 175%
2023-05-10 00:50:12 -05:00
Eric Eastwood 198e8c09be
Mark NSFW room pages with `<meta name="rating" content="adult">` (#216)
Related docs:

 - https://developers.google.com/search/docs/crawling-indexing/safesearch
 - https://developers.google.com/search/docs/crawling-indexing/special-tags
2023-05-05 15:36:26 -05:00
Eric Eastwood 9b067f8637
Set `X-Date-Temporal-Context` header for easy cache rules (#209)
Set `X-Date-Temporal-Context: [past|present|future]` header for easy cache rules:

 - Cache `past` things heavily
 - Cache `present`/`future` things for 5 minutes
 
This accomplishes the goal we set out for:

> - We can cache all responses except for the latest UTC day (and anything in the future). ex. `/!aMzLHLvScQCGKDNqCB:gitter.im/date/2022/10/13`
>    - For the latest day, we could set the cache expire after 5 minutes or so
>
> *-- [Matrix Public Archive deployment issue](https://github.com/vector-im/sre-internal/issues/2079)*

And this way we don't have to do any fancy date parsing and comparison from the URL which is probably not even possible Cloudflare cache rules.
2023-05-04 13:42:59 -05:00
Eric Eastwood 9078abf4f1
Timeout requests and stop processing further (#204)
Fix https://github.com/matrix-org/matrix-public-archive/issues/148
Fix https://github.com/matrix-org/matrix-public-archive/issues/40

 - Apply timeout middleware to all room directory and room routes
 - Stop messing with the response after we timeout. Fix https://github.com/matrix-org/matrix-public-archive/issues/148
    - This also involves cancelling any `async/await` things like requests in the routes so we throw an abort error instead of continuing on. Fix https://github.com/matrix-org/matrix-public-archive/issues/40
 - Also abort the route if we see that the user closed the request before we could respond to them
 - Bumps minimum supported Node.js version to v18 because we're now using the built-in native `fetch` in Node.js vs `node-fetch`. This gives us the custom `signal.reason` that we aborted with instead of a generic `AbortError`.
    - This also means we had to add some instrumentation for `fetch` which uses `undici` under the hood. Settled on some unofficial instrumentation: [`opentelemetry-instrumentation-fetch-node`](https://www.npmjs.com/package/opentelemetry-instrumentation-fetch-node)
2023-05-02 00:39:01 -05:00
Eric Eastwood f71fc2bb9c
Cache derived info from the `manifest.json` (#191)
- Like getting all of the dependencies for a given entry point
 - And the favicons
 
Also fix the problem where `server/hydrogen-render/render-page-html.js` was calling `getFaviconAssetUrls()` right away before the client build had a chance to generate `dist/manifest.json` and result in `Error: Cannot find module '../../dist/manifest.json'`
2023-04-26 17:04:49 -05:00
Eric Eastwood e20a67d2ba
Preload fonts and images (#187)
Part of https://github.com/matrix-org/matrix-public-archive/issues/132
2023-04-26 16:35:00 -05:00
Eric Eastwood 27863a1945
Iterate on `crossorigin` language in `Link` preload header comments (#186)
Hopefully more accurate now 🤞
2023-04-26 04:05:11 -05:00
Eric Eastwood a3952f1d31
Fix preload link headers (#185)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/171 and https://github.com/matrix-org/matrix-public-archive/pull/175 where they broke because we went from scripts to modules.

Part of https://github.com/matrix-org/matrix-public-archive/issues/132

Before this PR, we were seeing these warning in the Chrome devtools console:

```
A preload for 'foo' is found, but is not used because the request credentials mode does not match. Consider taking a look at crossorigin attribute.
```

This is caused by a credentials mode mismatch between the `<script type="module">` tag and the `Link` header. A `<script type="module">` with no `crossorigin` attribute indicates a credentials mode of `omit` and a naive `Link: </foo-url>; rel=preload; as=script;` has a  default credentials mode of `same-origin`, hence the mismatch and warning we're seeing.

We could set the credentials mode to match using `Link: </foo-url>; rel=preload; as=script; omit` but there is an even better option! We can use the dedicated `Link: </foo-url>; rel=modulepreload` link type which not only downloads and puts the the file in the cache like a normal preload but the browser also knows it's a JavaScript module now and can parse/compile it so it's ready to go.

---

Future consideration: Adding `nopush` to preload link headers. Many servers initiate an HTTP/2 Server Push when they encounter a preload link in HTTP header form otherwise. Do we want/care about that (or maybe we don't)? (mentioned in https://medium.com/reloading/preload-prefetch-and-priorities-in-chrome-776165961bbf#6f54)

---

References for preload `Link` headers:

  - https://medium.com/reloading/preload-prefetch-and-priorities-in-chrome-776165961bbf#6f54
  - https://html.spec.whatwg.org/multipage/links.html#link-type-preload
  - https://www.smashingmagazine.com/2016/02/preload-what-is-it-good-for/#headers
 - https://developer.chrome.com/blog/modulepreload/#ok-so-why-doesnt-link-relpreload-work-for-modules
2023-04-26 03:29:57 -05:00
Eric Eastwood 2c12fec1e6
Fix scripts not loading from the production ready build PR (#183)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/175
2023-04-25 03:54:49 -05:00
Eric Eastwood 0f26dc94d3
Migrate from `eslint-plugin-node` to `eslint-plugin-n` (#179) 2023-04-25 00:39:59 -05:00
Eric Eastwood 9c0b6fe85e
Production ready build (#175)
- Rename `public` -> `client` so it doesn't get copied automagically as-is (without hashes which we want for cache busting), https://vitejs.dev/guide/assets.html#the-public-directory
     - We still build the version files to `public/` so their copied as-is and Vite handles it for us (so we can use `emptyOutDir`) 
 - Use a multiple entrypoint `.js` Vite build so things can be more intelligently bundled and take less time
     - We aren't using library mode because it doesn't minify or bundle assets
 - Using hash asset tags for cache busting. Hash of the file included in the file name
 - We lookup these hashed assets from `manifest.json` that Vite builds (https://vitejs.dev/guide/backend-integration.html) to serve and preload
 - In terms of optimized bundles, I know the current output isn't great now but will have to opt to fix that up separately in the future. Tracked by https://github.com/matrix-org/matrix-public-archive/issues/176
2023-04-24 23:50:53 -05:00
Eric Eastwood 50a1d658e8
Only read version tag files once on startup (#174)
We already read it once for the `/health-check` endpoint and cached the response but this way we can use `getVersionTags()` everywhere without worrying about it.

Also, it's no longer `async` so we can use it in things like Express route paths and CDN asset tags more easily.
2023-04-19 15:57:22 -05:00
Eric Eastwood 17a39ab8db
Add preload link headers for downstream Cloudflare early hints (#171)
Because it takes us at best several seconds to request information from a homeserver and then server-side render the page, the browser has to wait for the response before it can even try loading the necessary assets. With this change that facilitates early hints, the browser can preload all of the assets necessary before we are done generating the response and will be ready to go by the time we're all done on the server.

Fix https://github.com/matrix-org/matrix-public-archive/issues/32

Part of https://github.com/matrix-org/matrix-public-archive/issues/132

See https://developers.cloudflare.com/cache/about/early-hints/ for information on enabling in Cloudflare
2023-04-19 14:20:01 -05:00
Eric Eastwood 551b4e72d1
Follow tombstone and predecessor history (#167)
Fix https://github.com/matrix-org/matrix-public-archive/issues/59

Other updates:

 - Update tests to use `/roomid/room1/date/2022/01/03` format instead of trying to retrofit the weird alias stuff on there. Which also makes the fancy to actual URL utilities much more simple.
 - Update to specify `archiveMessageLimit` in the test case because pages have different number of events depending on if we are against a boundary, hidden events, etc.
2023-04-19 01:26:15 -05:00
Eric Eastwood 6c789eae69
Do our best to get the user to the right place and try joining `via` derived server name (#168)
Split out from https://github.com/matrix-org/matrix-public-archive/pull/167
2023-04-11 15:09:44 -05:00
Eric Eastwood 954b22995a
Add a way to select time of day (#139)
- Fix https://github.com/matrix-org/matrix-public-archive/issues/7
 - A URL with time looks like
    - `/r/too-many-messages-on-day:my.synapse.server/date/2022/11/16T23:59`
    - Or when more precision is required (seconds): `/r/too-many-messages-on-day:my.synapse.server/date/2022/11/16T23:59:59`
 - Add new custom time picker/scrubber (pictured below) with momentum scrubbing
    - Native built-in `<input type="time">` for easier picking if you prefer that and accessibility.
    - Uses localized time strings
    - Design inspired by Thiago Sanchez's *Time Zone Translate* concept, https://dribbble.com/shots/14590546-Time-Zone-Translate
2023-04-05 04:25:31 -05:00
Philip Durbin 8f9e1631ae
Switch /timestamp_to_event from unstable to stable v1 #142 (#154) 2023-02-16 20:52:28 -06:00
Eric Eastwood 2dff7ecea5
Refactor `fetchEndpointAsText`/`fetchEndpointAsJson` to return `res` alongside `data` (#122)
Split out of https://github.com/matrix-org/matrix-public-archive/pull/121
where we needed to use `res.url`.
2022-11-03 04:12:00 -05:00
Eric Eastwood 08254cbb49
Add a way to jump forwards and backwards to more activity in the room (seamless navigation) (#114)
Fix https://github.com/matrix-org/matrix-public-archive/issues/46
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/71

Summary:

 - Changes the "Jump to next activity in room" to actually continue you to the next 100 messages ahead. Previously, it only jumped you to the single next event in the room which meant a lot of backwards overlap each time.
    - Jumping this direction will also start your scroll position at the top of the timeline to continue reading seamlessly `?continue=top`
 - Adds "Jump to previous activity in room" to the top of the timeline to continue reading the previous part of the conversation.

[1]: There is a caveat with seamless here which is also commented on in the code:

> XXX: This is flawed in the fact that when we go `/messages?dir=b` it could  backfill messages which will fill up the response before we perfectly connect and  continue from the position they were jumping from before. When `/messages?dir=f`  backfills, we won't have this problem anymore because any messages backfilled in  the forwards direction would be picked up the same going backwards.

(need forwards fill MSC)
2022-11-02 04:27:30 -05:00
Eric Eastwood 7a88ea0c19
Add support for room aliases (#107)
Also does friendly redirects if you don't exactly use the right URL pattern.
For example, if you paste the full room ID with the `!` like `/roomid/!foo:bar`,
it will properly redirect you to `/roomid/foo:bar`. It also does this sort of
thing for URL encoded room ID's and aliases.

Fix https://github.com/matrix-org/matrix-public-archive/issues/25
2022-10-27 01:09:13 -05:00
Eric Eastwood b34c1b817d
Add homeserver selector to room directory landing page (#87)
Opting for the simple solution and using `include_all_networks` instead of needing to fetch the information about the third-party networks.

Fix https://github.com/matrix-org/matrix-public-archive/issues/6 (last piece done with this PR)
2022-10-20 02:06:43 -05:00
Eric Eastwood be837515fe
Show surrounding messages for a full screen of content (#71)
1. Add surrounding messages to the given messages so we have a full screen of content to make it feel lively even in quiet rooms
    - As you scroll around the timeline across different days, the date changes in the URL, calendar, etc
 2. Add summary item to the bottom of the timeline that explains if we couldn't find any messages in the specific day requested 
    - Also allows you to the jump to the next activity in the room. Adds `/:roomId/jump?ts=xxx&dir=[f|b]` to facilitate this.
    - Part of https://github.com/matrix-org/matrix-public-archive/issues/46
 3. Add developer options modal which is linked from the bottom of the right-panel
    - Adds an option so you can debug the `IntersectionObserver` and how it's selecting the active day from the top-edge of the scroll viewport.
    - In the future, this will also include a nice little visualization of the backend timing traces
2022-09-20 16:02:09 -05:00
Eric Eastwood 92668996d7
Add search to room directory landing page (#70)
Part of https://github.com/matrix-org/matrix-public-archive/issues/6
2022-09-15 20:41:55 -05:00
Eric Eastwood 32c77ecffe
Only show `world_readable` or `public` rooms in the archive. Only index `world_readable` (#66)
Only show `world_readable` or `public` rooms in the archive. Only allow `world_readable` rooms to be indexed by search engines.

Related to https://github.com/matrix-org/matrix-public-archive/issues/47
2022-09-08 19:15:07 -05:00
Eric Eastwood 65a371910a
Redirect to last day with message history (#65)
Redirect to last day with message history: `/:roomIdOrAlias` -> `/:roomIdOrAlias/date/:yyyy/:mm/:dd`

Fix https://github.com/matrix-org/matrix-public-archive/issues/60
2022-09-08 02:18:18 -05:00
Eric Eastwood 127d416e6a
Room directory landing page v1 (#61)
Part of https://github.com/matrix-org/matrix-public-archive/issues/6
2022-09-08 01:30:04 -05:00
Eric Eastwood f6bd581f77
Better `child_process` error handling v2 - timeouts and actually fail process for error in scope (#62)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/51

Better `child_process` error handling for a couple scenarios with the finger pointing at it 👉

Also make sure we handle all of these scenarios:

 1. Child process fork script throws an `uncaughtException` or `unhandledRejection`
    - These are captured and serialized back to the parent and stored in `childErrors` and exposed if we never get a successful rendered HTML response.
 2. Child process fails to startup 
    - Render process is rejected in the `child.on('error', ...` callback
 3. 👉 Child process times out and is aborted
    - Render process is rejected in the `child.on('error', ...` callback and any `childErrors` encountered are logged
 4. 👉 Child process fork script throws an error in scope of in `process.on('message', async (renderOptions) => {`
    - Child exits with code 1 and we reject the render process with the error
 5. Child process exits with code 1 (error)
    - Render process is rejected with any `childError` info
 6. Child process exits with code 0 (success) but never sends back any HTML
    - We have a `returnedData` data check and any child errors encountered are logged
2022-09-02 18:49:45 -05:00
Eric Eastwood b81df10c8e
Use JSON5 for configuration files with comments (#52)
Use JSON5 for configuration files with comments. Now we can leave the available config in `config.default.json` without having to add weird instructions to remove the `xxx`, etc

 - https://www.npmjs.com/package/json5
 - https://www.npmjs.com/package/nconf
 - https://github.com/indexzero/nconf/issues/113#issuecomment-69999413
2022-08-29 20:33:02 -05:00
Eric Eastwood bdaa98e722
Make the `child_process` error catching more robust (`uncaughtException`) (#51)
Split off from https://github.com/matrix-org/matrix-public-archive/pull/43

Listen to `process.on('uncaughtException', ...)` and handle the async errors ourselves so it no longer fails the child process.

And if the process does exit with status code 1 (error), we have those underlying errors serialized and shown.
2022-08-29 19:13:56 -05:00
Eric Eastwood ddfe94beab
OpenTelemetry tracing so we can see spans where the app is taking time (#27)
OpenTelemetry tracing so we can see spans where the app is taking time.
For the user, we specifically show the spans for the external API HTTP requests
that are slow (so we know when the Matrix API is being slow).

Enable tracing:

 - `npm run start -- --tracing`
 - `npm run start-dev -- --tracing`

What does this PR change:

 - Adds OpenTelemetry tracing with some of the automatic instrumentation (includes HTTP and express)
    - We ignore traces for serving static assets (just noise)
 - Adds `X-Trace-Id` to the response headers
 - Adds `window.tracingSpansForRequest` which includes the external HTTP API requests made during the request
 - Adds a fancy 504 timeout page that includes trace details and lists the slow HTTP requests
 - Adds `jaegerTracesEndpoint` configuration to export tracing spans to Jaeger
 - Related to, https://github.com/matrix-org/matrix-public-archive/issues/26
2022-07-14 11:08:50 -05:00
Eric Eastwood 9c8f980e2a Remove random log 2022-07-05 11:21:04 -05:00
Eric Eastwood 17f2c399dd
Expose arguments for `renderHydrogenToString` (pure function) so we can reproduce when error occurs while we render Hydrogen (#35)
`renderHydrogenToString` is a pure function (probably) which means it will give the same output given the same input. This means, that if we give it a certain input and an error occurs, we should be able to reproduce it again if we have the arguments. This PR exposes those arguments in the logged error so we can investigate what's going wrong.

Added so we can investigate https://github.com/matrix-org/matrix-public-archive/issues/34 better and reproduce locally.
2022-07-05 10:30:12 -05:00
Eric Eastwood bd5c14242e
Make sure container is able to start up (#23)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/22
2022-06-15 17:12:44 -05:00
Eric Eastwood 952a3acd4a
Explain how config inherits (#17) 2022-06-09 21:37:07 -05:00
Eric Eastwood 9fc71a3412
Remove `matrix-bot-sdk` usage in tests (#15)
Remove `matrix-bot-sdk` usage in tests because it didn't have timestamp massaging `?ts` and it's not really necessary to rely on since we can just call the API directly 🤷. `matrix-bot-sdk` is also very annoying having to build rust crypto packages.

We're now using direct `fetch` requests against the Matrix API and lightweight `client` object.

All 3 current tests pass 
2022-06-09 20:44:57 -05:00
Eric Eastwood 839e31a35e E2E test but still failing because fetching from start of day before test events happened 2022-02-23 21:25:05 -06:00
Eric Eastwood 0f493a241e Federated homeservers in Docker for e2e tests 2022-02-22 20:25:24 -06:00
Eric Eastwood 6a5d011a45 Link calendar days 2022-02-16 19:58:32 -06:00
Eric Eastwood 166857e0de Fix lints 2022-02-15 21:33:31 -06:00
Eric Eastwood 6c1cf6d46a Stop restarting server when the client bundle updates 2022-02-15 17:30:30 -06:00
Eric Eastwood e0279e080e Add npm run start-dev to build vite and run server at once 2022-02-15 17:17:14 -06:00
Eric Eastwood c0a2a65c2f Add support for the room header (RoomView) 2022-02-10 01:42:02 -06:00
Eric Eastwood 2378ed72c7 Add in date routes 2022-02-09 20:29:18 -06:00