Commit Graph

29 Commits

Author SHA1 Message Date
Eric Eastwood 0f522bed20
Use `rel=canonical` link to de-duplicate event permalinks (#266)
Fix https://github.com/matrix-org/matrix-public-archive/issues/251
2023-06-22 01:50:55 -05:00
Eric Eastwood 16323df054
Add image metadata for URL previews (#224)
- Default to a nice `[matrix]` banner
    - There is room for improvement here when the Matrix Public Archive gets its own logo (https://github.com/matrix-org/matrix-public-archive/issues/94) and maybe says "Matrix Public Archive" somewhere in the banner.
    - This is good enough for now (and certainly better than downstream previews using the first image on the page).
 - For rooms, it will use the room avatar

Part of https://github.com/matrix-org/matrix-public-archive/issues/202

Image is sized to 1200x630 to match conventions of `og:image`.

Crafted the banner image by modifying the header on the room directory homepage and taking a node screenshot. Page zoom @ 175%
2023-05-10 00:50:12 -05:00
Eric Eastwood 198e8c09be
Mark NSFW room pages with `<meta name="rating" content="adult">` (#216)
Related docs:

 - https://developers.google.com/search/docs/crawling-indexing/safesearch
 - https://developers.google.com/search/docs/crawling-indexing/special-tags
2023-05-05 15:36:26 -05:00
Eric Eastwood aeceb195e2
Add some `<meta name="description" ...>` to pages (#214)
Not the best but probably better than the default (a good first iteration)

Part of https://github.com/matrix-org/matrix-public-archive/issues/202
2023-05-04 22:46:09 -05:00
Eric Eastwood 9078abf4f1
Timeout requests and stop processing further (#204)
Fix https://github.com/matrix-org/matrix-public-archive/issues/148
Fix https://github.com/matrix-org/matrix-public-archive/issues/40

 - Apply timeout middleware to all room directory and room routes
 - Stop messing with the response after we timeout. Fix https://github.com/matrix-org/matrix-public-archive/issues/148
    - This also involves cancelling any `async/await` things like requests in the routes so we throw an abort error instead of continuing on. Fix https://github.com/matrix-org/matrix-public-archive/issues/40
 - Also abort the route if we see that the user closed the request before we could respond to them
 - Bumps minimum supported Node.js version to v18 because we're now using the built-in native `fetch` in Node.js vs `node-fetch`. This gives us the custom `signal.reason` that we aborted with instead of a generic `AbortError`.
    - This also means we had to add some instrumentation for `fetch` which uses `undici` under the hood. Settled on some unofficial instrumentation: [`opentelemetry-instrumentation-fetch-node`](https://www.npmjs.com/package/opentelemetry-instrumentation-fetch-node)
2023-05-02 00:39:01 -05:00
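A minimal sketch of the abort behavior described in #204: abort with a custom reason, and have later steps throw that reason instead of continuing. `RouteTimeoutAbortError` and `processRequest` are hypothetical names for illustration, not taken from the project. `AbortController`, `signal.reason`, and `signal.throwIfAborted()` are standard in Node.js v18.

```javascript
// Hypothetical error type used as the custom abort reason.
class RouteTimeoutAbortError extends Error {
  constructor(message) {
    super(message);
    this.name = 'RouteTimeoutAbortError';
  }
}

// Hypothetical route work: check the signal before each step and stop
// processing by throwing `signal.reason` once the request was aborted.
function processRequest(steps, signal) {
  const results = [];
  for (const step of steps) {
    signal.throwIfAborted();
    results.push(step());
  }
  return results;
}
```

Passing the same `signal` to `fetch(url, { signal })` lets in-flight requests be cancelled by the same `controller.abort(reason)` call.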
Eric Eastwood 0df1a79754
Fix styles on timeout page (#203)
Fix styles on timeout page since we started using the `manifest.json` for asset paths in https://github.com/matrix-org/matrix-public-archive/pull/175.
2023-05-01 15:13:16 -05:00
Eric Eastwood f71fc2bb9c
Cache derived info from the `manifest.json` (#191)
- Like getting all of the dependencies for a given entry point
 - And the favicons
 
Also fix the problem where `server/hydrogen-render/render-page-html.js` was calling `getFaviconAssetUrls()` right away, before the client build had a chance to generate `dist/manifest.json`, which resulted in `Error: Cannot find module '../../dist/manifest.json'`
2023-04-26 17:04:49 -05:00
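One way to get both effects described in #191 (cache the derived info, and avoid touching `dist/manifest.json` before it exists) is a lazy cache that only runs the loader on first use. This is a sketch under that assumption; `createLazyCache` is a hypothetical helper, not the project's code.

```javascript
// Wrap an expensive loader so it runs on first use (not at module load time)
// and its result is reused on every later call.
function createLazyCache(loadFn) {
  let cached;
  let hasLoaded = false;
  return function get() {
    if (!hasLoaded) {
      // If the loader throws (e.g. the file is not built yet), nothing is
      // cached and the next call tries again.
      cached = loadFn();
      hasLoaded = true;
    }
    return cached;
  };
}
```

A module could then export something like `createLazyCache(() => require('../../dist/manifest.json'))`, so the `require` only happens when a request first needs it.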
Eric Eastwood 2c12fec1e6
Fix scripts not loading from the production ready build PR (#183)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/175
2023-04-25 03:54:49 -05:00
Eric Eastwood 630e58fadc
Remove stray logs (#181)
Accidentally introduced in https://github.com/matrix-org/matrix-public-archive/pull/175
2023-04-25 01:21:53 -05:00
Eric Eastwood ac1419cdca
Only `require.resolve(...)` the path once (#180)
Perhaps a premature optimization or not even needed, but it doesn't seem wise to keep resolving this over and over (best case it's cached).
2023-04-25 00:50:43 -05:00
Eric Eastwood 9c0b6fe85e
Production ready build (#175)
- Rename `public` -> `client` so it doesn't get copied automagically as-is (without hashes which we want for cache busting), https://vitejs.dev/guide/assets.html#the-public-directory
     - We still build the version files to `public/` so they're copied as-is and Vite handles it for us (so we can use `emptyOutDir`)
 - Use a multiple entrypoint `.js` Vite build so things can be more intelligently bundled and take less time
     - We aren't using library mode because it doesn't minify or bundle assets
 - Using hash asset tags for cache busting. Hash of the file included in the file name
 - We lookup these hashed assets from `manifest.json` that Vite builds (https://vitejs.dev/guide/backend-integration.html) to serve and preload
 - In terms of optimized bundles, I know the current output isn't great now but will have to opt to fix that up separately in the future. Tracked by https://github.com/matrix-org/matrix-public-archive/issues/176
2023-04-24 23:50:53 -05:00
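To illustrate the `manifest.json` lookup mentioned in #175, here is a small sketch that walks a Vite manifest and collects the hashed files an entry point needs. The `file`, `imports`, and `css` fields follow Vite's documented manifest format; the function name, entry keys, and hashes are made up for the example.

```javascript
// Collect the hashed JS and CSS files needed by an entry point,
// following `imports` recursively and visiting each chunk once.
function collectAssetsForEntry(manifest, entryKey) {
  const seen = new Set();
  const scripts = [];
  const styles = [];
  const visit = (key) => {
    if (seen.has(key)) return;
    seen.add(key);
    const chunk = manifest[key];
    if (!chunk) throw new Error(`No manifest entry for ${key}`);
    scripts.push(chunk.file);
    styles.push(...(chunk.css || []));
    (chunk.imports || []).forEach(visit);
  };
  visit(entryKey);
  return { scripts, styles };
}

// Example manifest with made-up file names and hashes.
const exampleManifest = {
  'client/js/entry.js': {
    file: 'assets/entry.4f2a9c.js',
    isEntry: true,
    imports: ['_shared.js'],
    css: ['assets/entry.91bd02.css'],
  },
  '_shared.js': { file: 'assets/shared.7c1e55.js' },
};
```

The `scripts` list can be used for `<script>` and preload tags, and `styles` for stylesheet links.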
Eric Eastwood 321c6a4f26
Slightly easier to understand renderHydrogenVmRenderScriptToPageHtml API surface (#170)
2023-04-19 13:48:12 -05:00
Michael[tm] Smith 6b493ff807
Only assign `vmContext.global.crypto` if not already global (#143)
Fixes https://github.com/matrix-org/matrix-public-archive/issues/141

Node.js v19 has `crypto` set on the global already, so this change causes `vmContext.global.crypto` to be assigned only if `vmContext.global.crypto` isn’t already defined.

Otherwise, without this change, the room directory fails to render in Node.js v19+, and instead _"TypeError: Cannot set property crypto of `#<Object>` which has only a getter"_ gets thrown.
2022-11-18 12:27:50 -06:00
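The guard described in #143 can be sketched as follows. `assignCryptoIfMissing` is a hypothetical helper name; the actual change assigns to `vmContext.global.crypto` directly.

```javascript
// Only assign `crypto` when the target does not already provide one.
// On Node.js v19+ the global already exposes a getter-only `crypto`,
// and assigning to it throws in strict mode.
function assignCryptoIfMissing(globalObject, cryptoImplementation) {
  if (!globalObject.crypto) {
    globalObject.crypto = cryptoImplementation;
  }
  return globalObject.crypto;
}
```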
Eric Eastwood 11cbf39460
Add Matrix favicon (#135)
It's a cleaned up version of what [Matrix.org](https://matrix.org/) is using since that one is [so blurry](https://user-images.githubusercontent.com/558581/201302097-411b8033-4281-4cd3-a069-0c97ba3aa01f.png).

Part of https://github.com/matrix-org/matrix-public-archive/issues/94
2022-11-11 14:50:41 -06:00
Eric Eastwood fa4720af04
Increase perceived performance by scrolling to the right spot before Hydrogen loads (#128)
2022-11-09 18:57:33 -06:00
Eric Eastwood a0089b0fe4
Add `Content-Security-Policy` (CSP) (#81)
Add `Content-Security-Policy` (CSP) that restricts the page to just what it is expected to do.

This helps limit the damage that can be done by any XSS attack.

Fix https://github.com/matrix-org/internal-config/issues/1341
2022-10-19 12:07:39 -05:00
Eric Eastwood f796afe55e
Sanity check that we are not leaking the access token to the client (#82)
This doesn't stem from any previous security issue. Just adding an extra check to help ensure we don't ever regress this in the future.

```
AssertionError [ERR_ASSERTION]: We should not be leaking the `config.matrixAccessToken` to the Hydrogen render function because this will reach the client!
    at renderHydrogenToString (matrix-public-archive\server\hydrogen-render\render-hydrogen-to-string.js:24:3)
    at renderHydrogenVmRenderScriptToPageHtml (matrix-public-archive\server\hydrogen-render\render-hydrogen-vm-render-script-to-page-html.js:22:36)
    at matrix-public-archive\server\routes\room-directory-routes.js:53:28
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
```
2022-10-18 02:40:40 -05:00
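A sanity check like the one in #82 might look roughly like this. It is a sketch only: `assertNoAccessTokenLeak` is a hypothetical name, and the real check may compare things differently. It assumes a non-empty token.

```javascript
const nodeAssert = require('node:assert');

// Fail loudly if the access token appears anywhere in the options
// that are about to be handed to client-facing render code.
function assertNoAccessTokenLeak(renderOptions, accessToken) {
  const serialized = JSON.stringify(renderOptions);
  nodeAssert(
    !serialized.includes(accessToken),
    'We should not be leaking the access token to the render function because this will reach the client!'
  );
}
```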
Eric Eastwood 2581f88495
Fix XSS when blatting `window.matrixPublicArchiveContext` to the page (#79)
Fix https://github.com/matrix-org/internal-config/issues/1335
2022-10-13 14:36:04 -05:00
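For context on #79, a common way to embed JSON in an inline `<script>` without allowing a `</script>` break-out is to escape the risky characters after `JSON.stringify`. This is a generic sketch of that technique, not necessarily what the fix itself does; `serializeForInlineScript` is a hypothetical name.

```javascript
// Serialize data for embedding inside an inline <script> tag.
// Escaping `<`, `>`, and `&` prevents `</script>` or `<!--` sequences in the
// data from being interpreted by the HTML parser. U+2028 and U+2029 are
// escaped because older JavaScript engines treat them as line terminators.
function serializeForInlineScript(data) {
  return JSON.stringify(data)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}
```

The result is still valid JSON and valid JavaScript, so it parses back to the same data on the client.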
Eric Eastwood be837515fe
Show surrounding messages for a full screen of content (#71)
1. Add surrounding messages to the given messages so we have a full screen of content to make it feel lively even in quiet rooms
    - As you scroll around the timeline across different days, the date changes in the URL, calendar, etc
 2. Add summary item to the bottom of the timeline that explains if we couldn't find any messages in the specific day requested 
    - Also allows you to jump to the next activity in the room. Adds `/:roomId/jump?ts=xxx&dir=[f|b]` to facilitate this.
    - Part of https://github.com/matrix-org/matrix-public-archive/issues/46
 3. Add developer options modal which is linked from the bottom of the right-panel
    - Adds an option so you can debug the `IntersectionObserver` and how it's selecting the active day from the top-edge of the scroll viewport.
    - In the future, this will also include a nice little visualization of the backend timing traces
2022-09-20 16:02:09 -05:00
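As an illustration of the `/:roomId/jump?ts=xxx&dir=[f|b]` route added in #71, a link to it could be built like this. `buildJumpPath` is a hypothetical helper, and percent-encoding the room ID is an assumption about how the route is addressed.

```javascript
// Build a path for the jump route: `ts` is a timestamp in milliseconds and
// `dir` is 'f' (forwards) or 'b' (backwards).
function buildJumpPath(roomId, ts, dir) {
  if (dir !== 'f' && dir !== 'b') {
    throw new Error(`dir must be 'f' or 'b', got ${dir}`);
  }
  const query = new URLSearchParams({ ts: String(ts), dir });
  return `/${encodeURIComponent(roomId)}/jump?${query.toString()}`;
}
```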
Eric Eastwood 32c77ecffe
Only show `world_readable` or `public` rooms in the archive. Only index `world_readable` (#66)
Only show `world_readable` or `public` rooms in the archive. Only allow `world_readable` rooms to be indexed by search engines.

Related to https://github.com/matrix-org/matrix-public-archive/issues/47
2022-09-08 19:15:07 -05:00
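The rule from #66 can be written as a small predicate. The field names and the mapping of "`public`" to a room's join rule are assumptions made for this sketch; `getRoomArchivePolicy` is a hypothetical name.

```javascript
// Decide whether a room may be shown in the archive and whether
// search engines may index it.
function getRoomArchivePolicy({ historyVisibility, joinRule }) {
  const isWorldReadable = historyVisibility === 'world_readable';
  const isPublic = joinRule === 'public';
  return {
    viewable: isWorldReadable || isPublic,
    indexable: isWorldReadable,
  };
}
```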
Eric Eastwood 127d416e6a
Room directory landing page v1 (#61)
Part of https://github.com/matrix-org/matrix-public-archive/issues/6
2022-09-08 01:30:04 -05:00
Eric Eastwood 02b86a8405
Render pipeline separation of concerns (#64)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/36

Render pipeline separation of concerns:

 1. Run in `child_process`
 2. Hydrogen render
 
It's now just a generic `child_process` runner that runs the Hydrogen render in it. This eliminates the windy path of the 1-4 steps that was only held together by the file names themselves.
2022-09-02 20:49:06 -05:00
Eric Eastwood f6bd581f77
Better `child_process` error handling v2 - timeouts and actually fail process for error in scope (#62)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/51

Better `child_process` error handling for the two scenarios marked with 👉 below.

Also make sure we handle all of these scenarios:

 1. Child process fork script throws an `uncaughtException` or `unhandledRejection`
    - These are captured and serialized back to the parent and stored in `childErrors` and exposed if we never get a successful rendered HTML response.
 2. Child process fails to startup 
    - Render process is rejected in the `child.on('error', ...` callback
 3. 👉 Child process times out and is aborted
    - Render process is rejected in the `child.on('error', ...` callback and any `childErrors` encountered are logged
 4. 👉 Child process fork script throws an error in the scope of `process.on('message', async (renderOptions) => {`
    - Child exits with code 1 and we reject the render process with the error
 5. Child process exits with code 1 (error)
    - Render process is rejected with any `childError` info
 6. Child process exits with code 0 (success) but never sends back any HTML
    - We have a `returnedData` data check and any child errors encountered are logged
2022-09-02 18:49:45 -05:00
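The scenarios listed in #62 boil down to a decision about what the parent does once the child is finished. The function below is a simplified model of that decision, not the project's implementation; `resolveRenderOutcome` and its field names are hypothetical.

```javascript
// Given what the parent observed from the child process, decide whether the
// render succeeded or why it should be rejected.
function resolveRenderOutcome({ exitCode, returnedData, childErrors = [], timedOut = false }) {
  if (timedOut) {
    // Scenario 3: the child was aborted after the timeout.
    return { ok: false, reason: 'timeout', childErrors };
  }
  if (exitCode !== 0) {
    // Scenarios 4 and 5: the child exited with an error code.
    return { ok: false, reason: 'exit-code', childErrors };
  }
  if (!returnedData) {
    // Scenario 6: a "successful" exit that never sent back any HTML.
    return { ok: false, reason: 'no-data', childErrors };
  }
  return { ok: true, html: returnedData };
}
```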
Eric Eastwood 36925cd603
Add test to make sure the archive doesn't fail when event for event relation is missing and not included in list of provided events (#43)
Add test to make sure the archive doesn't fail when event for event relation is missing and not included in list of provided events. Like if someone is replying to an event that was from long ago out of our range.

In the case of missing relations, Hydrogen does `_loadContextEntryNotInTimeline` because it can't find the event locally which throws an `uncaughtException`. Before https://github.com/matrix-org/matrix-public-archive/pull/51, the `uncaughtException` killed the Hydrogen `child_process` before it could pass back the HTML. Now this PR mainly just adds a test to make sure it works.
```
TypeError: Cannot read properties of undefined (reading 'storeNames')
    at TimelineReader.readById (hydrogen-web\target\lib-build\hydrogen.cjs.js:12483:33)
    at Timeline._getEventFromStorage (hydrogen-web\target\lib-build\hydrogen.cjs.js:12762:46)
    at Timeline._loadContextEntryNotInTimeline (hydrogen-web\target\lib-build\hydrogen.cjs.js:12747:35)
    at Timeline._loadContextEntriesWhereNeeded (hydrogen-web\target\lib-build\hydrogen.cjs.js:12741:14)
    at Timeline.addEntries (hydrogen-web\target\lib-build\hydrogen.cjs.js:12699:10)
    at mountHydrogen (4-hydrogen-vm-render-script.js:204:12)
    at 4-hydrogen-vm-render-script.js:353:1
    at Script.runInContext (node:vm:139:12)
    at _renderHydrogenToStringUnsafe (matrix-public-archive\server\hydrogen-render\3-render-hydrogen-to-string-unsafe.js:102:41)
    at async process.<anonymous> (matrix-public-archive\server\hydrogen-render\2-render-hydrogen-to-string-fork-script.js:18:27)
```
2022-08-29 19:42:18 -05:00
Eric Eastwood bdaa98e722
Make the `child_process` error catching more robust (`uncaughtException`) (#51)
Split off from https://github.com/matrix-org/matrix-public-archive/pull/43

Listen to `process.on('uncaughtException', ...)` and handle the async errors ourselves so it no longer fails the child process.

And if the process does exit with status code 1 (error), we have those underlying errors serialized and shown.
2022-08-29 19:13:56 -05:00
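Serializing errors for the parent, as #51 describes, needs an explicit step because the `name`, `message`, and `stack` of an `Error` are not enumerable, so `JSON.stringify(new Error('x'))` yields `{}`. A minimal sketch follows, with hypothetical names (`serializeError`, `forwardUncaughtErrors`) and a hypothetical message shape; `proc` stands in for the child's `process` object.

```javascript
// Copy the useful fields of an error into a plain, JSON-friendly object.
function serializeError(err) {
  const error = err instanceof Error ? err : new Error(String(err));
  return { name: error.name, message: error.message, stack: error.stack };
}

// Listen for uncaught exceptions and forward them to the parent instead of
// letting them kill the child process.
function forwardUncaughtErrors(proc) {
  proc.on('uncaughtException', (err) => {
    proc.send({ error: true, ...serializeError(err) });
  });
}
```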
Eric Eastwood b5b79b94f2
Manually instrument some archive logic (#44)
2022-08-29 14:13:13 -05:00
Eric Eastwood 13eb92b067
Make sure we finish sending the HTML payload before we exit the process (#38)
I encountered a page which responded successfully but all of the Hydrogen HTML was missing. It just had the boilerplate around it.

What I am guessing happened is that since `process.send` is async, with a sufficiently large
payload and race condition, `process.exit(0)` was being called before it finished sending.

Related:
 - https://stackoverflow.com/questions/34627546/process-send-is-sync-async-on-nix-windows
 - 56d9584a0e
 - https://github.com/nodejs/node/issues/6767
2022-07-06 19:24:29 -05:00
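One way to avoid the race described in #38 is to exit only from the callback that `process.send` accepts as its last argument, which runs after the message has been sent. This is a sketch of that idea; `sendThenExit` is a hypothetical helper and `proc` stands in for the child's `process` object.

```javascript
// Send the payload to the parent and only exit once the send has completed.
// The callback receives an error if sending failed, or `null` otherwise.
function sendThenExit(proc, payload) {
  proc.send(payload, (err) => {
    proc.exit(err ? 1 : 0);
  });
}
```

In a real child process this would be called as `sendThenExit(process, { data: html })`.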
Eric Eastwood 7eaa103a28
Fix large option payloads throwing E2BIG and ENAMETOOLONG (#37)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/36
2022-07-05 18:00:29 -05:00
Eric Eastwood f738dbc1da
Stop Hydrogen from running in the background after we get our SSR HTML render data (#36)
We now run the Hydrogen render in a `child_process` so we can exit the whole render process. We still use the `vm` to set up the browser-like globals. With a `vm`, everything continues to run even after it returns and there isn't a way to clean up, stop, kill, or terminate the vm script or context, so we need this extra `child_process` now to clean up. I don't like the complexity necessary for this though. I wish the `vm` API allowed for this use case. The only way to stop a `vm` is the `timeout` and we want to stop as soon as we return.

Fix https://github.com/matrix-org/matrix-public-archive/issues/34
2022-07-05 17:30:52 -05:00