Commit Graph

130 Commits

Author SHA1 Message Date
Eric Eastwood ed8a89c358 Some clean-up 2023-06-30 03:01:16 -05:00
Eric Eastwood cad357fd2c Remove debug logs 2023-06-30 02:52:33 -05:00
Eric Eastwood 6f28dc629b Working seamless pagination 2023-06-30 02:50:14 -05:00
Eric Eastwood b3b15bdfe6 Better test, still not working 2023-06-30 02:05:39 -05:00
Eric Eastwood ef5b0e077e Paginate until we fill up the results 2023-06-29 02:03:49 -05:00
Eric Eastwood e3f82f248b Merge branch 'main' into madlittlemods/only-show-world_readable-rooms-in-room-directory 2023-06-29 01:23:29 -05:00
Eric Eastwood 0fc4421432
Indicate when the room was set to `world_readable` and by who (#278) 2023-06-28 20:29:49 -05:00
Eric Eastwood 58d80281bb Only show `world_readable` rooms in the room directory
Part of https://github.com/matrix-org/matrix-public-archive/issues/271
2023-06-28 17:50:55 -05:00
Tulir Asokan 1d3e930fbd
Don't allow previewing `shared` history rooms (#239)
Only `world_readable` can be considered as opting into having history publicly on the web. Anything else must not be archived until there's a dedicated state event for opting into archiving.
2023-06-27 16:56:58 -05:00
Eric Eastwood dd27c1054a
Prefer canonical alias in `rel=canonical` link (#269)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/266

Part of https://github.com/matrix-org/matrix-public-archive/issues/251
2023-06-22 02:23:48 -05:00
Eric Eastwood aff0423f4c
Prevent join event spam with stable `reason` (#268)
Fix https://github.com/matrix-org/matrix-public-archive/issues/267

In the case of someone visiting a room via an alias, we can't get access to the `room_id` before we join the room. I've opted to just point to the Matrix Public Archive instance in general. This way the `join` reason is always stable regardless of how someone is visiting the room.

Join `reason` was originally added in https://github.com/matrix-org/matrix-public-archive/pull/262
2023-06-22 01:55:21 -05:00
Eric Eastwood 0f522bed20
Use `rel=canonical` link to de-duplicate event permalinks (#266)
Fix https://github.com/matrix-org/matrix-public-archive/issues/251
2023-06-22 01:50:55 -05:00
Eric Eastwood cf51d04433
Add `/faq` redirect (#265)
Part of https://github.com/matrix-org/matrix-public-archive/issues/257
so we can set the display name of the bot to `archive.matrix.org/faq` and
people can read what the project is about and why the bot joined.
2023-06-21 20:29:26 -05:00
Eric Eastwood 1dd63212c0
Add reason why the archive bot is joining the room (#262)
Using the join `reason` added in [MSC2367](https://github.com/matrix-org/matrix-spec-proposals/pull/2367). Unfortunately, this PR doesn't have much effect because it doesn't look like many clients support it yet (Element doesn't support it for example).

Part of https://github.com/matrix-org/matrix-public-archive/issues/257
2023-06-09 16:05:20 -05:00
Eric Eastwood 4797f1e46a
Document why changes to locally linked hydrogen-view-sdk don't trigger a rebuild (#240) 2023-05-30 10:34:35 -05:00
Eric Eastwood f05d36e9f4
Fix mistake in config access for workaroundCloudflare504TimeoutErrors (#229)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/228
2023-05-11 16:34:16 -05:00
Eric Eastwood 55f1867c68
Prevent Cloudflare from overriding our own 504 timeout page (#228)
Explored in https://gitlab.matrix.org/matrix-public-archive/deployment/-/issues/2 (internal deployment issue)

> Cloudflare returns an Cloudflare-branded HTTP 502 or 504 error when your origin web server responds with a standard HTTP 502 bad gateway or 504 gateway timeout error:
>
> *-- https://developers.cloudflare.com/support/troubleshooting/cloudflare-errors/troubleshooting-cloudflare-5xx-errors/#502504-from-your-origin-web-server*

<img src="https://github.com/matrix-org/matrix-public-archive/assets/558581/46f6d88c-ba53-4efb-809f-3f331bf9b799" width="400">


The only way to disable this functionality is to have an Enterprise Cloudflare plan and use the `Enable Origin Error Pages` option:

> **Enable Origin Error Pages**
>
> When Origin Error Page is set to “On”, Cloudflare will proxy the 502 and 504 error pages directly from the origin.
>
> Requires Enterprise or higher

So instead of dealing with that headache, we're just working around this by responding with a 500 error when we time out. Should be good enough I think. The user won't notice any difference, but it may affect what search engines think. Not sure search engines care about the distinction since the page is slow to respond anyway, which they punish.
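
A minimal sketch of the workaround described above. The helper name is hypothetical, and the assumption that the un-worked-around status is 504 is inferred from the PR title. Only the `workaroundCloudflare504TimeoutErrors` option name comes from this repository's history.

```javascript
// Hypothetical helper: pick the status code to send when a route times out.
// `workaroundCloudflare504TimeoutErrors` mirrors the config option name; how
// it is read from config is not shown here.
function getTimeoutStatusCode(workaroundCloudflare504TimeoutErrors) {
  // 504 Gateway Timeout is the accurate status, but Cloudflare swaps origin
  // 502/504 responses for its own branded page, so fall back to a plain 500.
  return workaroundCloudflare504TimeoutErrors ? 500 : 504;
}
```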
2023-05-11 16:24:58 -05:00
Eric Eastwood 1a140b39c6
Better grammar in URL preview description (#225)
Part of https://github.com/matrix-org/matrix-public-archive/issues/202
2023-05-10 01:12:49 -05:00
Eric Eastwood 16323df054
Add image metadata for URL previews (#224)
- Default to a nice `[matrix]` banner
    -  There is room for improvement here when the Matrix Public Archive gets its own logo (https://github.com/matrix-org/matrix-public-archive/issues/94) and maybe says "Matrix Public Archive" somewhere in the banner.
    - This is good enough for now (and certainly better than downstream previews using the first image on the page).
 - For rooms, it will use the room avatar

Part of https://github.com/matrix-org/matrix-public-archive/issues/202

Image is sized to 1200x630 to match conventions of `og:image`.

Crafted the banner image by modifying the header on the room directory homepage and taking a node screenshot. Page zoom @ 175%
2023-05-10 00:50:12 -05:00
Eric Eastwood 198e8c09be
Mark NSFW room pages with `<meta name="rating" content="adult">` (#216)
Related docs:

 - https://developers.google.com/search/docs/crawling-indexing/safesearch
 - https://developers.google.com/search/docs/crawling-indexing/special-tags
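
As a rough illustration only: the helper below is hypothetical and assumes the NSFW decision (`isNsfw`) is made elsewhere. It just emits the tag named in the commit title.

```javascript
// Hypothetical helper: returns the SafeSearch rating tag for NSFW rooms and an
// empty string otherwise. How a room is classified as NSFW is not shown here.
function renderRatingMetaTag(isNsfw) {
  return isNsfw ? '<meta name="rating" content="adult">' : '';
}
```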
2023-05-05 15:36:26 -05:00
Eric Eastwood aeceb195e2
Add some `<meta name="description" ...>` to pages (#214)
Not the best but probably better than the default (a good first iteration)

Part of https://github.com/matrix-org/matrix-public-archive/issues/202
2023-05-04 22:46:09 -05:00
Eric Eastwood b10884505a
Fix time selector showing when less than the page limit of messages (#213)
Fix https://github.com/matrix-org/matrix-public-archive/issues/211
2023-05-04 20:50:43 -05:00
Eric Eastwood 9b067f8637
Set `X-Date-Temporal-Context` header for easy cache rules (#209)
Set `X-Date-Temporal-Context: [past|present|future]` header for easy cache rules:

 - Cache `past` things heavily
 - Cache `present`/`future` things for 5 minutes
 
This accomplishes the goal we set out for:

> - We can cache all responses except for the latest UTC day (and anything in the future). ex. `/!aMzLHLvScQCGKDNqCB:gitter.im/date/2022/10/13`
>    - For the latest day, we could set the cache expire after 5 minutes or so
>
> *-- [Matrix Public Archive deployment issue](https://github.com/vector-im/sre-internal/issues/2079)*

And this way we don't have to do any fancy date parsing and comparison from the URL, which is probably not even possible with Cloudflare cache rules.
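
One way the `past`/`present`/`future` value could be derived, as a sketch rather than the actual implementation. The function name and its parameters are hypothetical, and it assumes that "present" means the current UTC day.

```javascript
// Hypothetical helper: classify a requested UTC calendar day relative to "now".
// `month` is 1-based, matching the `/date/YYYY/MM/DD` URL format.
function getTemporalContext(year, month, day, nowTs = Date.now()) {
  const startOfRequestedDay = Date.UTC(year, month - 1, day);

  const now = new Date(nowTs);
  const startOfToday = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());

  if (startOfRequestedDay < startOfToday) return 'past';
  if (startOfRequestedDay > startOfToday) return 'future';
  return 'present';
}
```

In an Express route, the result could then be attached with something like `res.set('X-Date-Temporal-Context', getTemporalContext(2022, 10, 13))`, leaving the cache rule to match on the header value alone.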
2023-05-04 13:42:59 -05:00
Eric Eastwood 858c9dde8b
We can better detect static assets to avoid tracing nowadays (#207)
Because all assets are served from `/assets` since https://github.com/matrix-org/matrix-public-archive/pull/175
2023-05-02 00:55:22 -05:00
Eric Eastwood 9078abf4f1
Timeout requests and stop processing further (#204)
Fix https://github.com/matrix-org/matrix-public-archive/issues/148
Fix https://github.com/matrix-org/matrix-public-archive/issues/40

 - Apply timeout middleware to all room directory and room routes
 - Stop messing with the response after we time out. Fix https://github.com/matrix-org/matrix-public-archive/issues/148
    - This also involves cancelling any `async/await` things like requests in the routes so we throw an abort error instead of continuing on. Fix https://github.com/matrix-org/matrix-public-archive/issues/40
 - Also abort the route if we see that the user closed the request before we could respond to them
 - Bumps minimum supported Node.js version to v18 because we're now using the built-in native `fetch` in Node.js vs `node-fetch`. This gives us the custom `signal.reason` that we aborted with instead of a generic `AbortError`.
    - This also means we had to add some instrumentation for `fetch` which uses `undici` under the hood. Settled on some unofficial instrumentation: [`opentelemetry-instrumentation-fetch-node`](https://www.npmjs.com/package/opentelemetry-instrumentation-fetch-node)
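
The abort flow above can be sketched roughly as follows. All names here (`RouteTimeoutAbortError`, `startRouteTimeout`, `throwIfAborted`) are hypothetical and not taken from the codebase. The only platform features relied on are `AbortController` and the `signal.reason` it carries.

```javascript
// Hypothetical error type so callers can tell a timeout apart from other aborts.
class RouteTimeoutAbortError extends Error {
  constructor(timeoutMs) {
    super(`Route timed out after ${timeoutMs}ms`);
    this.name = 'RouteTimeoutAbortError';
  }
}

// Start a timer that aborts with a custom reason once `timeoutMs` elapses.
function startRouteTimeout(timeoutMs) {
  const abortController = new AbortController();
  const timeoutId = setTimeout(() => {
    abortController.abort(new RouteTimeoutAbortError(timeoutMs));
  }, timeoutMs);

  return {
    abortController,
    signal: abortController.signal,
    // Call this once the response has been sent so the timer doesn't linger.
    stop: () => clearTimeout(timeoutId),
  };
}

// Call between `await` steps in a route so we stop processing after an abort.
function throwIfAborted(signal) {
  if (signal.aborted) {
    throw signal.reason;
  }
}
```

Passing the same `signal` to the built-in `fetch(url, { signal })` lets in-flight requests reject with the custom reason as well. The same controller could also be aborted from a request `close` listener to cover the case where the user goes away early.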
2023-05-02 00:39:01 -05:00
Eric Eastwood f3318446f8
Expose child errors that only occur in stderr log output (#205)
Who knows why we can't capture these errors via the more conventional `child.on('error', (err) => { })` listener 🤷 


### Before

```
RethrownError: Failed to render Hydrogen to string. In order to reproduce, feed in these arguments into `renderHydrogenToString(...)`:
    renderHydrogenToString arguments: { ... }
    at renderHydrogenToString (server/hydrogen-render/render-hydrogen-to-string.js:58:11)
    --- Original Error ---
    RethrownError: Child process exited with code 1
        at assembleErrorAfterChildExitsWithErrors (server/child-process-runner/run-in-child-process.js:60:29)
        --- Original Error ---
        No child errors
```

### After

```
RethrownError: Failed to render Hydrogen to string. In order to reproduce, feed in these arguments into `renderHydrogenToString(...)`:
    renderHydrogenToString arguments: { ... }
    at renderHydrogenToString (server/hydrogen-render/render-hydrogen-to-string.js:58:11)
    --- Original Error ---
    RethrownError: Child process exited with code 1
        at assembleErrorAfterChildExitsWithErrors (server/child-process-runner/run-in-child-process.js:60:29)
        --- Original Error ---
        No child errors but there might be something in stderr=node:internal/modules/cjs/loader:936
          throw err;
          ^

        Error: Cannot find module '../lib/rethrown-error'
        Require stack:
        - server/child-process-runner/child-fork-script.js
            at Function.Module._resolveFilename (node:internal/modules/cjs/loader:933:15)
            at Function.Module._load (node:internal/modules/cjs/loader:778:27)
            at Module.require (node:internal/modules/cjs/loader:1005:19)
            at require (node:internal/modules/cjs/helpers:102:18)
            at Object.<anonymous> (server/child-process-runner/child-fork-script.js:8:23)
            at Module._compile (node:internal/modules/cjs/loader:1103:14)
            at Object.Module._extensions..js (node:internal/modules/cjs/loader:1155:10)
            at Module.load (node:internal/modules/cjs/loader:981:32)
            at Function.Module._load (node:internal/modules/cjs/loader:822:12)
            at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12) {
          code: 'MODULE_NOT_FOUND',
          requireStack: [
            'server//child-process-runner//child-fork-script.js'
          ]
        }
```
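
A simplified sketch of the idea, not the code from `run-in-child-process.js`. The function name and arguments are hypothetical. It assumes stderr chunks were collected while the child ran.

```javascript
// Hypothetical helper: build the error to surface after the child exits with a
// non-zero code. `childErrors` are errors reported by the child through the
// usual channels; `stderrChunks` are strings captured from the child's stderr.
function assembleChildExitError(exitCode, childErrors, stderrChunks) {
  const stderrOutput = stderrChunks.join('');

  let detail;
  if (childErrors.length > 0) {
    detail = childErrors.map((err) => err.stack || String(err)).join('\n');
  } else if (stderrOutput.length > 0) {
    // Nothing was reported explicitly, so fall back to whatever stderr holds.
    detail = `No child errors but there might be something in stderr=${stderrOutput}`;
  } else {
    detail = 'No child errors';
  }

  return new Error(
    `Child process exited with code ${exitCode}\n--- Original Error ---\n${detail}`
  );
}
```

The chunks could be gathered with `child.stderr.on('data', (chunk) => stderrChunks.push(String(chunk)))`. With `child_process.fork` this requires piping stdio (for example `silent: true`), since `child.stderr` is otherwise `null`.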
2023-05-01 17:33:48 -05:00
Eric Eastwood 0df1a79754
Fix styles on timeout page (#203)
Fix styles on timeout page since we started using the `manifest.json` for asset paths in https://github.com/matrix-org/matrix-public-archive/pull/175.
2023-05-01 15:13:16 -05:00
Eric Eastwood 53a1d4b43b
Update docs in preparation for Matrix Public Archive being generally available (#194) 2023-04-27 00:22:41 -05:00
Eric Eastwood f71fc2bb9c
Cache derived info from the `manifest.json` (#191)
- Like getting all of the dependencies for a given entry point
 - And the favicons
 
Also fix the problem where `server/hydrogen-render/render-page-html.js` was calling `getFaviconAssetUrls()` right away, before the client build had a chance to generate `dist/manifest.json`, which resulted in `Error: Cannot find module '../../dist/manifest.json'`
2023-04-26 17:04:49 -05:00
Eric Eastwood c297270f39
Link prior art and reasoning why we still always join before (#190)
See https://github.com/matrix-org/matrix-public-archive/issues/50
2023-04-26 16:39:53 -05:00
Eric Eastwood e20a67d2ba
Preload fonts and images (#187)
Part of https://github.com/matrix-org/matrix-public-archive/issues/132
2023-04-26 16:35:00 -05:00
Eric Eastwood 27863a1945
Iterate on `crossorigin` language in `Link` preload header comments (#186)
Hopefully more accurate now 🤞
2023-04-26 04:05:11 -05:00
Eric Eastwood a3952f1d31
Fix preload link headers (#185)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/171 and https://github.com/matrix-org/matrix-public-archive/pull/175 where they broke because we went from scripts to modules.

Part of https://github.com/matrix-org/matrix-public-archive/issues/132

Before this PR, we were seeing these warnings in the Chrome devtools console:

```
A preload for 'foo' is found, but is not used because the request credentials mode does not match. Consider taking a look at crossorigin attribute.
```

This is caused by a credentials mode mismatch between the `<script type="module">` tag and the `Link` header. A `<script type="module">` with no `crossorigin` attribute indicates a credentials mode of `omit` and a naive `Link: </foo-url>; rel=preload; as=script;` has a default credentials mode of `same-origin`, hence the mismatch and warning we're seeing.

We could set the credentials mode to match using `Link: </foo-url>; rel=preload; as=script; omit` but there is an even better option! We can use the dedicated `Link: </foo-url>; rel=modulepreload` link type which not only downloads and puts the file in the cache like a normal preload but the browser also knows it's a JavaScript module now and can parse/compile it so it's ready to go.
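
For illustration, a small sketch of assembling such a header value. The helper and the `{ url, type }` asset shape are hypothetical. Only the `rel=modulepreload` and `rel=preload; as=...` forms come from the discussion above.

```javascript
// Hypothetical helper: build a single `Link` header value from a list of assets.
// `type` is 'module' for JavaScript modules, otherwise a preload `as` value
// such as 'style'.
function buildPreloadLinkHeader(assets) {
  return assets
    .map(({ url, type }) => {
      if (type === 'module') {
        // Dedicated link type for modules, which sidesteps the credentials
        // mode mismatch described above.
        return `<${url}>; rel=modulepreload`;
      }
      return `<${url}>; rel=preload; as=${type}`;
    })
    .join(', ');
}
```

Multiple links can share one `Link` header when separated by commas, so the returned string can be passed to a single `res.set('Link', ...)` call.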

---

Future consideration: Adding `nopush` to preload link headers. Many servers initiate an HTTP/2 Server Push when they encounter a preload link in HTTP header form otherwise. Do we want/care about that (or maybe we don't)? (mentioned in https://medium.com/reloading/preload-prefetch-and-priorities-in-chrome-776165961bbf#6f54)

---

References for preload `Link` headers:

  - https://medium.com/reloading/preload-prefetch-and-priorities-in-chrome-776165961bbf#6f54
  - https://html.spec.whatwg.org/multipage/links.html#link-type-preload
  - https://www.smashingmagazine.com/2016/02/preload-what-is-it-good-for/#headers
 - https://developer.chrome.com/blog/modulepreload/#ok-so-why-doesnt-link-relpreload-work-for-modules
2023-04-26 03:29:57 -05:00
Eric Eastwood d3e35a5de1
Make sure to restart the server after Vite `manifest.json` changes (#184)
Make sure to restart the server after Vite `manifest.json` changes so it can pick up the latest and serve pages correctly.
2023-04-26 02:09:46 -05:00
Eric Eastwood 2c12fec1e6
Fix scripts not loading from the production ready build PR (#183)
Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/175
2023-04-25 03:54:49 -05:00
Eric Eastwood 630e58fadc
Remove stray logs (#181)
Accidentally introduced in https://github.com/matrix-org/matrix-public-archive/pull/175
2023-04-25 01:21:53 -05:00
Eric Eastwood ac1419cdca
Only `require.resolve(...)` the path once (#180)
Perhaps an early optimization or not even needed but doesn't seem wise to keep pulling this over and over (best case it's cached).
2023-04-25 00:50:43 -05:00
Eric Eastwood 0f26dc94d3
Migrate from `eslint-plugin-node` to `eslint-plugin-n` (#179) 2023-04-25 00:39:59 -05:00
Eric Eastwood 9c0b6fe85e
Production ready build (#175)
- Rename `public` -> `client` so it doesn't get copied automagically as-is (without hashes which we want for cache busting), https://vitejs.dev/guide/assets.html#the-public-directory
     - We still build the version files to `public/` so they're copied as-is and Vite handles it for us (so we can use `emptyOutDir`)
 - Use a multiple entrypoint `.js` Vite build so things can be more intelligently bundled and take less time
     - We aren't using library mode because it doesn't minify or bundle assets
 - Using hash asset tags for cache busting. Hash of the file included in the file name
 - We look up these hashed assets from the `manifest.json` that Vite builds (https://vitejs.dev/guide/backend-integration.html) to serve and preload
 - In terms of optimized bundles, I know the current output isn't great now but will have to opt to fix that up separately in the future. Tracked by https://github.com/matrix-org/matrix-public-archive/issues/176
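
A sketch of what looking up hashed assets from a Vite manifest can look like. The function name, the entry keys, and the file names below are made up for illustration. The `file`, `imports`, and `css` fields follow the manifest shape described in the Vite backend integration guide.

```javascript
// Hypothetical helper: collect every hashed file an entry point needs,
// following `imports` recursively. Shared chunks come before the entry itself.
function collectAssetsForEntry(manifest, entryKey, seen = new Set()) {
  if (seen.has(entryKey)) return [];
  seen.add(entryKey);

  const chunk = manifest[entryKey];
  if (!chunk) {
    throw new Error(`No manifest entry for ${entryKey}`);
  }

  const assets = [];
  for (const importKey of chunk.imports || []) {
    assets.push(...collectAssetsForEntry(manifest, importKey, seen));
  }
  assets.push(...(chunk.css || []), chunk.file);

  // De-duplicate in case several chunks reference the same stylesheet.
  return [...new Set(assets)];
}

// Made-up manifest for illustration only.
const exampleManifest = {
  'client/js/entry-client-room.js': {
    file: 'assets/entry-client-room.1a2b3c4d.js',
    isEntry: true,
    imports: ['_shared.5e6f7a8b.js'],
    css: ['assets/room.9c0d1e2f.css'],
  },
  '_shared.5e6f7a8b.js': {
    file: 'assets/shared.5e6f7a8b.js',
    css: ['assets/shared.3a4b5c6d.css'],
  },
};
```

Because the manifest only changes when the client is rebuilt, the result for each entry point is a natural candidate for caching.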
2023-04-24 23:50:53 -05:00
Eric Eastwood 50a1d658e8
Only read version tag files once on startup (#174)
We already read it once for the `/health-check` endpoint and cached the response but this way we can use `getVersionTags()` everywhere without worrying about it.

Also, it's no longer `async` so we can use it in things like Express route paths and CDN asset tags more easily.
2023-04-19 15:57:22 -05:00
Eric Eastwood 78ee88e094
Add route identifiers for easy metric reporting (#173)
Pre-requisite for https://github.com/matrix-org/matrix-public-archive/issues/162 and https://github.com/matrix-org/matrix-public-archive/issues/148
2023-04-19 15:09:51 -05:00
Eric Eastwood 27afaea8ca
Serve Hydrogen assets from `/hydrogen-assets/` sub-directory for easier targeting of cache rules (#172)
Fix https://github.com/matrix-org/matrix-public-archive/issues/160
2023-04-19 14:44:12 -05:00
Eric Eastwood 17a39ab8db
Add preload link headers for downstream Cloudflare early hints (#171)
Because it takes us at best several seconds to request information from a homeserver and then server-side render the page, the browser has to wait for the response before it can even try loading the necessary assets. With this change that facilitates early hints, the browser can preload all of the assets necessary before we are done generating the response and will be ready to go by the time we're all done on the server.

Fix https://github.com/matrix-org/matrix-public-archive/issues/32

Part of https://github.com/matrix-org/matrix-public-archive/issues/132

See https://developers.cloudflare.com/cache/about/early-hints/ for information on enabling in Cloudflare
2023-04-19 14:20:01 -05:00
Eric Eastwood 321c6a4f26
Slightly easier to understand renderHydrogenVmRenderScriptToPageHtml API surface (#170) 2023-04-19 13:48:12 -05:00
Eric Eastwood 551b4e72d1
Follow tombstone and predecessor history (#167)
Fix https://github.com/matrix-org/matrix-public-archive/issues/59

Other updates:

 - Update tests to use `/roomid/room1/date/2022/01/03` format instead of trying to retrofit the weird alias stuff on there. This also makes the fancy-to-actual URL utilities much simpler.
 - Update to specify `archiveMessageLimit` in the test case because pages have a different number of events depending on whether we are against a boundary, hidden events, etc.
2023-04-19 01:26:15 -05:00
Eric Eastwood 6c789eae69
Do our best to get the user to the right place and try joining `via` derived server name (#168)
Split out from https://github.com/matrix-org/matrix-public-archive/pull/167
2023-04-11 15:09:44 -05:00
Eric Eastwood e99a0d6912
Rename to build-scripts so it appears in GitHub file finder (#166)
It seems like the `build/` directory is ignored in the GitHub file
finder as a sane default for people who put compiled assets there.

`build-scripts/` probably makes more sense anyway
2023-04-07 13:17:46 -05:00
Eric Eastwood 57d2cb3dd3
Refactor tests to use single source of truth ASCII diagram (#164)
- Less test bulk
 - Single source of truth: there is no mismatch between the comment and the expectations (we already caught a few mistakes in the conversion thanks to this benefit)
 - Easier to maintain and update
2023-04-07 12:52:41 -05:00
Eric Eastwood 954b22995a
Add a way to select time of day (#139)
- Fix https://github.com/matrix-org/matrix-public-archive/issues/7
 - A URL with time looks like
    - `/r/too-many-messages-on-day:my.synapse.server/date/2022/11/16T23:59`
    - Or when more precision is required (seconds): `/r/too-many-messages-on-day:my.synapse.server/date/2022/11/16T23:59:59`
 - Add new custom time picker/scrubber (pictured below) with momentum scrubbing
    - Native built-in `<input type="time">` for easier picking if you prefer that and accessibility.
    - Uses localized time strings
    - Design inspired by Thiago Sanchez's *Time Zone Translate* concept, https://dribbble.com/shots/14590546-Time-Zone-Translate
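
A rough sketch of parsing the date/time portion of these URLs. The regex and function name are hypothetical, and it assumes the date and time in the URL are interpreted as UTC.

```javascript
// Matches `/date/YYYY/MM/DD`, optionally followed by `THH:MM` or `THH:MM:SS`.
const DATE_PATH_REGEX =
  /\/date\/(\d{4})\/(\d{2})\/(\d{2})(?:T(\d{2}):(\d{2})(?::(\d{2}))?)?$/;

// Hypothetical helper: returns a UTC timestamp in milliseconds, or `null` when
// the path does not end in a recognizable date.
function parseArchiveDatePath(path) {
  const match = path.match(DATE_PATH_REGEX);
  if (!match) return null;

  // Time parts are optional; unmatched groups are `undefined`, so they fall
  // back to the start of the day.
  const [, year, month, day, hour = '0', minute = '0', second = '0'] = match;

  return Date.UTC(
    Number(year),
    Number(month) - 1,
    Number(day),
    Number(hour),
    Number(minute),
    Number(second)
  );
}
```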
2023-04-05 04:25:31 -05:00
Philip Durbin 8f9e1631ae
Switch /timestamp_to_event from unstable to stable v1 #142 (#154) 2023-02-16 20:52:28 -06:00