Add reason why the archive bot is joining the room (#262)
Uses the join `reason` added in [MSC2367](https://github.com/matrix-org/matrix-spec-proposals/pull/2367). Unfortunately, this PR doesn't have much effect yet because few clients appear to support it (Element doesn't, for example).

Part of https://github.com/matrix-org/matrix-public-archive/issues/257
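For readers unfamiliar with the join `reason`, here is a minimal sketch of what such a join request could look like against the Matrix client-server API. The helper name `buildJoinRequest`, the example homeserver URL, and the example reason string are assumptions made for illustration; the project itself sends the request through its own `fetchEndpointAsJson` helper, as shown in the diff below.

```javascript
// Hypothetical helper for illustration only; not part of matrix-public-archive.
// Builds the URL and fetch-style options for `POST /_matrix/client/v3/join/{roomIdOrAlias}`
// with a join `reason` in the request body.
function buildJoinRequest(matrixServerUrl, roomIdOrAlias, accessToken, reason) {
  // Strip trailing slashes so we don't end up with `//_matrix/...`
  const baseUrl = matrixServerUrl.replace(/\/+$/, '');
  // Room IDs and aliases contain characters like `#`, `!` and `:` that must be escaped
  const url = `${baseUrl}/_matrix/client/v3/join/${encodeURIComponent(roomIdOrAlias)}`;

  return {
    url,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      // The `reason` is carried into the content of the resulting membership event
      body: JSON.stringify({ reason }),
    },
  };
}

// Example values (assumed): a made-up homeserver, room alias, and token
const joinRequest = buildJoinRequest(
  'https://matrix.example.org/',
  '#room:example.org',
  'example-access-token',
  'Joining room to check history visibility.'
);
```

The returned `url` and `options` could be passed to `fetch(joinRequest.url, joinRequest.options)`. Whether the reason is ever shown to room members depends on the client.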
parent 8da9b3d957
commit 1dd63212c0
docs/faq.md (52 lines changed)
@@ -17,19 +17,32 @@ And with the introduction of the jump to date API via
 [MSC3030](https://github.com/matrix-org/matrix-spec-proposals/pull/3030), we could show
 messages from any given date and day-by-day navigation.

-## How do I opt out and keep my room from being indexed by search engines?
+## Why did the archive bot join my room?

-All public Matrix rooms are accessible to view in the Matrix Public Archive. But only
-rooms with history visibility set to `world_readable` are indexable by search engines.
+Only public Matrix rooms with `shared` or `world_readable` [history
+visibility](https://spec.matrix.org/latest/client-server-api/#room-history-visibility) are
+accessible in the Matrix Public Archive. In some clients like Element, the `shared`
+option equates to "Members only (since the point in time of selecting this option)" and
+`world_readable` to "Anyone" under the **room settings** -> **Security & Privacy** ->
+**Who can read history?**.

-Also see https://github.com/matrix-org/matrix-public-archive/issues/47 to track better
-opt out controls.
+But the archive bot (`@archive:matrix.org`) will join any public room because it doesn't
+know the history visibility without first joining. Any room without `world_readable` or
+`shared` history visibility will lead to a `403 Forbidden`. And if the public room is in
+the room directory, it will be listed in the archive but will still lead to a `403
+Forbidden` in that case.

-For [archive.matrix.org](https://archive.matrix.org/), you can ban the
-`@archive:matrix.org` user if you don't want your room content to be shown in the
-archive at all.
+The Matrix Public Archive doesn't hold onto any data (it's
+stateless) and requests the messages from the homeserver every time. The
+[archive.matrix.org](https://archive.matrix.org/) instance has some caching in place, 5
+minutes for the current day, and 2 days for past content.

-## Why does the archive user join rooms instead of browsing them as a guest?
+The Matrix Public Archive only allows rooms with `world_readable` history visibility to
+be indexed by search engines. See the [opt
+out](#how-do-i-opt-out-and-keep-my-room-from-being-indexed-by-search-engines) topic
+below for more details.
+
+### Why does the archive user join rooms instead of browsing them as a guest?

 Guests require `m.room.guest_access` to access a room. Most public rooms do not allow
 guests because even the `public_chat` preset when creating a room does not allow guest
@@ -37,11 +50,22 @@ access. Not being able to view most public rooms is the major blocker on being a
 use guest access. The idea is if I can view the messages from a Matrix client as a
 random user, I should also be able to see the messages in the archive.

-Keep in mind that only rooms with history visibility set to `world_readable` are
-indexable by search engines. The Matrix Public Archive doesn't hold onto any data (it's
-stateless) and requests the messages from the homeserver every time. The
-[archive.matrix.org](https://archive.matrix.org/) instance has some caching in place, 5
-minutes for the current day, and 2 days for past content.
+Guest access is also a much different ask than read-only access since guests can also
+send messages in the room which isn't always desirable. The archive bot is read-only and
+does not send messages.

+## How do I opt out and keep my room from being indexed by search engines?
+
+Only public Matrix rooms with `shared` or `world_readable` history visibility are
+accessible to view in the Matrix Public Archive. But only rooms with history visibility
+set to `world_readable` are indexable by search engines.
+
+Also see https://github.com/matrix-org/matrix-public-archive/issues/47 to track better
+opt out controls.
+
+As a workaround for [archive.matrix.org](https://archive.matrix.org/) today, you can ban
+the `@archive:matrix.org` user if you don't want your room content to be shown in the
+archive at all.
+
 ## Technical details

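The visibility rules described in the FAQ changes above can be summarized in a small sketch. The function name `describeArchiveAccess` and the shape of its return value are assumptions made for illustration, not the archive's actual implementation. It also assumes the room is public and that the archive bot has not been banned.

```javascript
// Hypothetical sketch of the FAQ's rules; not taken from the matrix-public-archive source.
// History visibility values that the FAQ says are viewable in the archive.
const ARCHIVE_ACCESSIBLE_VISIBILITIES = new Set(['shared', 'world_readable']);

function describeArchiveAccess(historyVisibility) {
  const accessible = ARCHIVE_ACCESSIBLE_VISIBILITIES.has(historyVisibility);
  return {
    // Viewable in the archive at all
    accessible,
    // Only `world_readable` rooms are indexable by search engines
    indexable: historyVisibility === 'world_readable',
    // Anything else leads to a `403 Forbidden`
    httpStatus: accessible ? 200 : 403,
  };
}

const sharedRoom = describeArchiveAccess('shared');
const worldReadableRoom = describeArchiveAccess('world_readable');
const joinedOnlyRoom = describeArchiveAccess('joined');
```

Under these assumptions, a `shared` room is viewable but not indexable, a `world_readable` room is both, and a room with `joined` or `invited` visibility results in a `403 Forbidden`.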
@@ -3,14 +3,19 @@
 const assert = require('assert');
 const urlJoin = require('url-join');

+const StatusError = require('../errors/status-error');
 const { fetchEndpointAsJson } = require('../fetch-endpoint');
 const getServerNameFromMatrixRoomIdOrAlias = require('./get-server-name-from-matrix-room-id-or-alias');
+const MatrixPublicArchiveURLCreator = require('matrix-public-archive-shared/lib/url-creator');

 const config = require('../config');
-const StatusError = require('../errors/status-error');
+const basePath = config.get('basePath');
+assert(basePath);
 const matrixServerUrl = config.get('matrixServerUrl');
 assert(matrixServerUrl);

+const matrixPublicArchiveURLCreator = new MatrixPublicArchiveURLCreator(basePath);
+
 async function ensureRoomJoined(
   accessToken,
   roomIdOrAlias,
@@ -43,6 +48,19 @@ async function ensureRoomJoined(
     method: 'POST',
     accessToken,
     abortSignal,
+    body: {
+      reason:
+        `Joining room to check history visibility. ` +
+        `If your room is public with shared or world readable history visibility, ` +
+        `it will be accessible at ${matrixPublicArchiveURLCreator.archiveUrlForRoom(
+          roomIdOrAlias
+          // We don't need to include the `viaServers` option here because the archive
+          // will already be joined to the room from this request itself and we don't
+          // need to make the URL any longer/noisier than it needs to be.
+        )}. ` +
+        `See the FAQ for more details: ` +
+        `https://github.com/matrix-org/matrix-public-archive/blob/main/docs/faq.md#why-did-the-archive-bot-join-my-room`,
+    },
   });
   assert(
     joinData.room_id,
@@ -14,6 +14,7 @@ const chalk = require('chalk');
 const RethrownError = require('../server/lib/errors/rethrown-error');
 const MatrixPublicArchiveURLCreator = require('matrix-public-archive-shared/lib/url-creator');
 const { fetchEndpointAsText, fetchEndpointAsJson } = require('../server/lib/fetch-endpoint');
+const ensureRoomJoined = require('../server/lib/matrix-utils/ensure-room-joined');
 const config = require('../server/lib/config');
 const {
   MS_LOOKUP,
@@ -999,10 +1000,11 @@ describe('matrix-public-archive', () => {
 // avoid problems jumping to the latest activity since we can't control the
 // timestamp of the membership event.
 const archiveAppServiceUserClient = await getTestClientForAs();
-await joinRoom({
-  client: archiveAppServiceUserClient,
-  roomId: roomId,
-});
+// We use `ensureRoomJoined` instead of `joinRoom` because we're joining
+// the archive user here and want the same join `reason` to avoid a new
+// state event being created (`joinRoom` -> `{ displayname, membership }`
+// whereas `ensureRoomJoined` -> `{ reason, displayname, membership }`)
+await ensureRoomJoined(archiveAppServiceUserClient.accessToken, roomId);

 // Just spread things out a bit so the event times are more obvious
 // and stand out from each other while debugging and so we just have