Add reason why the archive bot is joining the room (#262)
Use the join `reason` field added in [MSC2367](https://github.com/matrix-org/matrix-spec-proposals/pull/2367) to explain why the archive bot is joining a room. Unfortunately, this PR doesn't have much effect yet because it doesn't look like many clients display the reason (Element, for example, doesn't). Part of https://github.com/matrix-org/matrix-public-archive/issues/257
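For readers unfamiliar with the join `reason`, here is a minimal sketch of what such a request could look like against the Matrix client-server join endpoint (`POST /_matrix/client/v3/join/{roomIdOrAlias}`). The helper name `buildJoinRequest`, the homeserver URL, and the token are hypothetical and are not part of this repository, which goes through its own `fetchEndpointAsJson` helper instead.

```javascript
// Hypothetical helper (an illustration, not the archive's actual code): builds
// the URL and fetch-style options for joining a room with a `reason` attached.
function buildJoinRequest(homeserverUrl, roomIdOrAlias, accessToken, reason) {
  // Strip trailing slashes so we don't end up with `//_matrix/...`
  const baseUrl = homeserverUrl.replace(/\/+$/, '');
  // Room IDs and aliases contain characters such as `#` and `:` which need to
  // be percent-encoded when placed in a URL path segment.
  const url = `${baseUrl}/_matrix/client/v3/join/${encodeURIComponent(roomIdOrAlias)}`;

  return {
    url,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      // The `reason` ends up on the resulting `m.room.member` event content
      body: JSON.stringify({ reason }),
    },
  };
}
```

The returned `url` and `options` could then be passed to any `fetch`-compatible client. Whether the reason is shown to room members depends entirely on the client rendering the membership event.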
parent 8da9b3d957
commit 1dd63212c0
52
docs/faq.md
@@ -17,19 +17,32 @@ And with the introduction of the jump to date API via
 [MSC3030](https://github.com/matrix-org/matrix-spec-proposals/pull/3030), we could show
 messages from any given date and day-by-day navigation.
 
-## How do I opt out and keep my room from being indexed by search engines?
+## Why did the archive bot join my room?
 
-All public Matrix rooms are accessible to view in the Matrix Public Archive. But only
-rooms with history visibility set to `world_readable` are indexable by search engines.
+Only public Matrix rooms with `shared` or `world_readable` [history
+visibility](https://spec.matrix.org/latest/client-server-api/#room-history-visibility) are
+accessible in the Matrix Public Archive. In some clients like Element, the `shared`
+option equates to "Members only (since the point in time of selecting this option)" and
+`world_readable` to "Anyone" under the **room settings** -> **Security & Privacy** ->
+**Who can read history?**.
 
-Also see https://github.com/matrix-org/matrix-public-archive/issues/47 to track better
-opt out controls.
+But the archive bot (`@archive:matrix.org`) will join any public room because it doesn't
+know the history visibility without first joining. Any room without `world_readable` or
+`shared` history visibility will lead a `403 Forbidden`. And if the public room is in
+the room directory, it will be listed in the archive but will still lead to a `403
+Forbidden` in that case.
 
-For [archive.matrix.org](https://archive.matrix.org/), you can ban the
-`@archive:matrix.org` user if you don't want your room content to be shown in the
-archive at all.
+The Matrix Public Archive doesn't hold onto any data (it's
+stateless) and requests the messages from the homeserver every time. The
+[archive.matrix.org](https://archive.matrix.org/) instance has some caching in place, 5
+minutes for the current day, and 2 days for past content.
 
-## Why does the archive user join rooms instead of browsing them as a guest?
+The Matrix Public Archive only allows rooms with `world_readable` history visibility to
+be indexed by search engines. See the [opt
+out](#how-do-i-opt-out-and-keep-my-room-from-being-indexed-by-search-engines) topic
+below for more details.
+
+### Why does the archive user join rooms instead of browsing them as a guest?
 
 Guests require `m.room.guest_access` to access a room. Most public rooms do not allow
 guests because even the `public_chat` preset when creating a room does not allow guest
@@ -37,11 +50,22 @@ access. Not being able to view most public rooms is the major blocker on being a
 use guest access. The idea is if I can view the messages from a Matrix client as a
 random user, I should also be able to see the messages in the archive.
 
-Keep in mind that only rooms with history visibility set to `world_readable` are
-indexable by search engines. The Matrix Public Archive doesn't hold onto any data (it's
-stateless) and requests the messages from the homeserver every time. The
-[archive.matrix.org](https://archive.matrix.org/) instance has some caching in place, 5
-minutes for the current day, and 2 days for past content.
+Guest access is also a much different ask than read-only access since guests can also
+send messages in the room which isn't always desirable. The archive bot is read-only and
+does not send messages.
+
+## How do I opt out and keep my room from being indexed by search engines?
+
+Only public Matrix rooms with `shared` or `world_readable` history visibility are
+accessible to view in the Matrix Public Archive. But only rooms with history visibility
+set to `world_readable` are indexable by search engines.
+
+Also see https://github.com/matrix-org/matrix-public-archive/issues/47 to track better
+opt out controls.
+
+As a workaround for [archive.matrix.org](https://archive.matrix.org/) today, you can ban
+the `@archive:matrix.org` user if you don't want your room content to be shown in the
+archive at all.
 
 ## Technical details
 
@@ -3,14 +3,19 @@
 const assert = require('assert');
 const urlJoin = require('url-join');
 
+const StatusError = require('../errors/status-error');
 const { fetchEndpointAsJson } = require('../fetch-endpoint');
 const getServerNameFromMatrixRoomIdOrAlias = require('./get-server-name-from-matrix-room-id-or-alias');
+const MatrixPublicArchiveURLCreator = require('matrix-public-archive-shared/lib/url-creator');
 
 const config = require('../config');
-const StatusError = require('../errors/status-error');
+const basePath = config.get('basePath');
+assert(basePath);
 const matrixServerUrl = config.get('matrixServerUrl');
 assert(matrixServerUrl);
 
+const matrixPublicArchiveURLCreator = new MatrixPublicArchiveURLCreator(basePath);
+
 async function ensureRoomJoined(
   accessToken,
   roomIdOrAlias,
@@ -43,6 +48,19 @@ async function ensureRoomJoined(
     method: 'POST',
     accessToken,
     abortSignal,
+    body: {
+      reason:
+        `Joining room to check history visibility. ` +
+        `If your room is public with shared or world readable history visibility, ` +
+        `it will be accessible at ${matrixPublicArchiveURLCreator.archiveUrlForRoom(
+          roomIdOrAlias
+          // We don't need to include the `viaServers` option here because the archive
+          // will already be joined to the room from this request itself and we don't
+          // need to make the URL any longer/noisier than it needs to be.
+        )}. ` +
+        `See the FAQ for more details: ` +
+        `https://github.com/matrix-org/matrix-public-archive/blob/main/docs/faq.md#why-did-the-archive-bot-join-my-room`,
+    },
   });
   assert(
     joinData.room_id,
@@ -14,6 +14,7 @@ const chalk = require('chalk');
 const RethrownError = require('../server/lib/errors/rethrown-error');
 const MatrixPublicArchiveURLCreator = require('matrix-public-archive-shared/lib/url-creator');
 const { fetchEndpointAsText, fetchEndpointAsJson } = require('../server/lib/fetch-endpoint');
+const ensureRoomJoined = require('../server/lib/matrix-utils/ensure-room-joined');
 const config = require('../server/lib/config');
 const {
   MS_LOOKUP,
@@ -999,10 +1000,11 @@ describe('matrix-public-archive', () => {
 // avoid problems jumping to the latest activity since we can't control the
 // timestamp of the membership event.
 const archiveAppServiceUserClient = await getTestClientForAs();
-await joinRoom({
-  client: archiveAppServiceUserClient,
-  roomId: roomId,
-});
+// We use `ensureRoomJoined` instead of `joinRoom` because we're joining
+// the archive user here and want the same join `reason` to avoid a new
+// state event being created (`joinRoom` -> `{ displayname, membership }`
+// whereas `ensureRoomJoined` -> `{ reason, displayname, membership }`)
+await ensureRoomJoined(archiveAppServiceUserClient.accessToken, roomId);
 
 // Just spread things out a bit so the event times are more obvious
 // and stand out from each other while debugging and so we just have
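The FAQ changes in this commit describe two separate rules for public rooms: which history visibility values make a room viewable in the archive, and which make it indexable by search engines. As a rough sketch of those rules only (the function name `describeArchiveAccess` and its return shape are made up for illustration and do not exist in the codebase):

```javascript
// Hypothetical helper summarizing the FAQ rules for a *public* room, keyed by
// the room's `m.room.history_visibility` value.
function describeArchiveAccess(historyVisibility) {
  // Per the FAQ: only `shared` or `world_readable` rooms are viewable
  const accessible =
    historyVisibility === 'shared' || historyVisibility === 'world_readable';

  return {
    accessible,
    // Per the FAQ: anything else leads to a `403 Forbidden`
    forbidden: !accessible,
    // Per the FAQ: only `world_readable` rooms may be indexed by search engines
    indexable: historyVisibility === 'world_readable',
  };
}
```

For example, a `shared` room would come back as accessible but not indexable, while an `invited` or `joined` room would come back as forbidden. The archive bot still has to join the room first to learn which case applies.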