Update FAQ to explain `world_readable` only

Follow-up to https://github.com/matrix-org/matrix-public-archive/pull/239
Eric Eastwood 2023-06-28 18:04:14 -05:00
parent ff18a46283
commit 7f12d62584
1 changed files with 33 additions and 33 deletions


@@ -19,54 +19,54 @@ messages from any given date and day-by-day navigation.
 ## Why did the archive bot join my room?
-Only public Matrix rooms with `shared` or `world_readable` [history
-visibility](https://spec.matrix.org/latest/client-server-api/#room-history-visibility) are
-accessible in the Matrix Public Archive. In some clients like Element, the `shared`
-option equates to "Members only (since the point in time of selecting this option)" and
-`world_readable` to "Anyone" under the **room settings** -> **Security & Privacy** ->
-**Who can read history?**.
+Only Matrix rooms with `world_readable` [history
+visibility](https://spec.matrix.org/latest/client-server-api/#room-history-visibility)
+are accessible in the Matrix Public Archive and indexed by search engines.
 But the archive bot (`@archive:matrix.org`) will join any public room because it doesn't
-know the history visibility without first joining. Any room without `world_readable` or
-`shared` history visibility will lead a `403 Forbidden`. And if the public room is in
-the room directory, it will be listed in the archive but will still lead to a `403
-Forbidden` in that case.
+know the history visibility without first joining. Any room that doesn't have
+`world_readable` history visibility will lead a `403 Forbidden`.
 The Matrix Public Archive doesn't hold onto any data (it's
 stateless) and requests the messages from the homeserver every time. The
 [archive.matrix.org](https://archive.matrix.org/) instance has some caching in place, 5
 minutes for the current day, and 2 days for past content.
-The Matrix Public Archive only allows rooms with `world_readable` history visibility to
-be indexed by search engines. See the [opt
-out](#how-do-i-opt-out-and-keep-my-room-from-being-indexed-by-search-engines)
-below for more details.
-### Why does the archive user join rooms instead of browsing them as a guest?
-Guests require `m.room.guest_access` to access a room. Most public rooms do not allow
-guests because even the `public_chat` preset when creating a room does not allow guest
-access. Not being able to view most public rooms is the major blocker on being able to
-use guest access. The idea is if I can view the messages from a Matrix client as a
-random user, I should also be able to see the messages in the archive.
-Guest access is also a much different ask than read-only access since guests can also
-send messages in the room which isn't always desirable. The archive bot is read-only and
-does not send messages.
+See the [opt out
+section](#how-do-i-opt-out-and-keep-my-room-from-being-indexed-by-search-engines) below
+topic for more details.
 ## How do I opt out and keep my room from being indexed by search engines?
-Only public Matrix rooms with `shared` or `world_readable` history visibility are
-accessible to view in the Matrix Public Archive. But only rooms with history visibility
-set to `world_readable` are indexable by search engines.
+Only Matrix rooms with `world_readable` [history
+visibility](https://spec.matrix.org/latest/client-server-api/#room-history-visibility)
+are accessible in the Matrix Public Archive and indexed by search engines. One easy way
+to opt-out is to change your rooms history visibility to something else if you don't
+intend for your room be world readable.
-Also see https://github.com/matrix-org/matrix-public-archive/issues/47 to track better
-opt out controls.
+Dedicated opt-out controls are being tracked in
+[#47](https://github.com/matrix-org/matrix-public-archive/issues/47).
-As a workaround for [archive.matrix.org](https://archive.matrix.org/) today, you can ban
-the `@archive:matrix.org` user if you don't want your room content to be shown in the
+As a workaround for [archive.matrix.org](https://archive.matrix.org/), you can ban the
+`@archive:matrix.org` user if you don't want your room content to be shown in the
 archive at all.
+### Why does the archive user join rooms instead peeking in the room or using guests?
+Since the archive only displays rooms with `world_readable` history visibility, we could
+peek into the rooms without joining. This is being explored in
+[#272](https://github.com/matrix-org/matrix-public-archive/pull/272). But peeking
+doesn't work when the server doesn't know about the room already (this is commonly
+referred to as federated peeking) which is why we have to fallback to joining the room
+in any case. We could solve the federated peeking problem and avoid the join with
+[MSC3266 room summaries](https://github.com/matrix-org/matrix-spec-proposals/pull/3266)
+to check whether the room is `world_readable` even over federation.
+Guests are completely separate concept and controlled by the `m.room.guest_access` state
+event in the room. Guest access is also a much different ask than read-only access since
+guests can also send messages in the room which isn't always desirable. The archive bot
+is read-only and does not send messages.
 ## Technical details
 The main readme has a [technical overview](../README.md#technical-overview) of the