1
0
Fork 0
mirror of https://github.com/element-hq/synapse.git synced 2025-03-07 08:26:50 +00:00
synapse/tests
Eric Eastwood 6ec5e13ec9
Fix join being denied after being invited over federation (#18075)
This also happens for rejecting an invite. Basically, any out-of-band membership transition where we first get the membership as an `outlier` and then rely on federation filling us in to de-outlier it.

This PR mainly addresses automated test flakiness, bots/scripts, and options within Synapse like [`auto_accept_invites`](https://element-hq.github.io/synapse/v1.122/usage/configuration/config_documentation.html#auto_accept_invites) that are able to react quickly (before federation is able to push us events), but also helps in generic scenarios where federation is lagging.

I initially thought this might be a Synapse consistency issue (see issues labeled with [`Z-Read-After-Write`](https://github.com/matrix-org/synapse/labels/Z-Read-After-Write)) but it seems to be an event auth logic problem. Workers probably do increase the number of possible race condition scenarios that make this visible though (replication and cache invalidation lag).

Fix https://github.com/element-hq/synapse/issues/15012
(probably fixes https://github.com/matrix-org/synapse/issues/15012 (https://github.com/element-hq/synapse/issues/15012))
Related to https://github.com/matrix-org/matrix-spec/issues/2062

Problems:

 1. We don't consider [out-of-band membership](https://github.com/element-hq/synapse/blob/develop/docs/development/room-dag-concepts.md#out-of-band-membership-events) (outliers) in our `event_auth` logic even though we expose them in `/sync`.
 1. (This PR doesn't address this point) Perhaps we should consider authing events in the persistence queue as events already in the queue could allow subsequent events to be allowed (events come through many channels: federation transaction, remote invite, remote join, local send). But this doesn't save us in the case where the event is more delayed over federation.


### What happened before?

I wrote some Complement test that stresses this exact scenario and reproduces the problem: https://github.com/matrix-org/complement/pull/757

```
COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestSynapseConsistency
```


We have `hs1` and `hs2` running in monolith mode (no workers):

 1. `@charlie1:hs2` is invited and joins the room:
     1. `hs1` invites `@charlie1:hs2` to a room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point.
     1. `@charlie1:hs2` decides to join because it saw the invite down `/sync`. Because `hs2` is not yet in the room, this happens as a remote join `make_join`/`send_join` which comes back with all of the auth events needed to auth successfully and now `@charlie1:hs2` is successfully joined to the room.
 1. `@charlie2:hs2` is invited and and tries to join the room:
     1. `hs1` invites `@charlie2:hs2` to the room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point.
     1. Because `hs2` is already participating in the room, we also see the invite come over federation in a transaction and we start processing it (not done yet, see below)
     1. `@charlie2:hs2` decides to join because it saw the invite down `/sync`. Because `hs2`, is already in the room, this happens as a local join but we deny the event because our `event_auth` logic thinks that we have no membership in the room  (expected to be able to join because we saw the invite down `/sync`)
     1. We finally finish processing the `@charlie2:hs2` invite event from and de-outlier it.
         - If this finished before we tried to join we would have been fine but this is the race condition that makes this situation visible.


Logs for `hs2`:

```
🗳️ on_invite_request: handling event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=False>
🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
🔦 _store_room_members_txn update local_current_membership: <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
📨 Notifying about new event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
 on_invite_request: handled event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True>
🧲 do_invite_join for @user-2-charlie1:hs2 in !sfZVBdLUezpPWetrol:hs1
🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$bwv8LxFnqfpsw_rhR7OrTjtz09gaJ23MqstKOcs7ygA, type=m.room.member, state_key=@user-1-alice:hs1, membership=join, outlier=True>
🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$oju1ts3G3pz5O62IesrxX5is4LxAwU3WPr4xvid5ijI, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=join, outlier=False>
📨 Notifying about new event <FrozenEventV3 event_id=$oju1ts3G3pz5O62IesrxX5is4LxAwU3WPr4xvid5ijI, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=join, outlier=False>

...

🗳️ on_invite_request: handling event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
🔦 _store_room_members_txn update local_current_membership: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
📨 Notifying about new event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
 on_invite_request: handled event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True>
📬 handling received PDU in room !sfZVBdLUezpPWetrol:hs1: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
📮 handle_new_client_event: handling <FrozenEventV3 event_id=$WNVDTQrxy5tCdPQHMyHyIn7tE4NWqKsZ8Bn8R4WbBSA, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=join, outlier=False>
 Denying new event <FrozenEventV3 event_id=$WNVDTQrxy5tCdPQHMyHyIn7tE4NWqKsZ8Bn8R4WbBSA, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=join, outlier=False> because 403: You are not invited to this room.
synapse.http.server - 130 - INFO - POST-16 - <SynapseRequest at 0x7f460c91fbf0 method='POST' uri='/_matrix/client/v3/join/%21sfZVBdLUezpPWetrol:hs1?server_name=hs1' clientproto='HTTP/1.0' site='8080'> SynapseError: 403 - You are not invited to this room.
📨 Notifying about new event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
 handled received PDU in room !sfZVBdLUezpPWetrol:hs1: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False>
```
2025-01-27 11:21:10 -06:00
..
api Consolidate SSO redirects through /_matrix/client/v3/login/sso/redirect(/{idpId}) (#17972) 2024-11-29 11:26:37 -06:00
app Correctly mention previous copyright (#16820) 2024-01-23 11:26:48 +00:00
appservice Format files with Ruff (#17643) 2024-09-02 12:39:04 +01:00
config Add macaroon_secret_key_path config option (#17983) 2024-12-16 18:01:33 -06:00
crypto Correctly mention previous copyright (#16820) 2024-01-23 11:26:48 +00:00
events Format files with Ruff (#17643) 2024-09-02 12:39:04 +01:00
federation Fix join being denied after being invited over federation (#18075) 2025-01-27 11:21:10 -06:00
handlers Fix join being denied after being invited over federation (#18075) 2025-01-27 11:21:10 -06:00
http Fix mypy errors on Twisted 24.11.0 (#17998) 2024-12-18 11:49:38 +00:00
logging Removal: Remove support for experimental msc3886 (#17638) 2024-11-13 14:10:20 +00:00
media Enable authenticated media by default (#17889) 2024-11-20 14:48:22 +00:00
metrics Correctly mention previous copyright (#16820) 2024-01-23 11:26:48 +00:00
module_api Format files with Ruff (#17643) 2024-09-02 12:39:04 +01:00
push MSC4076: Add disable_badge_count to pusher configuration (#17975) 2024-12-03 22:58:43 +00:00
replication Fix join being denied after being invited over federation (#18075) 2025-01-27 11:21:10 -06:00
rest Fix join being denied after being invited over federation (#18075) 2025-01-27 11:21:10 -06:00
scripts Update license headers 2023-11-21 15:29:58 -05:00
server_notices Sliding Sync: Add cache to get_tags_for_room(...) (#17730) 2024-09-19 12:43:26 +01:00
state Update license headers 2023-11-21 15:29:58 -05:00
storage Bust _membership_stream_cache cache when current state changes (#17732) 2025-01-08 10:11:09 -06:00
test_utils Add media tests for a CMYK JPEG image (#17786) 2024-10-23 18:26:01 +01:00
types Use immutabledict instead of frozendict (#15113) 2023-03-22 17:15:34 +00:00
util Implement MSC4133 to support custom profile fields. (#17488) 2025-01-21 11:11:04 +00:00
__init__.py Correctly mention previous copyright (#16820) 2024-01-23 11:26:48 +00:00
server.py Removal: Remove support for experimental msc3886 (#17638) 2024-11-13 14:10:20 +00:00
test_distributor.py Correctly mention previous copyright (#16820) 2024-01-23 11:26:48 +00:00
test_event_auth.py Format files with Ruff (#17643) 2024-09-02 12:39:04 +01:00
test_mau.py Update license headers 2023-11-21 15:29:58 -05:00
test_phone_home.py Correctly mention previous copyright (#16820) 2024-01-23 11:26:48 +00:00
test_rust.py Add missing type hints to tests. (#15027) 2023-02-08 19:52:37 +00:00
test_server.py Removal: Remove support for experimental msc3886 (#17638) 2024-11-13 14:10:20 +00:00
test_state.py Correctly mention previous copyright (#16820) 2024-01-23 11:26:48 +00:00
test_terms_auth.py Update license headers 2023-11-21 15:29:58 -05:00
test_test_utils.py Correctly mention previous copyright (#16820) 2024-01-23 11:26:48 +00:00
test_types.py Format files with Ruff (#17643) 2024-09-02 12:39:04 +01:00
test_visibility.py Include user membership on events (#17282) 2024-06-13 21:45:54 +00:00
unittest.py Support for MSC4190: device management for application services (#17705) 2024-12-04 12:04:49 +01:00
utils.py Fix join being denied after being invited over federation (#18075) 2025-01-27 11:21:10 -06:00