speed up state deduplication migrations on sqlite

Remove the self-inner-join and instead exploit sqlite's quirk of allowing aggregate and non-aggregate columns to be mixed in a grouped select, which lets us return the event_ids of the events in a state group directly: see http://marc.info/?l=sqlite-users&m=141460824410980&w=2. This speeds up these queries by around 25x, drastically improving performance when a synapse running on sqlite is upgraded to 0.18.
2024-12-14 11:57:44 +00:00 · 2016-09-26 01:18:41 +01:00
commit cc4944c83e
parent aab6a31c96
1 changed file with 9 additions and 7 deletions
--- a/synapse/storage/state.py
+++ b/synapse/storage/state.py
@@ -389,14 +389,16 @@ class StateStore(SQLBaseStore):
                    if next_group:
                        group_tree.append(next_group)

+                # sqlite lets us mix aggregate & non-aggregate columns in
+                # a grouped select - for details, see:
+                # http://marc.info/?l=sqlite-users&m=141460824410980&w=2
+                # As a result, we can avoid a self-join to populate
+                # event_id, which empirically speeds things up by 25x.
                sql = ("""
-                    SELECT type, state_key, event_id FROM state_groups_state
-                    INNER JOIN (
-                        SELECT type, state_key, max(state_group) as state_group
-                        FROM state_groups_state
-                        WHERE state_group IN (%s) %s
-                        GROUP BY type, state_key
-                    ) USING (type, state_key, state_group);
+                    SELECT type, state_key, event_id, max(state_group) as state_group
+                    FROM state_groups_state
+                    WHERE state_group IN (%s) %s
+                    GROUP BY type, state_key;
                """) % (",".join("?" for _ in group_tree), where_clause,)

                args = list(group_tree)
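
To illustrate the equivalence the commit relies on, here is a minimal sketch that runs both query shapes against an in-memory sqlite database and compares the results. The table is a simplified, hypothetical copy of state_groups_state, the sample rows and the helper names (latest_state_join, latest_state_bare) are invented for illustration, and the extra where_clause from the real query is omitted. It assumes SQLite 3.7.11 or later, where bare columns in a max() aggregate query take their values from the row holding the maximum.

```python
import sqlite3

# Hypothetical, simplified copy of the table; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_groups_state "
    "(state_group INTEGER, type TEXT, state_key TEXT, event_id TEXT)"
)
conn.executemany(
    "INSERT INTO state_groups_state VALUES (?, ?, ?, ?)",
    [
        (1, "m.room.create", "", "$create"),
        (1, "m.room.member", "@alice:example.org", "$join_v1"),
        (2, "m.room.member", "@alice:example.org", "$join_v2"),
        # Inserted out of order on purpose: the newest group is not the last row.
        (3, "m.room.name", "", "$name_v3"),
        (2, "m.room.name", "", "$name_v2"),
    ],
)


def _placeholders(groups):
    # One "?" per state group, as in the original query construction.
    return ",".join("?" for _ in groups)


def latest_state_join(conn, groups):
    """Old shape: self-join against the per-key max(state_group)."""
    sql = """
        SELECT type, state_key, event_id FROM state_groups_state
        INNER JOIN (
            SELECT type, state_key, max(state_group) AS state_group
            FROM state_groups_state
            WHERE state_group IN (%s)
            GROUP BY type, state_key
        ) USING (type, state_key, state_group)
    """ % (_placeholders(groups),)
    return set(conn.execute(sql, list(groups)))


def latest_state_bare(conn, groups):
    """New shape: event_id is a bare column taken from the max() row."""
    sql = """
        SELECT type, state_key, event_id, max(state_group) AS state_group
        FROM state_groups_state
        WHERE state_group IN (%s)
        GROUP BY type, state_key
    """ % (_placeholders(groups),)
    # Drop the trailing state_group column so both shapes are comparable.
    return {row[:3] for row in conn.execute(sql, list(groups))}


if __name__ == "__main__":
    groups = [1, 2, 3]
    print(sorted(latest_state_join(conn, groups)))
    print(sorted(latest_state_bare(conn, groups)))
```

Both helpers return the same set of (type, state_key, event_id) tuples for this data. Note that this behaviour is specific to sqlite; other databases generally reject or handle such bare columns differently, so the sketch says nothing about them.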