The regression was introduced by #3947 and causes crashes in bullmq.
It went unnoticed until now because the Python client sends commands in uppercase.
Fixes #4113
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Co-authored-by: Kostas Kyrimis <kostas@dragonflydb.io>
Fixes #3896. Now we retry several times.
In my checks this should significantly reduce the failure probability.
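The message does not say which operation is retried, so the following is only a generic sketch of the "retry several times" idea. The helper name `retry` and its parameters are hypothetical and do not come from the actual change.

```python
import time


def retry(action, attempts=5, delay=0.0):
    """Call `action` until it succeeds or `attempts` tries are used up.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as exc:  # a real test would catch a narrower error
            last_error = exc
            time.sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

If a single attempt fails with probability p and attempts are independent, all n attempts fail with probability p^n, which is why a handful of retries can reduce the failure rate substantially.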
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The test is flaky because it relies on the producer (the pytest) sending a batch of commands fast enough, before they get dispatched synchronously, so I increased the load.
It appears that newer versions of the GitHub runner require more memory. Some cases of the test `test_rss_used_mem_gap` allocate more than 6.5-7 GB of memory, leaving barely 0.5 GB for the runner (7.5 GB available in total), which sometimes causes the instance to run out of memory.
* fix: Fix `test_flushall_in_full_sync`
This test failed in CI many times. The issue was that we reached stable
sync too quickly and missed the full sync stage.
I changed the seeder to add 100k (instead of 30k) keys so that the stage
takes longer.
* StaticSeeder
Until now, we only tested Dragonfly against Redis 6.2. It appears that
something has changed in the way Redis sends stable sync commands, and
now they also forward `MULTI` and `EXEC` as part of their replication.
Since we do not allow all commands to run under `MULTI`/`EXEC`,
specifically `SELECT`, a Dragonfly replica of such servers failed these
commands and became inconsistent with the data on the master.
The proposed fix is to simply ignore (i.e. not execute) `MULTI`/`EXEC`
coming from a Redis/Valkey master, and run the commands within those
transactions individually, like we do for other transactions.
To test this we randomly choose a redis/valkey server based on 3
available installed binaries and test against them.
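As a rough Python model of the two ideas above (the actual fix lives in the replica code and is not shown here), the sketch below drops `MULTI`/`EXEC` from a forwarded command stream and picks one of three server binaries at random. The binary names and both function names are hypothetical; the message only states that three binaries are installed.

```python
import random

# Hypothetical binary names; the real test setup may use different ones.
CANDIDATE_SERVERS = ["redis-server-6.2", "redis-server-7.2", "valkey-server-8.0"]


def pick_server(rng=random):
    """Choose one of the installed server binaries at random."""
    return rng.choice(CANDIDATE_SERVERS)


def strip_transaction_markers(commands):
    """Drop MULTI/EXEC so the wrapped commands run individually.

    Each command is a list of strings, e.g. ["SET", "k", "v"].
    """
    return [
        cmd
        for cmd in commands
        if cmd and cmd[0].upper() not in ("MULTI", "EXEC")
    ]
```

For example, a stream of `MULTI`, `SELECT 1`, `SET k v`, `EXEC` is reduced to `SELECT 1` and `SET k v`, so `SELECT` is never executed inside a transaction.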
* fix: Do not publish to connections without context
This is a rare case where a closed connection is kept alive while the
handling fiber yields, therefore leaving `cc_` (the connection context)
pointing to null for other fibers to see.
As far as I can see, this can only happen during server shutdown, but
there could be other cases that I have missed.
The test on its own does _not_ reproduce the crash, however with added
`ThisFiber::SleepFor()`s I could reproduce the crash:
* Right before `DispatchBrief()`
[here](e3214cb603/src/server/channel_store.cc (L154))
* Right after connection context `reset()`
[here](2ab480e160/src/facade/dragonfly_connection.cc (L750))
In any case, calling `SendPubMessageAsync()` to a connection where `cc_`
is null is a bug, and we fix that here.
* rewording
A common need is to clean up a connection via its `.close()` method before a test exits; otherwise the connection raises a warning that it was left unclosed. However, remembering to call `.close()` on every connection at the end of each test is cumbersome. Luckily, fixtures in Python can be marked as async, which allows us to:
* cache all clients created by DflyInstance.client()
* clean them all at the end of the fixture in one go
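A minimal sketch of this pattern, using only the standard library. `FakeClient`, `Instance` and `run_with_instance` are hypothetical stand-ins for the real client, `DflyInstance` and the async fixture; in an actual fixture the teardown would run after the `yield`.

```python
import asyncio


class FakeClient:
    """Stand-in for a real async client; it only records whether it was closed."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class Instance:
    """Hypothetical stand-in for DflyInstance that caches every client it creates."""

    def __init__(self):
        self.clients = []

    def client(self):
        c = FakeClient()
        self.clients.append(c)  # remember the client for later cleanup
        return c

    async def close_clients(self):
        for c in self.clients:
            await c.close()
        self.clients.clear()


async def run_with_instance(test_body):
    """Mimic an async fixture: set up, run the test body, then tear down."""
    instance = Instance()
    try:
        await test_body(instance)
    finally:
        # Teardown: close every cached client in one go.
        await instance.close_clients()
    return instance
```

With this shape, a test body only calls `instance.client()` and never has to close anything itself.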
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
Currently, `notify_keyspace_events` cannot be set at runtime via the `CONFIG SET` command.
* allow notify_keyspace_events in config set command
* add tests
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
* fix bitop creating the dst key if result is empty
* fix replicating dst with the wrong type
* make bitop a blind update (similar to set command)
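The intended semantics can be illustrated with a toy model over a Python dict. This is only a sketch of the behavior described in the list above, not the server implementation; `bitop_or` and the dict-based store are hypothetical, and deleting dst on an empty result is an assumption about the intended outcome.

```python
def bitop_or(store, dst, *src_keys):
    """Toy BITOP OR over a dict of bytes values; returns the result length."""
    values = [store.get(k, b"") for k in src_keys]
    width = max((len(v) for v in values), default=0)
    result = bytearray(width)
    for v in values:
        for i, byte in enumerate(v):
            result[i] |= byte
    if not result:
        # Empty result: dst must not exist afterwards (assumed semantics).
        store.pop(dst, None)
        return 0
    # Blind update: overwrite dst regardless of what it held before, like SET.
    store[dst] = bytes(result)
    return len(result)
```

Because the write is blind, a dst that previously held a different type is simply replaced by the string result.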
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
* chore: Forbid replicating a replica
We do not support connecting a replica to a replica, but before this PR
we allowed doing so. This PR disables that behavior.
Fixes #3679
* `replicaof_mu_`
fix: Fix flaky test `test_acl_revoke_pub_sub_while_subscribed`
The reason it failed is that, in some rare cases, the subscriber did not
get the first few messages of the publisher. This is likely due to
timing of subscribe and publish, in different connections / threads.
Given that Pub/Sub has very weak guarantees, it's probably ok as is, so I
just added a sleep to make the test pass reliably.