* fix(replication): Correctly replicate commands even when OOM
Before this change, an OOM in shard callbacks could lead to data
inconsistency between the master and the replica. This affected, for
example, commands such as `LMOVE` that mutate data on one shard but
fail on another.
After this change, callbacks that hit an OOM correctly replicate
whatever work they did (none, partial or complete) to replicas.
Note that `MSET` and `MSETNX` required special handling: they are
the only commands that can _create_ multiple keys, so creating some of
them can fail.
Fixes #2381
* fixes
* test fix
* RecordJournal
* UNDO idiotnessness
* 2 shards
* fix pytest
* feat(server): Implement `CLIENT KILL`
Currently, it supports the following syntax:
* `CLIENT KILL <addr>:<port>`
* `CLIENT KILL ID <id>`
* `CLIENT KILL ADDR <addr>:<port>`
* `CLIENT KILL LADDR <addr>:<port>`
It will not allow killing an admin-connection from a non-admin port.
There are a few parameters of `CLIENT KILL` that Redis supports but this
PR does not yet add. Let's add them as needed.
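
As a rough illustration of the four supported forms, here is a minimal Python sketch that builds the argument list for each one. The helper name `client_kill_args` and the `"legacy"` label are hypothetical and exist only for this example; they are not part of the PR.

```python
from typing import List


def client_kill_args(kind: str, value: str) -> List[str]:
    """Build the argument list for one supported CLIENT KILL form.

    `kind` is one of "legacy", "ID", "ADDR" or "LADDR" (case-insensitive).
    "legacy" is a label made up for this sketch; it stands for the
    filter-less `CLIENT KILL <addr>:<port>` form.
    """
    kind = kind.upper()
    if kind == "LEGACY":
        # CLIENT KILL <addr>:<port>
        return ["CLIENT", "KILL", value]
    if kind in ("ID", "ADDR", "LADDR"):
        # CLIENT KILL ID <id> / ADDR <addr>:<port> / LADDR <addr>:<port>
        return ["CLIENT", "KILL", kind, value]
    # Other filters that Redis accepts are not covered by this sketch.
    raise ValueError(f"unsupported CLIENT KILL filter: {kind}")
```

With a client library such as redis-py, the resulting list could be passed on as `client.execute_command(*client_kill_args("ID", "42"))`.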
Fixes #1614
* Add tests
* fixes
Fixes #2296
Added a regression test that covers both policy-based eviction and heartbeat eviction.
---------
Signed-off-by: Yue Li <61070669+theyueli@users.noreply.github.com>
* feat: add SLOT-MIGRATION-STATUS cmd for source node
implements #2232
Add the ability to use SLOT-MIGRATION-STATUS without args
to print info about all migration processes for the current node.
Fixes #2337
The bug:
replicaof was not rejected while the server was loading a snapshot.
The fix:
replicaof is allowed while the server is in loading state so that it can still be issued while replication is in full sync mode.
It is now rejected if the server is in loading state and is a master.
Another bug fix:
Allow the cron snapshot when the --replicaof flag is set.
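
The rejection rule described above can be sketched as a small predicate. This is only an illustration in Python: `GlobalState` and `should_reject_replicaof` are hypothetical names, not the identifiers used in the server code.

```python
from enum import Enum


class GlobalState(Enum):
    # Hypothetical subset of server states, for illustration only.
    ACTIVE = "active"
    LOADING = "loading"


def should_reject_replicaof(state: GlobalState, is_master: bool) -> bool:
    """Reject replicaof only when a master is loading a snapshot.

    A replica in loading state (for example during full sync) must still
    be allowed to run replicaof.
    """
    return state is GlobalState.LOADING and is_master
```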
Signed-off-by: adi_holden <adi@dragonflydb.io>
* refactor(server): Privatize `PreUpdate()` and `PostUpdate()`
While at it:
* Make `PreUpdate()` not decrease object size
* Remove redundant leftover call to `PreUpdate()` outside `DbSlice`
* Add pytest
* Test delete leads to 0 counters
* Improve test
* fixes
* comments
1. How many transactions we processed by type
2. How many transactions we processed by width (number of unique shards).
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* feat(cluster): add command flow for slot migration process
Fixes #2295
The DFLYMIGRATE FLOW command was added to establish
connections for every shard replication process.
The slow serialization step is a separate issue, so
for now only eof_token is sent in reply to the
DFLYMIGRATE FLOW command.
The expected state for START-SLOT-MIGRATION is now FULL_SYNC.
* feat: DispatchTracker
Use a DispatchTracker to track ongoing dispatches for commands that change global state
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
fix: eliminate the redundant string copy in SendMGetResponse
Also, allow pytests to selectively create a DflyInstance that is attached to
an existing dragonfly port, created outside of the tests.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The DF version cannot be parsed by Memcached::getVersion(), which expects an n.n.n string.
Change the version to emulate the old memcached server.
The DF version can still be fetched via the Memcached::getStats() function.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: add more states to client connections
* fix: clear pipelined messages before close
* fix: skip same thread on backpressure
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Co-authored-by: Roman Gershman <roman@dragonflydb.io>
* fix(server): client pause fix on pipeline squash
Allow squashing commands on pause.
Move the await on client pause inside InvokeCommand, so that all command invocation flows read the pause state.
Signed-off-by: adi_holden <adi@dragonflydb.io>
This PR introduces a test case for TLS with `ca_dir`. First, we
did not have any tests for this case. Second, using `ca_dir` requires
calling `c_rehash` on the directory before it is loaded by DF. We
did not have this use case anywhere, so we thought there was
a bug when we used `ca_dir`, only to find out that we need to call
`c_rehash` on the directory before we load the certificates. Now
both a test and a use case are properly documented.
* add missing test for ca_dir
* use rehash to properly show how to load ca directories instead of
files
The regression test sometimes fails because, for a short period of time after `wait_available_async()` returns, the result of `ROLE` could still be different from `stable_sync`.
[Failure example](https://github.com/dragonflydb/dragonfly/actions/runs/6726461923/job/18282759612#step:6:1863)
We change our state from `LOADING` to `ACTIVE` [here](d08d7f13b4/src/server/replica.cc (L426)), but then we change the sync state 2 times:
1. `!R_SYNCING` [here](d08d7f13b4/src/server/replica.cc (L427C28-L427C37))
2. And only later to `R_SYNC_OK` (meaning `stable_sync`) [here](d08d7f13b4/src/server/replica.cc (L221))
This is easy to reproduce by adding a sleep right after setting the state to `ACTIVE`, either before or after the flipping of `R_SYNCING` (with different returned states).
BTW, without that added sleep I was not able to reproduce it, having tried 1000s of times in various configurations.
We could change the order of things such that we first change `state_mask_` and only then switch state from `LOADING` to `ACTIVE` (which is probably the right thing to do), but that would require a subtle refactor, as we change these in a couple of places.
But we should keep in mind that this has no effect on users. So a simple sleep on the test side should fix this fairly well.
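
For comparison only: this change settles for a plain sleep in the test. A polling variant of the same idea, which is not what this change does, could look like the sketch below. `wait_for_role_state` and its `get_role` callback are hypothetical names; `get_role` stands for whatever coroutine the test uses to read the sync state out of `ROLE`.

```python
import asyncio
import time


async def wait_for_role_state(get_role, expected="stable_sync",
                              timeout=5.0, interval=0.05):
    """Poll `get_role()` until it returns `expected` or `timeout` elapses.

    `get_role` is a caller-supplied coroutine function (an assumption of
    this sketch) that returns the current sync state as a string.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = await get_role()
        if state == expected:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"state is still {state!r}, expected {expected!r}")
        # Yield to the event loop before asking again.
        await asyncio.sleep(interval)
```

Polling bounds the wait by the actual transition time rather than a fixed delay, at the cost of a little more test code.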
* chore: help users to fix a common mistake of setting quotes in the flagfile
Specifically, the confusion is often around the cron expression.
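
To illustrate the mistake in question, here is a minimal Python sketch that scans flagfile text for values wrapped in quotes. It is not the check added by this change; the function name and the flag names in the usage example are assumptions made for illustration.

```python
from typing import List, Tuple


def find_quoted_flag_values(flagfile_text: str) -> List[Tuple[int, str]]:
    """Return (line number, flag name) for every quoted flag value.

    In a flagfile the quotes are not stripped by a shell, so they end up
    as part of the value itself.
    """
    problems = []
    for lineno, raw in enumerate(flagfile_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blanks, comments and flags without an explicit value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            problems.append((lineno, name.strip()))
    return problems
```

For example, a flagfile line such as `--snapshot_cron="*/5 * * * *"` (flag name assumed here) would be reported, while `--snapshot_cron=*/5 * * * *` would not.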
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>