* chore: improve replication locks
Allow non-exclusive, read-only access to Dfly::ReplicaInfo structure.
The most important change is in DflyCmd::CancelReplication, which previously
locked the ReplicaInfo mutex and then went on to lock the global mutex.
This is dangerous because most operations lock them in the opposite order.
Also rename the ambiguous GetReplicaInfo accessors to clearer names.
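The lock-ordering hazard described above can be illustrated with a minimal Python sketch. The names `global_mu`, `replica_mu` and both functions are hypothetical stand-ins, not Dragonfly's actual C++ code; the only point is that every path takes the global lock before the per-replica lock, so two threads can never wait on each other in a cycle.

```python
import threading

# Hypothetical stand-ins for the global mutex and a per-replica mutex.
global_mu = threading.Lock()
replica_mu = threading.Lock()
events = []


def cancel_replication():
    # Global lock first, then the per-replica lock: the same order
    # that every other operation uses.
    with global_mu:
        with replica_mu:
            events.append("cancel")


def other_operation():
    # Same order again: global first, replica second.
    with global_mu:
        with replica_mu:
            events.append("other")


def run_concurrently(rounds=1000):
    """Run both operations in parallel and return how many completed."""
    events.clear()

    def worker(fn):
        for _ in range(rounds):
            fn()

    threads = [
        threading.Thread(target=worker, args=(fn,))
        for fn in (cancel_replication, other_operation)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(events)
```

If `cancel_replication` took `replica_mu` first instead, the two workers could each hold one lock while waiting for the other, which is the deadlock this change avoids.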
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: comments
* chore: comments
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* feat: Support `replica-announce-ip`/`port`
Before this PR, we only supported `cluster_announce_ip`.
It's basically the same feature, but used for cluster announcements
instead of replication.
This PR adds support for `replica-announce-ip` and
`replica-announce-port`, which can be set via new flags `--announce_ip=`
and `--announce_port=`. These flags apply to both cluster and replica
announcements.
Tested by running Sentinel and making sure it can connect to the
announced ip+port, and that it cannot connect when a false /
unavailable ip+port is announced.
Note: this PR deprecates `--cluster_announce_ip`, but continues to
support it. We will remove it in a future version.
Fixes #3380
* fix failing test
* destructure
1. Add background offloading stats
2. remove direct_fd override - helio is already updated with default=false, so it's not needed anymore.
3. remove redundant tiered_storage_memory_margin flag
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
DashTable::Traverse is error-prone when the callback passed to it preempts, because the segment might change in the meantime. We need atomicity while traversing segments with preemption. The fix is to add Traverse in DbSlice and protect the traversal via ThreadLocalMutex.
* add ConditionFlag to DbSlice
* add Traverse in DbSlice and protect it with the ConditionFlag
* remove condition flag from snapshot
* remove condition flag from streamer
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
Update the flag for extreme testing. We should remove this before the release.
* set serialization_max_chunk_size to 1 byte
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
The problem is that the test test_big_value_serialization_memory_limit tries to shut down dragonfly at the end with a timeout of 15 seconds. During shutdown Dragonfly takes a snapshot, which might take more than 15 seconds, and then the test fails.
* call flushall before we exit the test
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
* fix: Fix `test_take_over_seeder`
There are a few issues with the test:
1. Not using the admin port, which could cause pause to deadlock
2. Not waiting for some of the `task`s (although that won't cause a
failure)
But also in the product code:
1. We used to `std::move()` the same pointer multiple times
2. We assigned to the same status object from multiple threads
Hopefully this fixes the test. It used to fail every ~100 attempts on my
machine, now it's been >1,000 and they all passed.
* add comments
* remove shard_ptr param
* default serialization_max_chunk_size to 10 mb
* add test for big values
* small rename of enum to conform style guide
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
* chore: fix test_parser_memory_stats flakiness
1. Added a robust assert_eventually decorator for pytests
2. Improved the assertion condition in TieredStorageTest.BackgroundOffloading
3. Added total_uploaded stats for tiering that tells how many times offloaded values
were promoted back to RAM.
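For readers unfamiliar with the pattern, a retrying assertion decorator could look roughly like the sketch below. This is an illustration only: the parameter names `times` and `delay`, the synchronous style, and the `flaky_check` demo are assumptions, not the decorator actually added to the test suite.

```python
import functools
import time


def assert_eventually(wrapped=None, *, times=10, delay=0.1):
    """Retry a check until it stops raising AssertionError or attempts run out."""
    if wrapped is None:
        # Called with arguments: @assert_eventually(times=..., delay=...)
        return functools.partial(assert_eventually, times=times, delay=delay)

    @functools.wraps(wrapped)
    def wrapper(*args, **kwargs):
        for attempt in range(times):
            try:
                return wrapped(*args, **kwargs)
            except AssertionError:
                if attempt == times - 1:
                    raise  # out of attempts: surface the last failure
                time.sleep(delay)

    return wrapper


# Demo: a check that only passes on its third invocation.
_calls = {"n": 0}


@assert_eventually(times=5, delay=0)
def flaky_check():
    _calls["n"] += 1
    assert _calls["n"] >= 3
    return _calls["n"]
```

Retrying only on `AssertionError` keeps genuine errors (for example a dropped connection) visible immediately instead of hiding them behind retries.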
* chore: skip test_cluster_fuzzymigration
Leave only connection memory usage in memory stats.
We should think how we can move it also to /metrics.
In addition, added a test verifying that redis parser memory
usage is tracked.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* fix replication test flag name for big values
* fix a bug that triggers ub when RegisterOnChange is called on flows that iterate over the callbacks and preempt
* add a stress test for big value serialization
Signed-off-by: kostas <kostas@dragonflydb.io>
* serialize big slots in chunks
* allow preemption on large slots
* disable big entries serialization for RDB files
* add test
Signed-off-by: kostas <kostas@dragonflydb.io>
* feat(namespaces): Initial support for multi-tenant #3050
This PR introduces a way to create multiple, separate and isolated
namespaces in Dragonfly. Each user can be associated with a single
namespace, and will not be able to interact with other namespaces.
This is still experimental, and lacks some important features, such as:
* Replication and RDB saving completely ignores non-default namespaces
* Defrag and statistics either use the default namespace or all
namespaces without separation
To associate a user with a namespace, use the `ACL` command with the
`TENANT:<namespace>` flag:
```
ACL SETUSER user TENANT:namespace1 ON >user_pass +@all ~*
```
For more examples and up to date info check
`tests/dragonfly/acl_family_test.py` - specifically the
`test_namespaces` function.
* fix: properly clean tiered state upon flush
The bug was around io pending entries that have not been properly cleaned during flush.
This PR simplifies the logic around tiered storage handling during flush: it always performs the
cleaning in the synchronous part of the command.
In addition, this PR improves error logging in tests if dragonfly process exits with an error.
Finally, a test is added that makes sure pending tiered items are flushed during the flush call.
Fixes #3252
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: introduce back-pressure to tiered storage
Also, some clean-up of the macOS daily build.
Enabled forgotten test.
Improve CI insights
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* add support for multiple passwords
* add support for deleting passwords
* add support for resetpass
* add tests
* always prefix passwords with hashtag when printed
That was a misleading name, as the logic was the exact opposite (oops 🤦)
This PR introduces a new name for the same flag: break_replication_on_master_restart
We're keeping the previous flag for now, to make transition easier. We'll remove it in a later Dragonfly version (>= 1.22)
Fixes #3192
* fix(cluster): Support `FLUSHALL` while slot migration is in progress
Fixes #3132
Also do a small refactor to move cancellation logic into
`RestoreStreamer`.
* print categories and commands in lower case instead of capital case
* fix a bug of default user inheriting the wrong acl rules on new connections
* move keys position to be after password when printed from an acl command
* remove acl categories from context and all acl checks
* category assignment now assigns all the acl commands for that category to the user
* introduce modification order of acl's per user
* acl rules are now printed in the same order as in redis/valkey
* remove old user_registry_test which was part of the poc
* chore: Introduce pipeline back-pressure
Also, improve synchronization primitives and replace them with
thread-local variations.
Before the change, on my local machine with the dragonfly running with 8 threads,
`memtier_benchmark -c 10 --threads 8 --command="PING" --key-maximum 100000000 --hide-histogram --distinct-client-seed --pipeline=20 --test-time=10`
reached 10M qps with 0.327ms p99.9.
After the change, the same command showed 13.8M qps with 0.2ms p99.9
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* fix(cluster-migration): Support cancelling migration right after starting it
This fixes a few small places, but most importantly it does not allow a
migration to start before both the outgoing and incoming side received
the updated config. This solves a few edge cases.
Fixes #2968
* add TODO
* fix test
* gh comments and fixes
* add comment
* change ACL DELUSER, ACL WHOAMI, and some ACL DRYRUN string/integer responses.
* change ACL GETUSER response, when the user does not exist, it should reply (nil).
* chore: clean up REPLTAKEOVER flow
1. Factor out the catchup function.
2. Simplify the flow and make the second parameter an integer.
3. Return OK if the server is already a master (and do nothing underneath).
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The number of keys in an _incoming_ migration indicates how many keys
were received, while for _outgoing_ it shows the total number. Combining
the two can provide the control plane with percentage.
This slightly modified the format of the response.
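As a rough illustration of how a control plane might combine the two numbers, here is a small sketch. The function name and the choice to report 100% for an empty migration are assumptions made for this example, not part of the change.

```python
def migration_progress(received_keys: int, total_keys: int) -> float:
    """Percentage of keys the incoming side has received so far.

    received_keys: key count reported by the incoming migration.
    total_keys: key count reported by the outgoing migration.
    """
    if total_keys <= 0:
        return 100.0  # assumption: nothing to migrate counts as done
    # Clamp in case the two counters were sampled at different moments.
    return min(100.0, 100.0 * received_keys / total_keys)
```

For example, 250 received keys out of 1000 total gives 25.0.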
Fixes #2756
fix: authorize the http connection to call DF commands
The assumption is that basic-auth already covers the authentication part.
And thanks to @sunneydev for finding the bug and providing the tests.
The tests actually uncovered another bug where we may parse partial http requests.
This one is handled by https://github.com/romange/helio/pull/243
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Send the journal LSN to the replica and compare it against the number of records received on the replica side
Signed-off-by: kostas <kostas@dragonflydb.io>
Co-authored-by: adi_holden <adi@dragonflydb.io>
* chore: preparation for basic http api
The goal is to provide very basic support for simple commands,
fancy stuff like pipelining, blocking commands won't work.
1. Added optional registration for /api handler.
2. Implemented parsing of post body.
3. Added basic formatting routine for the response. It does not cover all the commands but should suffice for
basic usage.
The API is a POST method and the body of the request should contain command arguments formatted as json array.
For example, `'["set", "foo", "bar", "ex", "100"]'`.
The response is a json object with either `result` field holding the response of the command or
`error` field containing the error message sent by the server.
See `test_http` test in tests/dragonfly/connection_test.py for more details.
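A client for the request and response shape described above could be sketched as follows. The helper names, the `Content-Type` header and the example host/port are assumptions for illustration; only the `/api` path, the POST method, the JSON array body and the `result`/`error` fields come from the description. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_api_request(base_url, args):
    """Build a POST request whose body is the command as a JSON array of strings."""
    body = json.dumps([str(a) for a in args]).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api",
        data=body,
        method="POST",
        # Assumption: the header is not stated in the description.
        headers={"Content-Type": "application/json"},
    )


def parse_api_response(raw):
    """Return the `result` field, or raise if the server replied with `error`."""
    reply = json.loads(raw)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


# Hypothetical address; pass the request to urllib.request.urlopen() to send it.
request = build_api_request("http://localhost:6379", ["set", "foo", "bar", "ex", 100])
```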
* chore: cover iouring with enable_direct_fd
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* feat(replication): Do not auto replicate different master
Until now, replicas would re-connect and re-replicate a master after the
master restarts. This is problematic in case the master loses its
data, which will cause the replica to flush all and lose its data as
well.
This is a breaking change though, in that whoever controls the replica
now has to explicitly issue a `REPLICAOF X Y` in order to re-establish
a connection to a new master. This is true even if the master loaded an
up to date RDB file.
It's not necessary if the replica lost connection to the master and the
master was always alive, and the connection is re-established.
Fixes #2636
* fix test
* fixes
* proxy proxy java java
* better comment
* fix comments
* replica_reconnect_on_master_restart
* proxy.close()
* feat(cluster): Add `--cluster_id` flag
This flag sets the unique ID of a node in a cluster.
It is UB (and bad) to set the same IDs to multiple nodes in the same
cluster.
If unset (default), the `master_replid` (previously known as `master_id`) is used.
Fixes #2643
Related to #2636
* gh comments
* oops - revert line removal
* fix
* replica
* disallow cluster_node_id in emulated mode
* fix replica test
1. Add back the search files to the macOS build (linker errors are fixed now).
2. Add default maxmemory argument (if not present already) when launching dragonfly process in regression tests.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: Del and NUMINCRBY use json::Path
Also, fix various protocol bugs when we sent simple string
instead of sending bulk strings.
Fixed a typo in path.cc that led to a data race bug.
Finally, flip the flag in regression tests to start covering json::Path code
and added test coverage for the data race bug
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* feat(connection): Support pipelining with Memcached
Adds support for pipelining to Memcached, enhances Memcached pytests
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
* upload only failed test logs
* remove printing log names for passed tests
* print slow tests with --duration
* separate regression and unit logs for CI workflow
* feat(pytest): More types for seeder
Add more types to the seeder and refactor replication test
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
The bug: crash when starting replica while saving
The problem: the snapshot class accessed the wrong allocator on destruction, because it was destroyed on a thread other than the shard's thread
The fix: call snapshot destructor when we finish snapshot on the correct thread
Signed-off-by: adi_holden <adi@dragonflydb.io>
* fix: do not migrate during connection close
Fixes #2569
Before the change we had a corner case where Dragonfly would call
OnPreMigrateThread but would not call CancelOnErrorCb because OnBreakCb has already been called
(it resets break_cb_engaged_)
On the other hand in OnPostMigrateThread we called RegisterOnErrorCb if breaker_cb_ which resulted in double registration.
This change simplifies the logic by removing break_cb_engaged_ flag since CancelOnErrorCb is safe to call if nothing is registered.
Moreover, we now skip Migrate flow if a socket is being closed.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* test(memory): Test memory accounting for all types
* slightly faster
* WIP
* working
* Document
* Update test to use DEBUG POPULATE
* Nothing much
* Working
* fix
* yaml
* explicit capture
* fix ci?
* stub tx
* feat(cluster): add tx execution in cluster_shard_migration
refactor(replication): move code that is common for cluster and
replica into a separate file, add full-sync-cut cmd
* fix(replication): Correctly replicate commands even when OOM
Before this change, OOM in shard callbacks could have led to data
inconsistency between the master and the replica. For example, commands
which mutated data on 1 shard but failed on another, like `LMOVE`.
After this change, callbacks that result in an OOM will correctly
replicate their work (none, partial or complete) to replicas.
Note that `MSET` and `MSETNX` required special handling, in that they are
the only commands that can _create_ multiple keys, and so some of them
can fail.
Fixes #2381
* fixes
* test fix
* RecordJournal
* UNDO idiotnessness
* 2 shards
* fix pytest
* feat(server): Implement `CLIENT KILL`
Currently, it supports the following syntax:
* `CLIENT KILL <addr>:<port>`
* `CLIENT KILL ID <id>`
* `CLIENT KILL ADDR <addr>:<port>`
* `CLIENT KILL LADDR <addr>:<port>`
It will not allow killing an admin-connection from a non-admin port.
There are a few parameters of `CLIENT KILL` that Redis supports but this
PR does not yet add. Let's add them as needed.
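The four accepted forms can be summarised with a small argument-parsing sketch. This is a hypothetical illustration in Python, not the server's implementation; it assumes the bare `<addr>:<port>` form is equivalent to the `ADDR` filter, as in Redis.

```python
def parse_client_kill(args):
    """Map the arguments after CLIENT KILL to a (filter, value) pair."""
    if len(args) == 1:
        # Legacy form: CLIENT KILL <addr>:<port>
        return ("ADDR", args[0])
    if len(args) == 2 and args[0].upper() in ("ID", "ADDR", "LADDR"):
        kind = args[0].upper()
        # Connection ids are numeric; addresses stay as strings.
        value = int(args[1]) if kind == "ID" else args[1]
        return (kind, value)
    # Other Redis filters are not covered by this sketch.
    raise ValueError("syntax error")
```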
Fixes #1614
* Add tests
* fixes
fixes #2296
Added a regression test that covers both policy-based eviction and heartbeat eviction.
---------
Signed-off-by: Yue Li <61070669+theyueli@users.noreply.github.com>
* feat: add SLOT-MIGRATION-STATUS cmd for source node
implements #2232
Add the ability to use SLOT-MIGRATION-STATUS without args
to print info about all migration processes for the current node
fix #2337
The bug:
replicaof was not rejected while loading a snapshot
The fix:
replicaof used to be allowed while the server is in loading state, so that it can be issued while replication is in full sync mode
replicaof is now rejected if the server is in loading state and is a master
Another bug fix:
allow cron snapshot if --replicaof flag was set
Signed-off-by: adi_holden <adi@dragonflydb.io>
* refactor(server): Privatize `PreUpdate()` and `PostUpdate()`
While at it:
* Make `PreUpdate()` not decrease object size
* Remove redundant leftover call to `PreUpdate()` outside `DbSlice`
* Add pytest
* Test delete leads to 0 counters
* Improve test
* fixes
* comments
1. How many transactions we processed by type
2. How many transactions we processed by width (number of unique shards).
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* feat(cluster): add command flow for slot migration process
fixes #2295
The DFLYMIGRATE FLOW command was added to establish
connections for every shard replication process.
The slow serialization step is a separate issue, so
for now only eof_token is sent in reply to the
DFLYMIGRATE FLOW command.
Expected state for START-SLOT-MIGRATION is FULL_SYNC now.
* feat: DispatchTracker
Use a DispatchTracker to track ongoing dispatches for commands that change global state
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
fix: eliminate the redundant string copy in SendMGetResponse
Also, allow pytests to selectively create a DflyInstance that is attached to
an existing dragonfly port, created outside of tests.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The DF version cannot be parsed by Memcached::getVersion(), which expects an n.n.n string.
Change the version to emulate the old memcached server.
The DF version can still be fetched via Memcached::getStats() function.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>