Update the flag for extreme testing. We should remove this before the release.
* set serialization_max_chunk_size to 1 byte
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
The test test_big_value_serialization_memory_limit shuts down dragonfly at the end with a timeout of 15 seconds. During shutdown dragonfly takes a snapshot, which might take more than 15 seconds, and then the test fails.
* call flushall before we exit the test
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
Inline transactions do not acquire any locks and therefore they should not preempt. This is no longer true when db_slice has registered callbacks.
* disable inline transactions when db_slice has registered callbacks
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
Big value serialization must support preemption while db_slice::RegisterOnChange can be called. Otherwise we hit UB when a code path iterates over change_cb_ and preempts because it serializes a big value. Since this is problematic and can lead to data inconsistencies, I replace the std::vector with std::list and bound the iteration of change_cb_ on paths that preempt.
* replace std::vector with std::list for change_cb_
* bound iteration of change_cb_ on paths that preempt
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
We might preempt when we serialize a big value, but the code in journal was protected by an atomic guard, which triggered a failed check.
* remove fiber guard from non-atomic section
* move LocalBlockingCounter to common
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
* chore: reenable evictions upon insertion to avoid OOM rejections
Before: when running dragonfly with --cache_mode we could get OOM rejections
even though the eviction policy allowed evicting items to free memory.
Ideally, dragonfly in cache mode should not respond with the OOM error.
This PR reuses the same Eviction step we have in the Heartbeat and conditionally applies it
during the insertion. In my test the OOM errors went from 500K to 0 and the server
still respected the memory limit.
Also, remove the old heuristics that have never been used.
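The idea of evicting on the insertion path instead of rejecting the write can be illustrated with a minimal Python sketch. This is not dragonfly's code: the `BoundedCache` class, its oldest-first eviction order and the `len()`-based byte accounting are hypothetical and only show the control flow.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy cache that evicts on insertion instead of failing with OOM.

    Hypothetical illustration: oldest-first eviction and len()-based
    accounting are assumptions, not dragonfly's actual policy.
    """

    def __init__(self, max_bytes, cache_mode=True):
        self.max_bytes = max_bytes
        self.cache_mode = cache_mode
        self.used_bytes = 0
        self.items = OrderedDict()  # insertion order doubles as eviction order

    def _evict_until_fits(self, needed):
        # The same kind of step a periodic heartbeat could run, applied on demand.
        while self.items and self.used_bytes + needed > self.max_bytes:
            _, old_value = self.items.popitem(last=False)
            self.used_bytes -= len(old_value)

    def set(self, key, value):
        # Drop the previous value first so its bytes are not counted twice.
        # (A toy simplification: a rejected overwrite loses the old value.)
        old_value = self.items.pop(key, None)
        if old_value is not None:
            self.used_bytes -= len(old_value)
        if self.cache_mode:
            self._evict_until_fits(len(value))
        if self.used_bytes + len(value) > self.max_bytes:
            raise MemoryError("OOM: value does not fit")
        self.items[key] = value
        self.used_bytes += len(value)
```

With `cache_mode=True` an insert only fails when the value alone exceeds the limit. With `cache_mode=False` the same insert is rejected as soon as the budget is exhausted.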
Test:
./dfly_bench --key_prefix=bar: -d 1024 --ratio=1:0 --qps=200 -n 3000
./dragonfly --dbfilename= --proactor_threads=2 --maxmemory=600M --cache_mode
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* fix: Fix `test_take_over_seeder`
There are a few issues with the test:
1. Not using the admin port, which could cause pause to deadlock
2. Not waiting for some of the `task`s (although that won't cause a
failure)
But also in the product code:
1. We used to `std::move()` the same pointer multiple times
2. We assigned to the same status object from multiple threads
Hopefully this fixes the test. It used to fail every ~100 attempts on my
machine, now it's been >1,000 and they all passed.
* add comments
* remove shard_ptr param
* chore: introduce a cool queue that gradually retires cool items
This PR introduces a new state in which the offloaded value is not freed from memory but instead stays
in the cool queue.
Upon Read we move the cool value back to the hot table and delete it from storage.
When we are low on memory we retire the oldest cool values until we are above the threshold.
The PR does not fully finish the feature but it is workable enough to start (load)testing.
Missing:
a) Handle Modify operations
b) Retire cool items in more cases where we are low on memory. Specifically, refrain from evictions as long as cool items exist.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* default serialization_max_chunk_size to 10 mb
* add test for big values
* small rename of enum to conform style guide
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
* chore: simplify computation of used_mem_current
Before - each thread updated its own variable and then,
the global "used_mem_current" was updated by summing used memory from each thread.
Now, each thread updates used_mem_current directly. The code is simpler and also provides more precise
results more frequently.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Also, add the corresponding API to compact object.
Now external objects can be in two states: Cool and Offloaded.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Our area of attack during concurrent transaction access is the calls to DisarmInShard and DisarmInShardWhen, which only access is_armed, an atomic variable. It is not safe to arbitrarily call GetNamespace() if we write to it in InitBase.
Solution: Don't write to it after the first initialization
* chore: fix test_parser_memory_stats flakiness
1. Added a robust assert_eventually decorator for pytests
2. Improved the assertion condition in TieredStorageTest.BackgroundOffloading
3. Added a total_uploaded stat for tiering that tells how many times offloaded values
were promoted back to RAM.
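For readers unfamiliar with the pattern, an assert_eventually style decorator retries a check until it stops raising AssertionError. The sketch below is a minimal version written for illustration. The `times` and `delay` parameters and the use of plain asyncio are assumptions, not necessarily what the helper in the test suite does.

```python
import asyncio
import functools


def assert_eventually(func=None, *, times=100, delay=0.1):
    """Retry an async check until it stops raising AssertionError.

    Illustrative sketch: `times` and `delay` are hypothetical parameters.
    """
    if func is None:
        # Used with arguments, e.g. @assert_eventually(times=5)
        return functools.partial(assert_eventually, times=times, delay=delay)

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        for attempt in range(times):
            try:
                return await func(*args, **kwargs)
            except AssertionError:
                if attempt == times - 1:
                    raise  # out of attempts: surface the last failure
                await asyncio.sleep(delay)

    return wrapper
```

A check decorated this way passes as soon as one attempt succeeds and fails with the last AssertionError once all attempts are used up.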
* chore: skip test_cluster_fuzzymigration
1. Moved CommandGenerator to thread scope - there is no need to maintain separate command generator per connection.
2. Added "done" metric - to know how much was done so far.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: clean up TaskQueue since we do not need multiple fibers for it
Implement TaskQueue as a wrapper around FiberQueue.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Before it was possible to issue several concurrent AsyncWrite requests.
But these are not atomic, which leads to replication stream corruption.
Now we wait for the previous request to finish before sending the next one.
ThrottleIfNeeded now takes the pending buffer size into account for throttling.
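The ordering rule (one request in flight, the next one waits) can be sketched with asyncio as follows. This is only an analogy in Python, not the replication code: `SerializedWriter`, the `send` callable and `max_pending_bytes` are hypothetical names.

```python
import asyncio


class SerializedWriter:
    """Allows one in-flight write at a time and tracks pending bytes.

    Hypothetical sketch: `send` is any async callable that performs the I/O.
    """

    def __init__(self, send, max_pending_bytes):
        self._send = send
        self._max_pending = max_pending_bytes
        self.pending_bytes = 0
        self._write_lock = asyncio.Lock()
        self._drained = asyncio.Condition()

    async def write(self, data):
        # Count the buffer as pending before it is actually sent.
        self.pending_bytes += len(data)
        try:
            # Wait for the previous request to finish before sending.
            async with self._write_lock:
                await self._send(data)
        finally:
            async with self._drained:
                self.pending_bytes -= len(data)
                self._drained.notify_all()

    async def throttle_if_needed(self):
        # Block producers while too many bytes are waiting to be written.
        async with self._drained:
            await self._drained.wait_for(
                lambda: self.pending_bytes <= self._max_pending
            )
```

Without the lock, two concurrent `write` calls whose `send` yields midway could interleave their bytes, which is the kind of stream corruption described above.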
Fixes #3329
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Leave only connection memory usage in memory stats.
We should think about how to move it to /metrics as well.
In addition, added a test verifying that redis parser memory
usage is tracked.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* fix replication test flag name for big values
* fix a bug that triggers UB when RegisterOnChange is called on flows that iterate over the callbacks and preempt
* add a stress test for big value serialization
Signed-off-by: kostas <kostas@dragonflydb.io>
* feature(hset_family): Add NX option to HSETEX
fixes dragonflydb#3265
Signed-off-by: Stepan Bagritsevich <bagr.stepan@gmail.com>
* refactor(hset_family): Fix returned value in the HSETEX command
Signed-off-by: Stepan Bagritsevich <bagr.stepan@gmail.com>
* refactor: Revert the changes of the returned value for the HSETEX command
Signed-off-by: Stepan Bagritsevich <bagr.stepan@gmail.com>
---------
Signed-off-by: Stepan Bagritsevich <bagr.stepan@gmail.com>
* serialize big slots in chunks
* allow preemption on large slots
* disable big entries serialization for RDB files
* add test
Signed-off-by: kostas <kostas@dragonflydb.io>
We divide the keyspace between connections in advance.
This makes it easy to cover chunks of the key space in a predictable manner without overlapping traffic.
Excess traffic will just wrap around.
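The partitioning described above can be sketched in a few lines of Python. The function names and the choice to leave the remainder keys unused are assumptions made for illustration, and the benchmark tool itself may split the range differently.

```python
def connection_key_range(conn_index, num_connections, key_space_size):
    """Return the half-open [start, end) key range owned by one connection."""
    if not 0 <= conn_index < num_connections:
        raise ValueError("conn_index out of range")
    chunk = key_space_size // num_connections
    if chunk == 0:
        raise ValueError("more connections than keys")
    start = conn_index * chunk
    # Keys past num_connections * chunk are simply unused in this sketch.
    return start, start + chunk


def nth_key(conn_index, num_connections, key_space_size, n):
    """Key used by the n-th request of a connection; wraps within its range."""
    start, end = connection_key_range(conn_index, num_connections, key_space_size)
    return start + n % (end - start)
```

With 4 connections over 100 keys, connection 1 owns keys 25 to 49. Its 26th request (n = 25) wraps around to key 25 again and never touches another connection's range.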
Signed-off-by: Roman Gershman <roman@dragonflydb.io>