1. Better logging in regtests
2. Release resources in dfly_main in a more controlled manner.
3. Switch to ignoring signals when unregistering signal handlers during shutdown (see the sketch below).
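A minimal sketch of the idea behind item 3, assuming POSIX signal handling; the function below is illustrative and not Dragonfly's actual shutdown code:

```cpp
#include <signal.h>

// During shutdown, replace our custom handlers with SIG_IGN rather than SIG_DFL,
// so a late SIGINT/SIGTERM cannot kill the process or re-enter teardown logic.
void IgnoreTerminationSignals() {
  struct sigaction sa = {};
  sa.sa_handler = SIG_IGN;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGINT, &sa, nullptr);
  sigaction(SIGTERM, &sa, nullptr);
}
```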
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* fix: Huge entries fail to load outside RDB / replication
We have an internal utility tool that we use to deserialize values in
some use cases:
* `RESTORE`
* Cluster slot migration
* `RENAME`, if the source and target shards are different
We [recently](https://github.com/dragonflydb/dragonfly/issues/3760)
changed this area of the code, which caused this regression as it only
handled RDB / replication streams.
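A purely hypothetical sketch of the kind of helper these call sites share (the class and method names are invented, not the actual loader API): a huge entry may be delivered in several segments, so a decoder used outside an RDB/replication stream has to reassemble them as well:

```cpp
#include <string>
#include <string_view>

// Hypothetical standalone-value decoder shared by RESTORE, slot migration and
// cross-shard RENAME, i.e. used outside a full RDB/replication stream.
class StandaloneValueDecoder {
 public:
  // Each segment of a huge entry is appended; small values arrive as one segment.
  void AppendSegment(std::string_view blob) { payload_.append(blob); }

  // Returns the fully reassembled serialized value, ready to be decoded.
  std::string TakePayload() { return std::move(payload_); }

 private:
  std::string payload_;
};
```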
Fixes #4143
Fixes #4150. The failure can be reproduced with high probability on ARM via
`pytest dragonfly/replication_test.py -k test_replication_all[df_factory0-mode0-8-t_replicas3-seeder_config3-2000-False]`
Not sure why this barrier is needed, but #4146 removes the barrier,
which breaks a delicate balance in the code in an unexpected way.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* fix: enforce load limits when loading snapshot
Prevent loading snapshots whose used memory is higher than the max memory limit.
1. Store the used-memory metadata only inside the summary file.
2. Load the summary file before loading anything else, and if its used-memory is higher than the limit,
abort the load (see the sketch below).
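A minimal sketch of the intended check; the struct, field, and function names are illustrative, not the actual summary-file schema:

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical header read from the summary file before any data files.
struct SummaryHeader {
  uint64_t used_memory_bytes;  // step 1: stored only in the summary file
};

// Step 2: abort the load before touching the rest of the snapshot.
void CheckLoadLimits(const SummaryHeader& summary, uint64_t max_memory_limit) {
  if (max_memory_limit > 0 && summary.used_memory_bytes > max_memory_limit) {
    throw std::runtime_error("snapshot used-memory exceeds the maxmemory limit");
  }
}
```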
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
1. Error stats now show "restrict_denied" instead of the "Cannot execute restricted command ..." error.
2. Increased the verbosity level when loading a key with an expired timestamp.
3. Pulled helio with better log coverage of the tls_socket.cc code.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: optimize info command
The INFO command has high latency when returning all sections,
but often only a single section is required. Specifically,
the SERVER and REPLICATION sections are frequently fetched by clients
or management components.
This PR:
1. Removes all hops for the `INFO SERVER` command.
2. Removes some redundant stats.
3. Prints latency stats around the GetMetrics call if it took too long (see the sketch below).
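Item 3 amounts to something like the following hedged sketch; the helper, the 100ms threshold, and the plain stderr output are illustrative (the real code presumably logs via glog):

```cpp
#include <chrono>
#include <iostream>

// Times an arbitrary callable and reports it only when it was slow.
template <typename Fn>
auto TimedCall(const char* name, Fn&& fn) {
  using Clock = std::chrono::steady_clock;
  auto start = Clock::now();
  auto result = fn();
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start).count();
  if (ms > 100) {  // hypothetical "took too long" threshold
    std::cerr << name << " took " << ms << " ms\n";
  }
  return result;
}

// Hypothetical usage: auto metrics = TimedCall("GetMetrics", [&] { return GetMetrics(); });
```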
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* Update src/server/server_family.cc
Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
Signed-off-by: Roman Gershman <romange@gmail.com>
* chore: remove GetMetrics dependency from the REPLICATION section
Also, address comments
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* fix: clang build
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Signed-off-by: Roman Gershman <romange@gmail.com>
Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
**The problem:**
When in cluster mode, `MOVED` replies (which are arguably not even errors) are aggregated per slot-id + remote host, and displayed in `# Errorstats` as such. For example, in a server that does _not_ own 8k slots, we will aggregate 8k different errors, and their counts (in memory).
This slows down all `INFO` replies, takes a lot of memory, and also makes `INFO` replies very long.
**The fix:**
Use the type `MOVED` for moved replies, so they are all aggregated under a single entry in `# Errorstats` (see the sketch below).
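A hedged illustration of the change in the aggregation key; the helper names are hypothetical, only the before/after shape matters:

```cpp
#include <string>

#include "absl/strings/str_cat.h"

// Before: every MOVED reply produced a distinct # Errorstats key per slot and
// target host, e.g. up to ~8k entries on a node that does not own those slots.
std::string ErrorStatKeyBefore(unsigned slot_id, const std::string& host_port) {
  return absl::StrCat("MOVED ", slot_id, " ", host_port);
}

// After: all MOVED replies collapse into a single aggregated entry.
std::string ErrorStatKeyAfter() {
  return "MOVED";
}
```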
Fixes #4118
* chore: simplify BumpUps deduplication
PR #2474 introduced iterator protection by
tracking which keys were bumped up during the transaction operation.
This was done by maintaining a set of key views. However, this can be simplified
by using fingerprints. Also, fingerprints do not require that the original keys exist.
In addition, PR #3241 introduced FetchedItemsRestorer, which tracks the bumped set and
saves it to protect against fiber context switches. My claim is that it's redundant.
Since we only keep auto-laundering iterators, when a fiber preempts, these iterators recognize it
(see IteratorT::LaunderIfNeeded) and refresh themselves anyway.
To summarize: fetched_items_ protects us from iterator invalidation during atomic scenarios,
and auto-laundering protects us from everything else, so fetched_items_ can be cleared on preemption.
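A hedged sketch of the fingerprint-based set; the class, the hash choice, and the method names are illustrative, not the actual db_slice code:

```cpp
#include <cstdint>
#include <string_view>

#include "absl/container/flat_hash_set.h"
#include "xxhash.h"

// Tracks which keys were already bumped up within the current atomic scenario.
// Fingerprints replace key views, so the original keys need not stay alive.
class BumpTracker {
 public:
  // Returns true if the key was not bumped yet and records it.
  bool ShouldBump(std::string_view key) {
    uint64_t fp = XXH3_64bits(key.data(), key.size());
    return fetched_items_.insert(fp).second;
  }

  // Safe to call on fiber preemption: auto-laundering iterators refresh themselves.
  void Clear() { fetched_items_.clear(); }

 private:
  absl::flat_hash_set<uint64_t> fetched_items_;
};
```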
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The regression was caused by #3947 and it causes crashes in bullmq.
It was not found until now because the python client sends commands in uppercase.
Fixes #4113
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Co-authored-by: Kostas Kyrimis <kostas@dragonflydb.io>
chore: implement Erase with a range
Also migrate more unit tests from the valkey repo.
Finally, fix OpTrim.
All `list_family_test --list_experimental_v2` tests pass.
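For reference, a hedged sketch of the LTRIM-style semantics OpTrim needs, written against a hypothetical range-erase interface rather than the actual QList class:

```cpp
#include <cstdint>

// Keeps [start, end] (inclusive, Redis LTRIM semantics, negative indices count
// from the tail) and erases everything else via a half-open Erase(from, to).
template <typename List>
void TrimList(List& list, int64_t start, int64_t end) {
  int64_t size = static_cast<int64_t>(list.Size());
  if (start < 0) start += size;
  if (end < 0) end += size;
  if (start < 0) start = 0;

  if (start > end || start >= size) {
    list.Erase(0, size);  // nothing to keep
    return;
  }
  if (end >= size) end = size - 1;

  list.Erase(end + 1, size);  // drop the tail first so head indices stay valid
  list.Erase(0, start);       // then drop the head
}
```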
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
chore: implement OpTrim with QList
* fix(search_family): Process wrong field types in indexes for the FT.SEARCH and FT.AGGREGATE commands
fixes #3986
---------
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
Fixes #3896. Now we retry several times.
In my checks this should significantly reduce the failure probability.
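The retry itself is the standard pattern, roughly like the hedged sketch below (the attempt count is illustrative); with independent attempts, a per-try failure probability p drops to about p^N overall:

```cpp
#include <functional>

// Runs the attempt up to max_attempts times and reports whether any succeeded.
bool RetrySeveralTimes(const std::function<bool()>& attempt, int max_attempts = 3) {
  for (int i = 0; i < max_attempts; ++i) {
    if (attempt()) return true;
  }
  return false;
}
```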
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: change Namespaces to be a global pointer
Before, the namespaces object was defined globally.
However, it has a non-trivial d'tor that is called after main exits.
It's quite dangerous to define global non-POD objects.
For example, if we used LOG(INFO) inside the Clear function, that would crash dragonfly on exit.
This PR changes it to be a global pointer.
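A hedged sketch of the difference; the Namespaces type below is a stand-in, not the real class:

```cpp
// A global instance runs its non-trivial destructor during static
// deinitialization, after main() returns, when logging may already be torn down.
struct Namespaces {
  ~Namespaces() { /* non-trivial cleanup; unsafe to run after main() exits */ }
};

// Before: Namespaces namespaces;   // destructor runs after main() returns
Namespaces* namespaces = nullptr;   // after: a plain pointer, nothing runs at exit

int main() {
  namespaces = new Namespaces();
  // ... run the server ...
  delete namespaces;  // destroyed explicitly, while logging still works
  namespaces = nullptr;
  return 0;
}
```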
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>