dragonflydb-dragonfly

mirror of https://github.com/dragonflydb/dragonfly.git synced 2024-12-14 11:58:02 +00:00

Author	SHA1	Message	Date
Borys	071e299971	refactor: remove redundant allocations for streamer (#4225 ) * refactor: remove redundant allocations for streamer	2024-12-05 08:15:31 +00:00
adiholden	7a23ec2aac	fix(server): fix memory leak on lua error (#4236 ) The bug: calling lua_error does not return; instead it unwinds the Lua call stack until an error handler is found or the script exits. This led to a memory leak for objects that should release memory in their destructor. A specific example is the absl::FixedArray<string_view, 4> args(argc); which allocates on the heap if argc > 4. The free was not called, leading to a memory leak. The fix: add scoping to the function so that the destructor is called before raising the error. Signed-off-by: adi_holden <adi@dragonflydb.io>	2024-12-03 16:47:43 +02:00
Shahar Mike	b0d633fb61	chore: Hide replica info in real cluster if `--managed_service_info` (#4241 ) So far we only handled emulated cluster. This PR adds real cluster support. Related to #4173	2024-12-02 20:22:07 +02:00
Shahar Mike	779bba71f9	fix: Fix `test_network_disconnect_during_migration` test (#4224 ) There are actually a few failures fixed in this PR, only one of which is a test bug: * `db_slice_->Traverse()` can yield, causing `fiber_cancelled_`'s value to change * When a migration is cancelled, it may never finish `WaitForInflightToComplete()` because it has `in_flight_bytes_` that will never reach destination due to the cancellation * `IterateMap()` with numeric key/values overrode the key's buffer with the value's buffer Fixes #4207	2024-12-02 15:55:23 +02:00
Borys	dc04b196d5	test: fix and unskip test_migration_timeout_on_sync (#4216 )	2024-11-28 14:54:17 +02:00
Borys	d6f2b76666	fix: cluster_mgr script (#4210 )	2024-11-27 14:09:19 +00:00
Kostas Kyrimis	66e0fd0908	fix: stream memory tracking (#4067 ) * add object memory usage for streams * add test	2024-11-27 12:41:08 +02:00
Borys	0531c39aae	test: skip test_cluster_mgr because of unclosed instance (#4191 )	2024-11-26 09:34:06 +00:00
Roman Gershman	2c663f3833	chore: produce core files in regtests (#4185 ) Should work only for self-hosted runners. The core files will be kept in /var/crash/ We also copy automatically dragonfly binary into /var/crash to be able to debug later. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-11-25 17:13:11 +00:00
Roman Gershman	63742dd0cf	fix: stop using openssl for container healthchecks (#4181 ) Dragonfly responds to ascii based requests to tls port with: `-ERR Bad TLS header, double check if you enabled TLS for your client.` Therefore, it is possible to test now both tls and non-tls ports with a plain-text PING. Fixes #4171 Also, blacklist the bloom-filter test that Dragonfly does not support yet. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-11-25 17:41:17 +02:00
Borys	43c83d29fa	feat: cluster migrations restarts immediately if timeout happens (#4081 ) * feat: cluster migrations restarts immediately if timeout happens * feat: add DEBUG MIGRATION PAUSE command	2024-11-25 16:02:22 +02:00
Shahar Mike	3c65651c69	feat: Huge values breakdown in cluster migration (#4144 ) * feat: Huge values breakdown in cluster migration Before this PR we used `RESTORE` commands for transferring data between source and target nodes in cluster slots migration. While this _works_, it has a side effect of consuming 2x memory for huge values (i.e. if a single key's value takes 10gb, serializing it will take 20gb or even 30gb). With this PR we break down huge keys into multiple commands (`RPUSH`, `HSET`, etc), respecting the existing `--serialization_max_chunk_size` flag. Part of #4100	2024-11-25 15:58:18 +02:00
Shahar Mike	6a7f345bc5	chore: Hide replicas from `CLUSTER` subcmds in managed mode (#4174 ) * chore: Hide replicas from `CLUSTER` subcmds in managed mode Part of #4173 (see for context) * server.client()	2024-11-24 13:10:32 +00:00
Roman Gershman	91caa940b9	chore: fix shutdown sequence in Dragonfly server (#4168 ) 1. Better logging in regtests 2. Release resources in dfly_main in a more controlled manner. 3. Switch to ignoring signals when unregistering signal handlers during the shutdown. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-11-24 10:35:00 +02:00
Roman Gershman	b8c2dd888a	chore: log exit code of failing dragonfly in tests (#4166 ) Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-11-22 11:40:10 +02:00
Shahar Mike	24a1ec6ab2	fix: Huge entries fail to load outside RDB / replication (#4154 ) * fix: Huge entries fail to load outside RDB / replication We have an internal utility tool that we use to deserialize values in some use cases: * `RESTORE` * Cluster slot migration * `RENAME`, if the source and target shards are different We [recently](https://github.com/dragonflydb/dragonfly/issues/3760) changed this area of the code, which caused this regression as it only handled RDB / replication streams. Fixes #4143	2024-11-20 14:00:07 +00:00
Roman Gershman	0e7ae34fe4	fix: enforce load limits when loading snapshot (#4136 ) * fix: enforce load limits when loading snapshot Prevent loading snapshots with used memory higher than max memory limit. 1. Store the used memory metadata only inside the summary file 2. Load the summary file before loading anything else, and if the used-memory is higher, abort the load. --------- Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-11-20 06:12:47 +02:00
Roman Gershman	d467a348ac	fix: allow SELECT in multi/exec if it's a noop (#4146 ) Fixes #4120 Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-11-18 22:27:34 +02:00
Daniel M	d241839cff	chore:update fakeredis, remove irrelevant tests (#4014 ) * chore:update fakeredis, remove irrelevant tests	2024-11-17 20:24:46 +02:00
Roman Gershman	8e3b8ccbe3	chore: run tests with list_experimental_v2 enabled (#4112 ) Also fix issues with memory_test.py running locally. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-11-15 10:33:45 +01:00
Borys	4bc9ad6f01	test: add test for snapshotting during migration (#4108 ) * test: add test for snapshotting during migration * test: add test to run replication after migration	2024-11-13 13:40:00 +02:00
Kostas Kyrimis	91c236ab2f	fix: slow CI tests (#4117 ) * refactor test_big_value_serialization * remove unneeded replication tests for big value --------- Signed-off-by: kostas <kostas@dragonflydb.io>	2024-11-13 10:32:31 +02:00
Roman Gershman	fa8f3f5564	fix: regression in squashing code when determining eval commands (#4116 ) The regression was caused by #3947 and it causes crashes in bullmq. It has not been found till now because python client sends commands in uppercase. Fixes #4113 Signed-off-by: Roman Gershman <roman@dragonflydb.io> Co-authored-by: Kostas Kyrimis <kostas@dragonflydb.io>	2024-11-11 19:54:47 +00:00
Roman Gershman	1eef773d0a	fix: test_noreply_pipeline flakiness (#4102 ) Fixes #3896. Now we retry several times. In my checks this should significantly reduce the failure probability. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-11-10 13:39:24 +02:00
adiholden	ae3faf59fb	feat(server): dont use channel for replication / save df (#4041 ) * feat server: dont use channel for replication / save df Signed-off-by: adi_holden <adi@dragonflydb.io>	2024-11-05 16:50:01 +02:00
dependabot[bot]	54c67a9198	chore(deps): bump pytest-repeat from 0.9.1 to 0.9.3 in /tests/dragonfly (#4057 ) Bumps [pytest-repeat](https://github.com/pytest-dev/pytest-repeat) from 0.9.1 to 0.9.3. - [Release notes](https://github.com/pytest-dev/pytest-repeat/releases) - [Changelog](https://github.com/pytest-dev/pytest-repeat/blob/main/CHANGES.rst) - [Commits](https://github.com/pytest-dev/pytest-repeat/compare/v0.9.1...v0.9.3) --- updated-dependencies: - dependency-name: pytest-repeat dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-11-04 23:47:24 +02:00
dependabot[bot]	94ee96afd0	chore(deps): bump redis-om from 0.2.2 to 0.3.3 in /tests/dragonfly (#4060 ) Bumps [redis-om](https://github.com/redis/redis-om-python) from 0.2.2 to 0.3.3. - [Release notes](https://github.com/redis/redis-om-python/releases) - [Commits](https://github.com/redis/redis-om-python/compare/v0.2.2...v0.3.3) --- updated-dependencies: - dependency-name: redis-om dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-11-04 22:35:18 +02:00
dependabot[bot]	b70f9703f4	chore(deps): bump tomli from 2.0.1 to 2.0.2 in /tests/dragonfly (#4059 ) Bumps [tomli](https://github.com/hukkin/tomli) from 2.0.1 to 2.0.2. - [Changelog](https://github.com/hukkin/tomli/blob/master/CHANGELOG.md) - [Commits](https://github.com/hukkin/tomli/compare/2.0.1...2.0.2) --- updated-dependencies: - dependency-name: tomli dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-11-04 22:31:29 +02:00
Roman Gershman	cc6fdd7fbf	chore: add retry to test_noreply_pipeline test (#4045 )	2024-11-04 08:32:48 +00:00
Borys	e4b468d953	fix: reduce memory consumption during migration (#4017 ) * refactor: reduce memory consumption for RestoreStreamer * fix: add Throttling into RestoreStreamer::WriteBucket	2024-11-03 17:03:45 +02:00
Borys	5a597cf6cc	test: update test_big_containers (#4025 )	2024-11-03 16:20:11 +02:00
Roman Gershman	c8b56b69b4	chore: print info stats if test_noreply_pipeline fails (#4016 )	2024-10-30 08:17:15 +00:00
Kostas Kyrimis	4b495182e8	fix: separate Heartbeat and ShardHandler to fibers (#3936 ) * separate shard_handler from Heartbeat * add test --------- Signed-off-by: kostas <kostas@dragonflydb.io>	2024-10-29 09:22:53 +02:00
Roman Gershman	41d8c66a15	fix: flaky test_failover test (#4007 ) Prevent from checking replica too early, to avoid "Loading" exception. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-10-28 14:58:36 +02:00
Borys	c80d21fcba	fix: crash if we OOM during migration process (#3968 )	2024-10-23 17:04:08 +03:00
Stepan Bagritsevich	ea9dc9c454	chore(fakeredis): Enable JSON tests in the Fakeredis tests (#3773 ) * chore(fakeredis): Enable JSON tests in the Fakeredis tests fixes dragonflydb#3671 Signed-off-by: Stsiapan Bahrytsevich <stefan@dragonflydb.io> * refactor: address comments Signed-off-by: Stsiapan Bahrytsevich <stefan@dragonflydb.io> * tmp commit Signed-off-by: Stsiapan Bahrytsevich <stefan@dragonflydb.io> * refactor: address comments 2 Signed-off-by: Stsiapan Bahrytsevich <stefan@dragonflydb.io> --------- Signed-off-by: Stsiapan Bahrytsevich <stefan@dragonflydb.io>	2024-10-23 08:53:14 +02:00
Kostas Kyrimis	37fec87070	chore: increase load in test_noreply_pipeline (#3960 ) The test is flaky because it relies on the producer (the pytest) sending a bunch of commands fast enough, before they get dispatched synchronously, so I increased the load.	2024-10-22 21:00:25 +03:00
Borys	dec0712e15	test: add test to test big collections or collections with big values (#3959 )	2024-10-22 15:01:32 +03:00
Kostas Kyrimis	119723316e	chore: tune test_rss_used_mem_gap (#3958 ) It appears that newer versions of the gh runner require more memory. Some cases of the test test_rss_used_mem_gap allocate more than 6.5-7 GB of memory, leaving barely 0.5 GB to the gh runner (7.5 GB available in total), which sometimes causes the instance to run out of memory.	2024-10-22 10:31:13 +03:00
Kostas Kyrimis	478a5d476d	chore: disable test_cluster_memory_consumption_migration (#3948 ) Test takes more than 10 minutes on the CI and it causes it to timeout	2024-10-21 08:41:15 +03:00
Vladislav	32a31cf1d8	chore(facade): Fix bad new IO glue (#3940 ) * chore(facade): Fix bad new IO glue --------- Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>	2024-10-18 23:25:56 +03:00
Borys	866c82a3fa	test: add test to reproduce a lot of memory consumption during migration (#3939 )	2024-10-17 14:23:26 +03:00
Shahar Mike	c3f9ec18ae	fix: Fix `test_flushall_in_full_sync` (#3929 ) * fix: Fix `test_flushall_in_full_sync` This test failed in CI many times. The issue was that we reach stable sync too quickly, and miss the full sync stage. I changed the seeder to add 100k (instead of 30k) keys for the stage to take longer. * StaticSeeder	2024-10-15 11:48:32 +00:00
adiholden	a1830e1b5e	feat(server): use listpack node encoding for list (#3914 ) Signed-off-by: adi_holden <adi@dragonflydb.io>	2024-10-15 13:55:26 +03:00
Shahar Mike	c868b27bbe	fix: Support replicating Valkey and Redis 7.2 (#3927 ) Until now, we only tested Dragonfly against Redis 6.2. It appears that something has changed in the way Redis sends stable sync commands, and now they also forward `MULTI` and `EXEC` as part of their replication. Since we do not allow all commands to run under `MULTI`/`EXEC`, specifically `SELECT`, a Dragonfly replica of such servers failed these commands and became inconsistent with the data on the master. The proposed fix is to simply ignore (i.e. not execute) `MULTI`/`EXEC` coming from a Redis/Valkey master, and run the commands within those transactions individually, like we do for other transactions. To test this we randomly choose a redis/valkey server based on 3 available installed binaries and test against them.	2024-10-15 13:12:16 +03:00
Kostas Kyrimis	588d6cc339	chore: relax assertion in test_noreply_pipeline (#3908 ) * adjust assert condition Signed-off-by: kostas <kostas@dragonflydb.io>	2024-10-14 09:27:57 +03:00
Vladislav	e71f083f34	feat(search): STOPWORDS (#3851 ) Adds support for STOPWORDS option	2024-10-10 21:58:12 +03:00
Kostas Kyrimis	a5fa3ab9f5	chore: skip flaky test_noreply_pipeline (#3903 ) * Disable the test_noreply_pipeline because it's really flaky. Will look into this once I wrap up my pending tasks.	2024-10-10 07:37:13 +00:00
Vladislav	786c9cd44d	chore: collection size (#3844 )	2024-10-08 18:51:11 +03:00
Shahar Mike	b2ebfd05d4	fix: Do not publish to connections without context (#3873 ) * fix: Do not publish to connections without context This is a rare case where a closed connection is kept alive while the handling fiber yields, therefore leaving `cc_` (the connection context) pointing to null for other fibers to see. As far as I can see, this can only happen during server shutdown, but there could be other cases that I have missed. The test on its own does _not_ reproduce the crash, however with added `ThisFiber::SleepFor()`s I could reproduce the crash: * Right before `DispatchBrief()` [here](`e3214cb603/src/server/channel_store.cc (L154)`) * Right after connection context `reset()` [here](`2ab480e160/src/facade/dragonfly_connection.cc (L750)`) In any case, calling `SendPubMessageAsync()` to a connection where `cc_` is null is a bug, and we fix that here. * rewording	2024-10-08 14:45:57 +03:00

636 commits
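
Note on 3c65651c69 (#4144): the commit message says huge values are broken down into multiple commands (`RPUSH`, `HSET`, etc) while respecting `--serialization_max_chunk_size`. The idea for a list value can be sketched roughly as below. This is only an illustration, not Dragonfly's code: the function name `chunk_list_into_rpush` and the parameter `max_chunk_bytes` are hypothetical, and the byte accounting (sum of element lengths) is an assumption made for the example.

```python
from typing import Iterator, List


def chunk_list_into_rpush(
    key: str, values: List[bytes], max_chunk_bytes: int
) -> Iterator[List[bytes]]:
    """Yield RPUSH commands, each carrying a batch of list elements.

    Hypothetical helper: a batch is flushed before adding an element
    would push its payload past max_chunk_bytes. A single element larger
    than the limit is still sent, alone, in its own command.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for value in values:
        # Flush the current batch if this element would exceed the limit.
        if batch and batch_bytes + len(value) > max_chunk_bytes:
            yield [b"RPUSH", key.encode(), *batch]
            batch, batch_bytes = [], 0
        batch.append(value)
        batch_bytes += len(value)
    # Emit whatever is left over after the loop.
    if batch:
        yield [b"RPUSH", key.encode(), *batch]


if __name__ == "__main__":
    for cmd in chunk_list_into_rpush("mylist", [b"aaaa", b"bbbb", b"cccc"], 8):
        print(cmd)
```

Because each command carries only a bounded slice of the value, the sender never has to hold a full serialized copy of the value next to the original, which is the 2x memory side effect the commit message describes for `RESTORE`.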