mirror of https://github.com/dragonflydb/dragonfly.git synced 2024-12-15 17:51:06 +00:00
Commit graph

2652 commits

Author SHA1 Message Date
romange
734be21407 chore(helm-chart): update to v1.23.0 2024-09-25 09:58:17 +00:00
Lakshya Garg
fb2ee90b2d
chore(acl_family): add allcommands and nocommands (#3783)
* add allcommands alias for acl
* add nocommands alias for acl
* add test
2024-09-25 10:58:33 +03:00
Kostas Kyrimis
105c2bd761
fix: bitop do not add dst key if result is empty (#3751)
* fix bitop creating the dst key if result is empty
* fix replicating dst with the wrong type
* make bitop a blind update (similar to set command)

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-09-25 09:45:20 +03:00
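The behavior described in 105c2bd761 (#3751) can be illustrated with a toy model. This is a minimal sketch, not Dragonfly's implementation: the `bitop` function, the dict-backed `store`, and the zero-padding of shorter operands are assumptions made for illustration. Removing a pre-existing `dst` when the result is empty is also an assumption; the commit message only says the key is not added.

```python
from functools import reduce
from operator import and_, or_, xor

# Supported bitwise operations (NOT is omitted to keep the sketch short).
OPS = {"AND": and_, "OR": or_, "XOR": xor}


def bitop(store, op, dst, *srcs):
    """Toy BITOP over a dict of key -> bytes. Returns the result length."""
    # Missing source keys are treated as empty strings.
    values = [store.get(key, b"") for key in srcs]
    width = max((len(v) for v in values), default=0)
    if width == 0:
        # Empty result: do not create dst.
        # Assumption: an existing dst is removed rather than left stale.
        store.pop(dst, None)
        return 0
    # Zero-pad shorter operands to the longest one, then combine per byte.
    padded = [v.ljust(width, b"\x00") for v in values]
    result = bytes(reduce(OPS[op], column) for column in zip(*padded))
    # Blind update: overwrite dst without inspecting its previous value.
    store[dst] = result
    return width


if __name__ == "__main__":
    db = {"a": b"\x0f", "b": b"\xf0\x01"}
    print(bitop(db, "OR", "dst", "a", "b"), db.get("dst"))
    print(bitop(db, "AND", "dst", "missing1", "missing2"), "dst" in db)
```

Treating the write as a blind overwrite means the previous type of `dst` never has to be read, which is the "similar to set command" property the commit message mentions.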
Borys
987e6feaa5
fix: GETRANGE params validation (#3781)
fix: getrange params validation
2024-09-24 13:54:35 +00:00
Shahar Mike
526bce4222
chore: Forbid replicating a replica (#3779)
* chore: Forbid replicating a replica

We do not support connecting a replica to a replica, but before this PR
we allowed doing so. This PR disables that behavior.

Fixes #3679

* `replicaof_mu_`
2024-09-24 13:42:22 +00:00
Shahar Mike
9aadc0cd2b
fix: Fix flaky test test_acl_revoke_pub_sub_while_subscribed (#3768)
fix: Fix flaky test `test_acl_revoke_pub_sub_while_subscribed`

The reason it failed is that, in some rare cases, the subscriber did not
get the first few messages of the publisher. This is likely due to
timing of subscribe and publish, in different connections / threads.

Given Pub/Sub has very weak guarantees, it's probably ok as is, so I
just added a sleep to get the test to pass always.
2024-09-24 11:47:17 +03:00
Borys
3804076ea9
fix: setrange with empty value doesn't modify the DB (#3771) 2024-09-23 19:09:53 +03:00
Roman Gershman
b7b4cabacc
chore: some renames + fix a typo in RETURN_ON_BAD_STATUS (#3763)
* chore: some renames + fix a typo in RETURN_ON_BAD_STATUS

Renames in transaction.h - no functional changes.
Fix a typo in error.h following  #3758
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-23 13:16:50 +03:00
Borys
9303591010
fix: mark pubsub commands as unsupported for cluster (#3767) 2024-09-23 09:59:13 +00:00
Roman Gershman
9c49aee43d
chore: give up on InlinedVector due to spurious warnings with optional (#3765) 2024-09-23 11:34:39 +03:00
adiholden
7df95dfb6e
fix server: fix last error reply (#3728)
fix 1: in the multi-command squasher the error message was not set, so it was not logged for the relevant command, only on EXEC. Fixed by setting the last error in CapturingReplyBuilder::SendError.
fix 2: cached error replies were not cleared before the command is invoked.

---------

Signed-off-by: adi_holden <adi@dragonflydb.io>
Co-authored-by: kostas <kostas@dragonflydb.io>
2024-09-23 11:34:13 +03:00
Andy Dunstall
45ffc605bd
feat(zset_family): add ZRANGESTORE (#3757) 2024-09-23 11:28:12 +03:00
Borys
6185617949
fix: substr/getrange result for invalid range (#3766) 2024-09-23 08:20:08 +00:00
Roman Gershman
0a049ab631
chore: add more error logs around ziplist parsing checks (#3764)
Also, reformat ziplist.c to valkey 8 formatting (no code changes besides this).

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-23 10:13:36 +03:00
Kostas Kyrimis
15fce9df2d
chore: logs on assert fail for test_acl_cat_commands_multi_exec_squash (#3749)
* print result if assertion fails

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-09-23 09:51:58 +03:00
Roman Gershman
29b18f0dcb
fix: tune test_replicaof_reject_on_load parameters (#3762)
Reduce the snapshot size by 20% and increase the timeout to avoid failures due to slow loads.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-23 09:50:10 +03:00
Roman Gershman
f1f8ee17dc
fix: make snapshotting process more responsive (#3759)
* fix: improve BreakStalledFlowsInShard heuristic

Before this change - we wrote in a single call whatever record chunks we pulled from the channel.
This can be problematic for 1GB chunks for example, which might take 10sec to write.

Lately we added a replication breaker on the master side that breaks the full sync after
a predefined threshold has passed. By default it was 10sec. To improve the robustness of this
breaker, we now write chunks of up to 1MB and update last_write_time_ns_ more frequently.

Also, we added more logs to make replication delays on both sides more visible.
We also added logs of breaking the replication on the master side.

Unfortunately, this did not make BreakStalledFlowsInShard more robust because now the
problem moved to the replica side, which may take 20+ seconds to parse huge values.
Therefore, I increased the threshold for breaking the replication to 30s.

Finally, instrument the GetMetrics call, as it sometimes takes more than 1 sec.

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-22 17:05:28 +03:00
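The chunked-write idea from f1f8ee17dc (#3759) can be sketched as follows. This is a minimal illustration and not the Dragonfly code: `write_in_chunks`, `CHUNK_SIZE`, and the returned timestamp list are hypothetical names, with the timestamps standing in for the `last_write_time_ns_` updates that a stall detector would read.

```python
import io
import time

CHUNK_SIZE = 1 << 20  # 1 MiB, mirroring the "up to 1MB" in the commit message


def write_in_chunks(sink, payload, chunk_size=CHUNK_SIZE, clock=time.monotonic_ns):
    """Write payload to sink in bounded slices.

    Returns one timestamp per slice written, so an observer sees progress
    after every slice instead of only after the whole payload.
    """
    view = memoryview(payload)  # slicing a memoryview avoids copying
    timestamps = []
    for start in range(0, len(view), chunk_size):
        sink.write(view[start:start + chunk_size])
        timestamps.append(clock())  # stand-in for updating last_write_time_ns_
    return timestamps


if __name__ == "__main__":
    buf = io.BytesIO()
    stamps = write_in_chunks(buf, bytes(5 * CHUNK_SIZE + 123))
    print(len(stamps), "progress updates for", buf.tell(), "bytes")
```

With a single large write, the last-progress timestamp stays stale for the duration of the call, so a time-based breaker can fire even though data is flowing. Bounding each write keeps the timestamp fresh.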
Roman Gershman
2e9b133ea0
chore: add integrity checks to consumer->pel (#3754)
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-22 15:40:42 +03:00
Roman Gershman
e09ebe0c5c
fix: test deadlock with processing the stdout of sed (#3735)
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-22 15:40:27 +03:00
adiholden
4d38271efa
feat(server): introduce rss oom limit (#3702)
* introduce rss denyoom limit

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-09-22 13:28:24 +03:00
adiholden
5cf917871c
feat(server): introduce oom_deny_commands flag (#3718)
* server: introduce oom_deny_commands flag

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-09-22 09:32:18 +03:00
Stefan Roman
69db21db4c
feat(helm): add hostNetwork, topologySpreadConstraint and clusterIP su… (#3389)
* add(helm): add hostNetwork, topologySpreadConstraint and clusterIP support

* parameters hostNetwork and clusterIP should not be templated if they are not explicitly used

---------

Signed-off-by: Stefan Roman <elegant.frog3113@fastmail.com>
Co-authored-by: Stefan Roman <elegant.frog3113@fastmail.com>
2024-09-22 08:08:01 +03:00
Vladislav
d9f8f2553b
chore: fix return on bad status (#3758) 2024-09-22 01:36:39 +03:00
Roman Gershman
cce2eb35ed
chore: refactor a lambda function into a named one (#3753)
Also did some cosmetic improvements. No functionality should be changed.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-22 01:35:56 +03:00
Andy Dunstall
9dd79657ce
fix: zset store conclude transaction on error (#3755) 2024-09-21 19:08:53 +03:00
Borys
ce79da0f7a
fix: add value range check for SETBIT command (#3750) 2024-09-20 18:20:35 +03:00
Roman Gershman
c9a2334f6d
fix: allow the healthcheck run in non-privileged containers as well (#3731)
fix: allow the healthcheck running in non-privileged containers as well

Fixes #3644 (again).

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-20 05:41:06 +00:00
Kostas Kyrimis
ed21867fe9
chore: add missing await in test_take_over_seeder (#3744)
* add missing await

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-09-19 17:03:11 +00:00
Roman Gershman
abf3acec4a
chore: introduce a Clone function for the dense set (#3740)
* chore: introduce a Clone function for the dense set

We use a state machine to prefetch data in batches.
After this change, the hot spots are predominantly inside ObjectClone and
Hash methods.

All in all benchmarks show ~45% CPU reduction:
```
BM_Clone/elements:32000    1517322 ns      1517338 ns         2772
BM_Fill/elements:32000      841087 ns       841097 ns         4900
```

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-19 16:14:33 +03:00
Vladislav
3af2dfc4e7
chore: add SetReplies (#3727) 2024-09-19 12:54:25 +03:00
Kostas Kyrimis
0e0b2e78a4
chore: change log level to warning for empty keys (#3722)
* adjust log level to warning for allowed empty keys in rdb_load and rdb_save

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-09-19 09:45:09 +00:00
Shahar Mike
55e3647248
chore: Switch ports for cluster_mgr_test.py (#3741)
We saw failures due to port already in use
2024-09-19 12:32:31 +03:00
adiholden
409c2a3beb
test: add test for replication deadlock on replication timeout (#3691)
* test: add test for replication deadlock on replication timeout

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-09-19 12:11:28 +03:00
Borys
efa4efd2bf
refactor: use CmdArgParser for XGROUP command (#3739) 2024-09-18 22:30:37 +03:00
Borys
bbaa2669f9
test: unskip test for debugging purpose (#3738) 2024-09-18 14:13:07 +00:00
Borys
f122a19a02
test: add tests for replication (#3734)
* test: add tests for replication
2024-09-18 16:32:21 +03:00
Kostas Kyrimis
6e45c9c3e2
fix: properly track json memory usage (#3641)
* add JsonMemTracker
* add logic based on MiMallocResource deltas that calculates json object usage
* add test

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-09-18 13:08:43 +00:00
Stepan Bagritsevich
b235617a0d
fix(json_family): Fix out of bound ranges for the JSON.ARR* commands (#3712)
* fix(json_family): Fix out of bound ranges for the JSON.ARR* commands

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor(json_family): address comments

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor(json_family): address comments 2

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

---------

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
2024-09-18 14:31:17 +02:00
Vladislav
41ba864924
chore: Remove ReqSerializer (#3721)
Signed-off-by: Vladislav <vladislav.oleshko@gmail.com>
2024-09-18 14:31:47 +03:00
Shahar Mike
ffb4c2b601
fix: Fix test_take_over_seeder (#3733)
The test assumed any shutdown takes no more than 1s. This doesn't
always hold, and waiting a full 1s isn't ideal either because shutdown
usually takes less than that.

Changed to use `assert_eventually` instead.

Fixes #3684
2024-09-18 14:17:09 +03:00
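The move from a fixed 1s wait to polling in ffb4c2b601 (#3733) follows a common test pattern, sketched below. `wait_until` is a hypothetical helper written for this illustration. It is not the `assert_eventually` used by the Dragonfly tests, whose signature is not shown in this log.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll predicate until it returns True or timeout seconds elapse.

    Returns as soon as the condition holds, so fast cases do not pay the
    full timeout, while slow cases get more room than a fixed sleep.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise AssertionError(f"condition not met within {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    # Pretend the shutdown completes 200 ms from now.
    shutdown_at = time.monotonic() + 0.2
    wait_until(lambda: time.monotonic() >= shutdown_at, timeout=2.0)
    print("shutdown observed")
```

The predicate is always checked at least once before the deadline is consulted, so a condition that is already true passes even with a zero timeout.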
Shahar Mike
1c6be62a0b
fix: Fix cluster_mgr.py (#3730)
We updated the reply of `SLOT-MIGRATION-STATUS`, so `cluster_mgr.py`
needs to be adjusted as well.
2024-09-18 11:44:15 +03:00
Shahar Mike
a115bc2b9f
fix: Fix test test_client_pause_with_replica (#3729)
There are 2 minor issues with this test:
1. It specified `cmdstat_replconf` as `cmd_stats` instead of `cmd`,
   that's clearly a typo as `cmd_stats` is a map with stats, while
   `replconf` is a Dragonfly command
2. Command `MULTI` is allowed to run even when the server is in paused
   state, see
   [here](https://github.com/dragonflydb/dragonfly/blob/main/src/server/main_service.cc#L1197):

   ```
   // Don't interrupt running multi commands or admin connections.
   ```

Fixes #3675
2024-09-18 09:40:26 +03:00
Stepan Bagritsevich
ae5ce9b497
fix(json_family): Separate double and int values during the comparison of the JSON objects (#3711)
* fix(json_family): Separate the double and int values in JSON commands

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor(json_family): Address comments

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

---------

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
2024-09-18 07:24:48 +02:00
Stepan Bagritsevich
824af02f6f
fix(json_family): Fix JSON.ARRPOP command in legacy mode should not return WRONGTYPE error (#3683)
* fix(json_family): Fix WRONGTYPE error for the JSON legacy mode in the JSON.ARRPOP command

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor(json_family): address comments

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor(json_family): address comments 2

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

---------

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
2024-09-18 07:24:18 +02:00
Andy Dunstall
a64fc74ce1
tests: fix and enable s3 snapshot test (#3720)
* test: fix s3 snapshot test

* ci: configure s3 regression test

* tests: only run s3 snapshot test if bucket not empty
2024-09-17 17:35:53 +03:00
Kostas Kyrimis
8a34b3e730
chore: enable ReplyGuard in ReplyBuilder2 (#3705)
* add ReplyGuard in ReplyBuilder2

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-09-17 13:37:23 +03:00
Kostas Kyrimis
6f84115152
chore: add log info on failed commands (#3694)
* log errors on failed commands

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-09-17 13:07:46 +03:00
Shahar Mike
51746d99c7
fix(cluster): Do not Pause() replication / migrations (#3716)
Before this change, whenever Dragonfly was paused (either by a user or by
internal processes like takeover or slot migration finalization),
migrations and replications were also paused.

This could cause timing issues, which sometimes result in migration
failures. Specifically, when 2 nodes have migrations from one to the
other **in parallel** (A->B and B->A), the `Pause()` that happens on A
(which happens because it's a source node) will stop it from processing
incoming traffic from B (incoming because it is also a target node).

If timed correctly, it will be locked until it times out, and so the
migration will fail.

The fix is to prevent replications and migrations from adhering to
`Pause()`s, which I think should not have happened in the first place
because they should use the admin port anyway.

Fixes #3319
2024-09-17 10:47:55 +03:00
Andy Dunstall
b9ff6934e8
fix: fix s3 load snapshot (#3717) 2024-09-17 07:17:24 +01:00
romange
6f3da56e75 chore(helm-chart): update to v1.22.2 2024-09-16 20:06:02 +00:00