Now this management script can:
* Create a cluster (before this PR)
* Print an existing cluster configuration
* Shutdown an existing cluster
* Move slots between cluster nodes
To support connecting to a cluster (for all new functions), I had to
change the way admin ports are defined. Instead of having the user
(optionally) specify the first port, they are hard-coded to be the
regular port + 10,000. This is done because we can't detect the admin
port based for an existing cluster (like via `CLUSTER SHARDS`).
Currently, During docker release we don't actually build the
alpine release but the new docker run test ends up trying to
run it and fails. This adds the same toggle that we use for
build to prevent the test step.
This commit updates the weekly docker build to use `dragonfly-weekly`
image tag so that we get better separation. We also now push these
images with the github sha commit tags. We also update `latest` as
these get pushed.
fix: remove bad check-fail in the transaction code.
Fixes#1421.
The failure reproduces for dragongly running with a single thread where all the
arguments grouped within the same ShardData
Also, we improve verbosity levels inside reply_builder.cc.
For that we extend SinkReplyBuilder to support protocol errors reporting
and we remove ad-hoc code for this from dragonfly_connection.
Required to track errors easily with `--vmodule=reply_builder=1`
Finally, a pytest is added to cover the issue.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
This script allows easily setting up a local cluster.
Example invocation:
```
killall dragonfly; ./cluster_mgr.py --num_masters=3 --with_replicas
Setting up a Dragonfly cluster:
- Master nodes: 3
- Ports: 7001...7003
- Admin ports: 8001...8003
- Replicas? True
Starting nodes...
- Log file for node 7001: /tmp/dfly.cluster.node.7001.log
- Log file for node 7002: /tmp/dfly.cluster.node.7002.log
- Log file for node 7003: /tmp/dfly.cluster.node.7003.log
- Log file for node 7004: /tmp/dfly.cluster.node.7004.log
- Log file for node 7005: /tmp/dfly.cluster.node.7005.log
- Log file for node 7006: /tmp/dfly.cluster.node.7006.log
Configuring replication...
- Response for 7004: OK
- Response for 7005: OK
- Response for 7006: OK
Getting IDs...
- ID for 7001: acefdc2da5d397cfcb99239b3c29cbe6ff10d75a
- ID for 7002: a8cc67dfa42e91a94bd7c0903df35d60a39508bd
- ID for 7003: 1ad91af7bd96c89a8da877164b2ebb4cf458cab8
- ID for 7004: d209c3603343e25a18c78bd68304b6d883973bd3
- ID for 7005: bd2b25e95aaf7fdd2b955e50a00093a8272954bf
- ID for 7006: beb5cb07b75c33e3ff938d07725f2688d9bc91e0
Pushing config...
- Push into 7001: OK
- Push into 7002: OK
- Push into 7003: OK
- Push into 7004: OK
- Push into 7005: OK
- Push into 7006: OK
```
NotifyPending was being called when a blocked transaction expires, which
meant other blocked transactions could be woken up even though another
transaction could be in progress. NotifyPending has no affect on the
blocked transaction.
Signed-off-by: Andy Dunstall <andydunstall@hotmail.co.uk>
1. Use experimental/memory_resource because clang on freebsd does not support yet std::memory_resource
2. Pull latest helio with all the compatibility fixes.
3. Replace linux only SOL_TCP constant with IPPROTO_TCP (same value).
4. Fix build files to work on FreeBsd system.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
This removes the "25" limit for batched messages.
Turns out the aggregation in #1287 was not aggressive enough, because
it's quite possible to reach the specified max capacity of io vectors.
For example, each "QUEUED" is actually "+", "QUEUED", "\r\n" so we can
reach the limit with about 8 batched commands and then finish\
aggregating prematurely.
Closes#1285
To support this, I refactored some of the code:
* We no longer have `IsConfigured()` and `SetConfig()`, now
configuration validation is done via instantiation
* As a result, we check if `tl_cluster_config` is `nullptr` or not to
determine whether the cluster has been configured
* Reduce the size of the config by only storing 1 bit per slot (whether it's owned locally or not)
* Pushing new configuration is done via copy-c'tor
While at it, add a small test function to remove `sleep()`s and wait in a less fragile way.
Fixes#1357
Add support for cgroup v1 limit checking
For future reference: testing. In order to test this feature (both for
v1 and v2). To do this, create a cgroup and move Dragonfly into it
before running.
One way to do this is using a script, like: cat; ./dragonfly
--logtostderr
Enabling v1 (by default, you should have v2): Edit `/etc/default/grub`,
and look for `GRUB_CMDLINE_LINUX_DEFAULT`. Append to it
`systemd.unified_cgroup_hierarchy=0`, update GRUB using `sudo
updat-grub`, and reboot.
In v1, create a group `mycgroup` and add PID 1234 to it:
```
sudo mkdir -p /sys/fs/cgroup/memory/mycgroup
sudo bash -c "echo 8589934592 >
/sys/fs/cgroup/memory/mycgroup/memory.limit_in_bytes" # set mem limit to
8G
sudo bash -c "echo 1234 > /sys/fs/cgroup/memory/mycgroup/tasks" # assign
PID 1234 to this cgroup
```
In v2:
```
sudo mkdir -p /sys/fs/cgroup/mycgroup
sudo bash -c "echo 8589934592 > /sys/fs/cgroup/mycgroup/memory.max" #
set mem limit to 8G
sudo bash -c "echo 1234 > /sys/fs/cgroup/mycgroup/cgroup.procs" # assign
PID 1234 to this cgroup
```
then test by (using script example from before):
```
$ ./run.sh # contains: cat; ./dragonfly --logtostderr
[1] 1234
[1] + 1234 suspended (tty input) ./run.sh
fg
^D
<look for Dragonfly memory limit>
```
Signed-off-by: talbii <ido@dragonflydb.io>
This commit adds tracing logs for list commands to debug BLPOP crashes. This commit should be reverted once the issue is resolved
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
In this case, `redis.RedisCluster`.
To be double sure I also looked at the actual packets and saw that the
client asks for `CLUSTER SLOTS`, and then after the redistribution of
slots, following a few `MOVED` replied, it asks for the new slots again.