mirror of https://github.com/dragonflydb/dragonfly.git synced 2024-12-14 11:58:02 +00:00
Commit graph

115 commits

Author SHA1 Message Date
Borys
d6f2b76666
fix: cluster_mgr script (#4210) 2024-11-27 14:09:19 +00:00
Roman Gershman
63742dd0cf
fix: stop using openssl for container healthchecks (#4181)
Dragonfly responds to ascii based requests to tls port with:
`-ERR Bad TLS header, double check if you enabled TLS for your client.`

Therefore, it is possible to test now both tls and non-tls ports with a plain-text PING.
Fixes #4171

Also, blacklist the bloom-filter test that Dragonfly does not support yet.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-11-25 17:41:17 +02:00
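The healthcheck idea described in #4181 can be sketched roughly as follows: send a plain-text PING and treat either a `+PONG` reply or the `Bad TLS header` error as proof that the server is alive. This is only an illustrative sketch; the function names, the default host and port, and the timeout are hypothetical and are not taken from the actual healthcheck script.

```python
import socket

# Prefix of the error Dragonfly returns for plain-text input on a TLS port.
TLS_HINT = b"-ERR Bad TLS header"


def is_alive(reply: bytes) -> bool:
    """Return True if the reply indicates a live server on a plain or TLS port."""
    return reply.startswith(b"+PONG") or reply.startswith(TLS_HINT)


def probe(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Send an inline PING and classify the first chunk of the reply.

    Host, port and timeout defaults are assumptions made for this sketch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return is_alive(conn.recv(256))
    except OSError:
        # Connection refused, timeout, reset, and similar failures.
        return False
```

Keeping the classification in `is_alive` separate from the socket code lets the decision rule be checked without a running server.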
s13k
ff2359af30
fix(tools): Prevent dragonfly.logrotate from stopping the logrotate service (#4176)
Update dragonfly.logrotate

If multiple logs are being rotated and one of them fails (due to exit 1), the other logs that follow won't be rotated either, unless logrotate is run again.

If you want to prevent the rotation of a specific log file and not affect the rest of the logs, you'll want to handle the condition properly to ensure that logrotate doesn't abort due to the failure of the prerotate script.

To prevent the rotation of a specific log file without causing issues for other logs, you can use exit 0 to prevent rotation cleanly or design your prerotate script to handle conditions carefully.

Signed-off-by: s13k <s13k@pm.me>
2024-11-24 17:27:05 +00:00
Sebastian Struß
cfca3e798d
adjusted grafana dashboard to be more user friendly (#4165) 2024-11-24 09:16:00 +02:00
dependabot[bot]
86b64d910a
chore(deps): bump github.com/redis/go-redis/v9 from 9.5.1 to 9.7.0 in /tools/replay (#4062)
chore(deps): bump github.com/redis/go-redis/v9 in /tools/replay

Bumps [github.com/redis/go-redis/v9](https://github.com/redis/go-redis) from 9.5.1 to 9.7.0.
- [Release notes](https://github.com/redis/go-redis/releases)
- [Changelog](https://github.com/redis/go-redis/blob/master/CHANGELOG.md)
- [Commits](https://github.com/redis/go-redis/compare/v9.5.1...v9.7.0)

---
updated-dependencies:
- dependency-name: github.com/redis/go-redis/v9
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-04 22:31:01 +02:00
dependabot[bot]
ceb474fbda
chore(deps): bump numpy from 1.24.1 to 2.1.3 in /tools (#4063)
Bumps [numpy](https://github.com/numpy/numpy) from 1.24.1 to 2.1.3.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/RELEASE_WALKTHROUGH.rst)
- [Commits](https://github.com/numpy/numpy/compare/v1.24.1...v2.1.3)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-04 22:30:34 +02:00
Roman Gershman
4012ad1855
fix: prevents Dragonfly from blocking in epoll during snapshotting (#3911)
The problem - we used file write in non-direct mode when writing snapshots in epoll mode.
As a result - lots of data was cached into OS memory. But then during the rename operation,
when we rename "xxx.dfs.tmp" into "xxx.dfs", the OS flushes the file caches and the thread
is stuck in OS system call rename for a long time.

The fix - to use DIRECT mode and to avoid caching the data into OS caches at all.
Fixes #3895

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-10-12 18:26:12 +03:00
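The write-then-rename pattern from #3911 can be illustrated with a small Linux-oriented sketch. Dragonfly itself is written in C++, so this is not its implementation; it only shows the general technique of writing the temporary file with `O_DIRECT` (bypassing the page cache) before renaming it. The function name, the 4096-byte alignment, and the buffered fallback are assumptions made for the example.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; real devices may require a different value


def write_snapshot(final_path: str, payload: bytes) -> str:
    """Write payload to '<final_path>.tmp', then rename it. Returns the mode used."""
    tmp_path = final_path + ".tmp"
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    mode = "buffered"
    if hasattr(os, "O_DIRECT"):
        # Direct I/O needs an aligned buffer whose length is a block multiple.
        padded = max(BLOCK, -(-len(payload) // BLOCK) * BLOCK)
        buf = mmap.mmap(-1, padded)  # anonymous mappings are page aligned
        buf[: len(payload)] = payload
        try:
            fd = os.open(tmp_path, flags | os.O_DIRECT, 0o644)
            try:
                if os.write(fd, buf) != padded:
                    raise OSError("short direct write")
                os.ftruncate(fd, len(payload))  # drop the alignment padding
                mode = "direct"
            finally:
                os.close(fd)
        except OSError:
            pass  # e.g. a filesystem that rejects O_DIRECT; fall back below
        finally:
            buf.close()
    if mode == "buffered":
        # Fallback path: regular cached write followed by an explicit fsync.
        with open(tmp_path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
    os.replace(tmp_path, final_path)
    return mode
```

With the direct path, little or no dirty data should be left in the page cache when the rename happens, which is the effect the commit message describes.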
Roman Gershman
c9a2334f6d
fix: allow the healthcheck to run in non-privileged containers as well (#3731)
fix: allow the healthcheck to run in non-privileged containers as well

Fixes #3644 (again).

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-20 05:41:06 +00:00
Shahar Mike
1c6be62a0b
fix: Fix cluster_mgr.py (#3730)
We updated the reply of `SLOT-MIGRATION-STATUS`, so `cluster_mgr.py`
needs to be adjusted as well.
2024-09-18 11:44:15 +03:00
Roman Gershman
3cdc8fa128
chore: add a script that parses allocator tracking logs (#3687) 2024-09-10 07:26:44 +00:00
Tarun Pothulapati
65f96e3bb5
fix(docker/healthcheck): run netstat port retrieval command as dfly (#3647)
* fix(docker/healthcheck): run netstat port retrieval command as dfly
2024-09-04 14:34:35 +00:00
Sebastian Struß
06f6dcafcd
fix(grafana): Fix grafana dragonfly dashboard datasource (#3608)
fix: grafana dragonfly dashboard datasource
2024-08-30 17:15:51 +00:00
dependabot[bot]
e8a8d534f9
chore(deps): bump gopkg.in/yaml.v3 from 3.0.0-20210107192922-496545a6307b to 3.0.0 in /tools/replay (#3603)
chore(deps): bump gopkg.in/yaml.v3 in /tools/replay

Bumps gopkg.in/yaml.v3 from 3.0.0-20210107192922-496545a6307b to 3.0.0.

---
updated-dependencies:
- dependency-name: gopkg.in/yaml.v3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-29 16:40:37 +03:00
Roman Gershman
cec3659b51
fix: named volume permissions in docker (#3518)
Fixes #2917

The problem is described in this "working as intended" issue https://github.com/moby/moby/issues/3124
So the advised approach of using "USER dfly" directive does not really work because it requires
that the host will also define 'dfly' user with the same id. It's an unrealistic expectation.

Therefore, we revert the fix done in #1775 and follow valkey approach:
https://github.com/valkey-io/valkey-container/blob/mainline/docker-entrypoint.sh#L12

1. we run the entrypoint in the container as root which later spawns the dragonfly process
2. if we run as root:
   a. we chown files under /data to dfly.
   b. use setpriv to exec ourselves as dfly.
3. if we do not run as root we execute the docker command.

So even though the process starts as root, the server runs as dfly, and only the bootstrap
part, which has elevated permissions, is used to fix the volume access.

While we are at it, we also switched to setpriv following the change of https://github.com/valkey-io/valkey-container/pull/24/files

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-08-22 11:33:29 +03:00
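The entrypoint decision listed in #3518 can be sketched as a pure function that returns the commands to run, in order. This is a hypothetical model, not the real `docker-entrypoint.sh`: the function name is invented, the ownership change of step 2a is modeled as a recursive `chown` (an assumption), and the `setpriv` flags mirror the valkey script the commit refers to.

```python
from typing import List


def plan_entrypoint(uid: int, script: str, args: List[str]) -> List[List[str]]:
    """Return the commands the entrypoint would run; the last one is exec'd."""
    if uid == 0:
        return [
            # Step 2a: hand the data volume over to the unprivileged user
            # (exact invocation is an assumption of this sketch).
            ["chown", "-R", "dfly:dfly", "/data"],
            # Step 2b: re-exec the same script as dfly, dropping root.
            ["setpriv", "--reuid=dfly", "--regid=dfly", "--clear-groups",
             "--", script, *args],
        ]
    # Step 3: already unprivileged, so run the docker command directly.
    return [list(args)]
```

For example, `plan_entrypoint(0, "/entrypoint.sh", ["dragonfly", "--logtostderr"])` yields the ownership fix followed by the `setpriv` re-exec, while any non-zero uid yields only the original command.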
Vladislav
84a697dd75
chore(traffic logger): use pipelining and print/analyze commands (#3527)
Add run, print, analyze commands to traffic logger; add support for pipelines
2024-08-20 09:32:15 +03:00
Roman Gershman
93f6773297
chore: reduce pipelining latency by reusing existing shard fibers (#3494)
* chore: reduce pipelining latency by reusing existing shard fibers

To prove the benefits, run `./dfly_bench --pipeline=50   -n 20000  --ratio 0:1  --qps=0  --key_maximum=1`
Before: the average pipelining latency was 10ms
After: the average pipelining latency is 5ms.
Avg latency: pipelined_latency_usec / total_pipelined_squashed_commands

Also, improved counting of squashed commands - to count actual squashed ones.
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-08-14 14:45:54 +03:00
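The average-latency formula quoted in #3494 is a plain ratio of two counters. A minimal sketch, assuming both values have already been read from the server's stats as integers (the function name and the zero-commands guard are additions for illustration):

```python
def avg_pipeline_latency_usec(pipelined_latency_usec: int,
                              total_pipelined_squashed_commands: int) -> float:
    """Average pipelining latency in microseconds per squashed command."""
    if total_pipelined_squashed_commands == 0:
        return 0.0  # nothing was squashed yet; avoid division by zero
    return pipelined_latency_usec / total_pipelined_squashed_commands
```

With hypothetical counters of 10,000,000 usec over 2,000 commands, the function returns 5000.0, i.e. 5ms.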
Borys
48a28c3ea3
refactor: set info_replication_valkey_compatible=true (#3467)
* refactor: set info_replication_valkey_compatible=true
* test: mark test_cluster_replication_migration as skipped because it's broken
2024-08-08 21:42:58 +03:00
Shahar Mike
38fba1d398
fix: cluster_mgr.py to use CLUSTER MYID (#3444) 2024-08-05 07:29:31 +00:00
adiholden
e3eb8518fd
feat(test): Improve benchmark workflow (#3330)
Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-07-17 14:34:48 +03:00
Roman Gershman
374a5f529e
chore: print effective QPS of the server. (#3274)
Also refactor ReceiveFB into multiple functions.
Finally, fix the memcached command in local monitoring stack.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-07-07 06:26:14 +00:00
Roman Gershman
8240c7f19e
chore(monitoring): add more dashboards + memcached (#3268) 2024-07-05 07:12:13 +00:00
Shahar Mike
5b731f163c
feat(cluster_mgr): Fix migration action (#3124) 2024-06-04 13:27:42 +03:00
Shahar Mike
bcbcc5a2c6
feat(cluster_mgr): Take over command (#3120) 2024-06-04 11:39:08 +03:00
Shahar Mike
6e6c91aeaf
feat(cluster_mgr): Improvements to cluster_mgr.py (#3118)
Make sure attached node is in right mode
Enable detaching nodes
2024-06-03 19:05:17 +00:00
Roman Gershman
0394387a5f
chore: export pipeline related metrics (#3104)
* chore: export pipeline related metrics

Export in /metrics
1. Total pipeline queue length
2. Total pipeline commands
3. Total pipelined duration

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-05-30 19:10:35 +03:00
Shahar Mike
d1e3c82eaa
feat(cluster_mgr): Allow attaching replicas (#3105) 2024-05-30 15:29:58 +03:00
Vladislav
fd5ece09fb
chore: small replayer fixes (#3081) 2024-05-25 22:48:29 +03:00
Roman Gershman
8a0007d761
chore: add replication memory stats to the dashboard (#3065) 2024-05-22 08:11:54 +03:00
Jirapong Pansak
3babe99cf6
<chore>!: Update grafana panel (#3064)
update panel
2024-05-19 15:56:44 +00:00
Roman Gershman
fd74fd5b4b
chore: Export replication memory stats (#3062) 2024-05-18 22:40:14 +03:00
Borys
3dd6c4959c
feat: add defragment command (#3003)
* feat: add defragment command and improve auto defragmentation algorithm
2024-05-08 14:26:42 +03:00
adiholden
186ff31e29
Fix benchmark (#3017)
* fix(benchmark): fix lag check

Signed-off-by: adi_holden <adi@dragonflydb.io>

---------

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-05-06 18:38:13 +03:00
Zacharya
5a37c47aaf
feat(benchmark-tests): run in K8s (#2965)
Signed-off-by: adi_holden <adi@dragonflydb.io>

* feat(benchmark-tests): run in K8s

---------

Signed-off-by: adi_holden <adi@dragonflydb.io>
Co-authored-by: adi_holden <adi@dragonflydb.io>
2024-05-03 15:12:15 +00:00
Roman Gershman
c37fe87bf1
chore: update our container distributions versions (#2983)
1. Restrict build context in our dev/weekly builder to ease development iterations.
2. Switch weekly build to debian 12-slim because it's smaller than 24.04
3. Update our prod releases to use ubuntu 22.04

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-05-01 11:34:23 +03:00
Vladislav
df598e4825
chore: Log db_index in traffic logger (#2951)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-04-24 15:13:53 +03:00
Roman Gershman
c42b3dc02f
chore: bring more clarity when replayer fails (#2933) 2024-04-19 10:49:32 +00:00
Vladislav
3e270fee53
chore(replayer): Roll back to go1.18 (#2881) 2024-04-10 16:58:51 +03:00
Shahar Mike
b8693b4805
feat(cluster): Send number of keys for incoming and outgoing migrations. (#2858)
The number of keys in an _incoming_ migration indicates how many keys
were received, while for _outgoing_ it shows the total number. Combining
the two can provide the control plane with a completion percentage.

This slightly modified the format of the response.

Fixes #2756
2024-04-08 21:17:03 +03:00
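The percentage mentioned in #2858 could be derived by a control plane along these lines. This is only a sketch: the function and parameter names are hypothetical, and the handling of an empty migration and the clamping to 100 are assumptions rather than documented behavior.

```python
def migration_progress_percent(incoming_keys: int, outgoing_total_keys: int) -> float:
    """Share of keys already received by the target, as a percentage."""
    if outgoing_total_keys <= 0:
        return 100.0  # assumption: nothing to migrate counts as complete
    percent = 100.0 * incoming_keys / outgoing_total_keys
    # Clamp in case the two counters were sampled at slightly different times.
    return min(percent, 100.0)
```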
Roman Gershman
934a8c64c9
fix: healthcheck for docker containers (#2853)
* fix: healthcheck for docker containers

Fixes #2841

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-04-07 10:49:00 +03:00
adiholden
6e32139ada
Benchmark runner (#2780)
* feat(github runner): add benchmark workflow

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-03-27 07:31:19 +00:00
Shahar Mike
9ba532a826
feat(server): Use mimalloc in SSL calls (#2710)
* feat(server): Use mimalloc in SSL calls

Until now, OpenSSL used `malloc()` directly. This PR overrides it to use
mimalloc.

Fixes #2709

* Add generate-tls-files.sh
2024-03-11 08:25:59 +02:00
manojks1999
0081f4de71
Chore: Fixed Docker Health Check (#2659)
* docker_healthcheck_fix

* grep_fix_for_alpine

* added environment variable for healthcheck and changed the port extraction accordingly
2024-03-04 12:47:18 +02:00
Shahar Mike
54cb7d5cd0
feat(cluster_mgr): Add support for remote Dragonfly servers (#2671)
* WIP: `cluster_mgr.py` to work with remote targets

* Documentation

* No admin port

* Support different hostname move/migrate

* Fix migrate bug

* Fix typo in --help

* fix test

* self.update_id()
2024-02-29 11:59:54 +02:00
Shahar Mike
ebca523166
fix(cluster_mgr): Disable CPU affinity (#2632) 2024-02-20 13:43:17 +00:00
Shahar Mike
c7750b9d58
feat(cluster_mgr): Add support for migrate action (#2626)
Example usage:

```bash
# Create a 2-node cluster
./cluster_mgr.py --action=create --replicas_per_master=1 --num_master=2

# Move (no migration) all slots to first node
./cluster_mgr.py --action=move --target_port=7001 --slot_start=8192 --slot_end=16383

# Fill data - like run memtier

# Migrate all slots to 2nd node. One could measure how long this step takes.
./cluster_mgr.py --action=migrate --target_port=7002 --slot_start=0 --slot_end=16383
```
2024-02-20 13:58:13 +02:00
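The slot numbers in the #2626 example (0 to 16383, with 8192 as the midpoint) are consistent with an even split of the 16384 cluster slots across masters. A minimal sketch of such a split follows; whether `cluster_mgr.py` computes its ranges exactly this way is an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple

TOTAL_SLOTS = 16384  # slots are numbered 0..16383


def split_slots(num_masters: int) -> List[Tuple[int, int]]:
    """Split all slots into contiguous, inclusive (start, end) ranges."""
    if num_masters < 1:
        raise ValueError("num_masters must be at least 1")
    base, extra = divmod(TOTAL_SLOTS, num_masters)
    ranges = []
    start = 0
    for i in range(num_masters):
        # The first `extra` masters take one additional slot each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

For two masters this gives `(0, 8191)` and `(8192, 16383)`, so the move step in the example corresponds to the second range.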
Roman Gershman
af23778655
fix: release pipeline (#2439)
We had a place in tools/packaging/generate_debian_package.sh that relied on the existence of build-opt,
moreover, if it did not exist the script deadlocked.

1. Added more logging
2. Removed the loop
3. Removed unnecessary dependency in the script on the build-dir name.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-01-18 16:52:19 +02:00
Roman Gershman
8eda8226b2
fix: release.sh (#2432) 2024-01-17 12:51:31 +00:00
Roman Gershman
b3e0722d01
chore: fix our release pipeline (#2408)
* chore: fix our release pipeline

Also remove the alpine prod.wip file that has not been used and is unlikely to be used for prod.

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-01-14 17:31:59 +02:00
Vladislav
f4ea42f2f6
chore: simple traffic logger (#2378)
* feat: simple traffic logger

Controls: 
```
DEBUG TRAFFIC <base_path> | [STOP]
```
---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Co-authored-by: Roman Gershman <roman@dragonflydb.io>
2024-01-10 12:56:56 +00:00
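To show how the `DEBUG TRAFFIC` control from #2378 looks on the wire, the sketch below encodes a command as a RESP array of bulk strings, the standard request format of the Redis protocol. The helper name and the `/tmp/traffic` base path are hypothetical.

```python
def encode_resp(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


# Start logging under a hypothetical base path, then stop it.
start_cmd = encode_resp("DEBUG", "TRAFFIC", "/tmp/traffic")
stop_cmd = encode_resp("DEBUG", "TRAFFIC", "STOP")
```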
Roman Gershman
c7db025a48
feat: expose fiber responsiveness metrics (#2125)
Should allow tracking cases where Dragonfly is not responsive to I/O
due to big CPU tasks. Also, update the local grafana dashboard.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-11-05 16:56:33 +02:00