* fix(server): client pause fix on pipeline squash
Allow squashing commands during a client pause.
Move the await on client pause inside `InvokeCommand`; this way, every command-invocation flow reads the pause state.
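A minimal sketch of the idea, using std primitives and invented names rather than Dragonfly's actual fiber types: the pause check lives in the single invocation entry point, so the pipeline-squash path cannot bypass it.
```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical illustration (not Dragonfly's real API): a pause gate that
// every command invocation passes through. Because InvokeCommand is the
// single entry point, squashed pipeline commands observe CLIENT PAUSE too.
class PauseGate {
 public:
  void Pause() {
    std::lock_guard lk(mu_);
    paused_ = true;
  }
  void Resume() {
    {
      std::lock_guard lk(mu_);
      paused_ = false;
    }
    cv_.notify_all();
  }
  // Called at the top of InvokeCommand, on every invocation flow.
  void AwaitResumed() {
    std::unique_lock lk(mu_);
    cv_.wait(lk, [this] { return !paused_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool paused_ = false;
};
```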
Signed-off-by: adi_holden <adi@dragonflydb.io>
* fix(stats): Do not crash upon issuing `mem stats`
The reason for the crash is that we can't use a mutex while iterating
connections: the mutex uses a non-Fiber `Await()`, and the iteration also
holds a fiber-atomic guard.
Instead, use the common trick of allocating per-thread data and aggregating
it afterward.
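A self-contained sketch of that trick using plain `std::thread` (Dragonfly's proactor pool differs, and all names here are illustrative): each thread writes only to its own slot, and the caller sums the slots after all threads finish.
```cpp
#include <cstdint>
#include <thread>
#include <vector>

struct MemStats {
  uint64_t used_bytes = 0;
  uint64_t num_conns = 0;
};

// Run cb(thread_index) on `n` threads and wait for all of them.
template <typename Cb> void RunOnAllThreads(unsigned n, Cb cb) {
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < n; ++i)
    threads.emplace_back(cb, i);
  for (auto& t : threads)
    t.join();
}

MemStats CollectMemStats(unsigned num_threads) {
  std::vector<MemStats> per_thread(num_threads);  // one slot per thread
  RunOnAllThreads(num_threads, [&](unsigned tid) {
    // Each thread touches only its own slot, so no mutex is needed while
    // it iterates its connections.
    per_thread[tid].num_conns += 1;      // placeholder accounting
    per_thread[tid].used_bytes += 4096;  // placeholder accounting
  });
  MemStats total;  // aggregate afterward, once every thread has finished
  for (const auto& s : per_thread) {
    total.used_bytes += s.used_bytes;
    total.num_conns += s.num_conns;
  }
  return total;
}
```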
* Use pool size
This PR introduces a test case for TLS with `ca_dir`. First, we
did not have any tests for this case. Second, using `ca_dir` requires
calling `c_rehash` on the directory before it is loaded by DF. We
did not have this use case anywhere, so we thought there was
a bug when we used `ca_dir`, only to find out that we needed to call
`c_rehash` on the directory before loading the certificates. Now
both a test and a use case are properly documented (see the sketch after
the list below).
* add missing test for ca_dir
* use `c_rehash` to properly show how to load CA directories instead of files
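For context, a sketch of why the rehash step matters, assuming the standard OpenSSL API (Dragonfly's exact TLS wiring may differ): OpenSSL searches a CA directory by subject-hash file names, which `c_rehash <dir>` (or `openssl rehash <dir>`) creates as symlinks.
```cpp
#include <openssl/ssl.h>

// Illustrative helper: load trusted CAs from a directory. OpenSSL looks up
// certificates in a CApath by hashed names such as "b5e1dd13.0", so the
// directory must be prepared with `c_rehash` before the server starts;
// without it, verification simply finds no CA and fails.
bool LoadCaDir(SSL_CTX* ctx, const char* ca_dir) {
  // CAfile == nullptr, CApath == ca_dir.
  return SSL_CTX_load_verify_locations(ctx, /*CAfile=*/nullptr, ca_dir) == 1;
}
```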
Then use the right version (hopefully) in the right places.
Specifically, this fixes a serialization bug where we could send
malformed responses when using `UpperBoundSize()` to write an array length.
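As an illustration of the bug pattern, here is a minimal sketch with invented names (not the actual serializer code): a RESP array header promises an exact element count, so announcing an upper bound and then emitting fewer elements yields a malformed reply.
```cpp
#include <string>
#include <vector>

// The "*N\r\n" header must match the number of elements that follow; if N
// came from an upper-bound estimate, clients wait for items that never
// arrive, or misparse the next reply.
std::string SerializeArray(const std::vector<std::string>& elems,
                           size_t announced_len) {
  std::string out = "*" + std::to_string(announced_len) + "\r\n";
  for (const auto& e : elems)
    out += "$" + std::to_string(e.size()) + "\r\n" + e + "\r\n";
  return out;  // well-formed only when announced_len == elems.size()
}
```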
Should allow tracking cases where Dragonfly is not responsive to I/O
due to big CPU tasks. Also, update the local Grafana dashboard.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
This caused a UBSan warning to be printed in test `DflyEngineTest.Bug207`:
```
/usr/include/c++/11/bits/stl_vector.h:1046:34: runtime error: reference binding to null pointer of type 'struct value_type'
```
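That diagnostic points at `operator[]` in libstdc++. The same class of warning can be reproduced with a few lines compiled with `-fsanitize=undefined`, since an empty vector's `data()` is null and indexing it binds a reference to that null pointer:
```cpp
#include <vector>

int main() {
  std::vector<int> v;  // empty: data() == nullptr
  int& r = v[0];       // UB: binds a reference to *nullptr
  (void)r;
  return 0;
}
```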
The dashboard used the `dragonfly_up` metric to bootstrap itself,
but this metric does not exist anymore; I replaced it with `dragonfly_version`.
In addition, the exported format changed slightly because I used a
recent Grafana version to export.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The regression test sometimes fails because, for a short period after `wait_available_async()` returns, the result of `ROLE` can still differ from `stable_sync`.
[Failure example](https://github.com/dragonflydb/dragonfly/actions/runs/6726461923/job/18282759612#step:6:1863)
We change our state from `LOADING` to `ACTIVE` [here](d08d7f13b4/src/server/replica.cc (L426)), but then we change the sync state twice (see the sketch after this list):
1. First we clear `R_SYNCING` [here](d08d7f13b4/src/server/replica.cc (L427C28-L427C37))
2. And only later do we set `R_SYNC_OK` (meaning `stable_sync`) [here](d08d7f13b4/src/server/replica.cc (L221))
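A self-contained sketch of that ordering (the names mirror `replica.cc`, but the scaffolding here is invented), with the race window marked:
```cpp
#include <atomic>
#include <cstdint>

constexpr uint32_t R_SYNCING = 1 << 0, R_SYNC_OK = 1 << 1;
enum class GlobalState { LOADING, ACTIVE };

std::atomic<GlobalState> state{GlobalState::LOADING};
std::atomic<uint32_t> state_mask_{R_SYNCING};

void EnterStableSync() {
  state.store(GlobalState::ACTIVE);   // wait_available_async() now returns
  // <-- race window: ROLE still reports the pre-stable_sync state here;
  //     a sleep at this point reproduces the test failure reliably.
  state_mask_.fetch_and(~R_SYNCING);  // flip 1
  // (the second flip happens later, elsewhere in the code)
  state_mask_.fetch_or(R_SYNC_OK);    // flip 2: ROLE == "stable_sync"
}
```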
This is easy to reproduce by adding a sleep right after the state is set to `ACTIVE`, either before or after `R_SYNCING` is flipped (with different states reported in each case).
Incidentally, without that added sleep I was not able to reproduce the failure, having tried thousands of times in various configurations.
We could change the order of operations so that we first change `state_mask_` and only then switch the state from `LOADING` to `ACTIVE` (which is probably the right thing to do), but that would require a subtle refactor, as we change these in a couple of places.
We should keep in mind, though, that this race has no effect on users, so a simple sleep on the test side fixes it well enough.