Implements RCU (read-copy-update) for updating the centralized channel store.
Unlike the old mechanism of sharding subscriber info across shards, a centralized store avoids a hop when fetching subscribers. In general this only slightly improves latency, but under heavy traffic on a single channel it allows "spreading" the load, since a single shard is no longer a bottleneck, increasing throughput several times over.
See the channel_store header for implementation details.
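A minimal sketch of the read-copy-update pattern described above, with illustrative names (ChannelStore/SubscriberMap here are stand-ins, not the actual channel_store API): readers take a lock-free snapshot of the centralized map, while a writer copies, mutates, and atomically publishes a new version.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch: readers fetch an immutable snapshot without
// locks; a writer copies the map, mutates the copy, and publishes it.
using SubscriberMap = std::map<std::string, std::vector<int>>;

class ChannelStore {
 public:
  ChannelStore() : map_(std::make_shared<const SubscriberMap>()) {}

  // Read path: one atomic load, no hop to another shard.
  std::shared_ptr<const SubscriberMap> Snapshot() const {
    return std::atomic_load(&map_);
  }

  // Write path (assumed serialized by the caller): copy, update, swap.
  void Subscribe(const std::string& channel, int conn_id) {
    auto copy = std::make_shared<SubscriberMap>(*std::atomic_load(&map_));
    (*copy)[channel].push_back(conn_id);
    std::atomic_store(&map_,
                      std::shared_ptr<const SubscriberMap>(std::move(copy)));
  }

 private:
  std::shared_ptr<const SubscriberMap> map_;
};
```

Readers that grabbed the old snapshot keep a valid reference until it drops out of scope; new readers see the new map.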
fix: improve connection affinity heuristic.
1. fix a potential crash when traversing connections with client list
2. fetch cpu/napi id information when handling a new connection.
3. add thread id (tid) and irqmatch fields to the client list command.
4. Implement a heuristic, under a flag, that puts a connection on the
   CPU id that handles the IRQ queue serving its socket.
   However, if too wide a gap forms between the number of connections on
   IRQ threads and other threads, we fall back to the other threads.
In my tests I saw a 15-20% CPU reduction when this heuristic is enabled.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
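The balancing rule in point 4 can be sketched as a small helper (names and the gap threshold are assumptions, not Dragonfly's actual code): prefer the thread pinned to the CPU that services the socket's IRQ queue, but fall back to the least-loaded thread if the IRQ thread already carries too many more connections than the lightest one.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch of the affinity heuristic: keep IRQ affinity
// unless the connection-count gap to the lightest thread is too wide.
int PickThread(int irq_thread, const std::vector<int>& conns_per_thread,
               int max_gap = 8) {
  size_t lightest = 0;
  for (size_t i = 1; i < conns_per_thread.size(); ++i)
    if (conns_per_thread[i] < conns_per_thread[lightest]) lightest = i;
  if (conns_per_thread[irq_thread] - conns_per_thread[lightest] > max_gap)
    return static_cast<int>(lightest);  // gap too wide: spill over
  return irq_thread;                    // keep IRQ affinity
}
```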
* Fix crash in ZPOPMIN
Crash was due to an attempt to access nullptr[-1], which is bad :)
* Add test to repro crash.
There are some leftover debugging statements; they're somewhat useful, so I
kept them since the bug is not yet fixed.
* Copy patch by romange to solve the crash
Also re-enable (uncomment) the test in utility.py.
Signed-off-by: chakaz <chakaz@chakaz>
---------
Signed-off-by: chakaz <chakaz@chakaz>
Signed-off-by: Chaka <chakaz@users.noreply.github.com>
Co-authored-by: chakaz <chakaz@chakaz>
Alpine images don't have bash installed by default, so we need to use
`/bin/sh` instead. This follows the *same existing convention that
we follow in the `entrypoint.sh` script*.
Both ubuntu and alpine images have been tested (i.e., healthchecks pass)
with this change.
TopKeys uses a custom implementation of HeavyKeeper to track top (hot)
key usage for debugging purposes.
This commit also integrates TopKeys (default off) into DbTable and counts
usage of (present) key lookups.
Signed-off-by: chakaz <chakaz@chakaz>
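For context, a toy single-row sketch of the HeavyKeeper idea (the real TopKeys implementation is custom and multi-row; all names here are illustrative): each bucket keeps a fingerprint and a counter; a colliding key decays the counter with probability b^-count and takes over the bucket when the counter reaches zero, so hot keys survive while cold keys churn.

```cpp
#include <cmath>
#include <cstdint>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Toy sketch of HeavyKeeper-style hot-key tracking (illustrative only).
class TinyHeavyKeeper {
 public:
  explicit TinyHeavyKeeper(size_t buckets) : slots_(buckets) {}

  void Add(const std::string& key) {
    Slot& s = slots_[std::hash<std::string>{}(key) % slots_.size()];
    uint64_t fp = Fingerprint(key);
    if (s.count == 0) { s.fp = fp; s.count = 1; return; }
    if (s.fp == fp) { ++s.count; return; }
    // Collision: decay the incumbent with probability b^-count.
    if (coin_(rng_) < std::pow(1.08, -double(s.count)) && --s.count == 0) {
      s.fp = fp;
      s.count = 1;
    }
  }

  uint32_t Estimate(const std::string& key) const {
    const Slot& s = slots_[std::hash<std::string>{}(key) % slots_.size()];
    return s.fp == Fingerprint(key) ? s.count : 0;
  }

 private:
  struct Slot { uint64_t fp = 0; uint32_t count = 0; };
  static uint64_t Fingerprint(const std::string& k) {
    return std::hash<std::string>{}(k) ^ 0x9e3779b97f4a7c15ULL;
  }
  std::vector<Slot> slots_;
  std::mt19937 rng_{42};
  std::uniform_real_distribution<double> coin_{0.0, 1.0};
};
```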
Reduce reliance on modern vectorized architectures pending proper configuration flags.
Should fix the daily build pipeline.
Fixes #732.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The deadlock happened during the brpop flow, where we access
shard_data.local_data from both the coordinator and shard threads.
Originally, shard_data.local_data was not designed for concurrent access,
and I used the ARMED bit to deduplicate callback runs for each shard.
The problem is that within the BRPOP flow,
ExecuteAsync would apply "|= ARMED" while in parallel NotifySuspended would apply
"|= AWAKED" in the shard thread, and the two R/M/W operations would corrupt each other.
Therefore, I have now completely separated the shard-local local_data mask from the is_armed boolean.
Moreover, since we now use atomics for is_armed, I increased PerShardData size to 64 bytes
to avoid false cache-line sharing between PerShardData objects.
Fixes #945
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
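The layout change can be sketched as follows (field names follow the commit text; the exact types are assumptions): the mask stays plain and shard-thread-only, the cross-thread flag becomes atomic, and `alignas(64)` pads each entry to its own cache line so neighbouring PerShardData objects do not falsely share one.

```cpp
#include <atomic>
#include <cstdint>

// Sketch of the fix: local_data is touched only by the shard thread,
// while is_armed may be touched by both coordinator and shard threads.
struct alignas(64) PerShardData {
  uint16_t local_data = 0;            // shard-thread only: AWAKED etc.
  std::atomic<bool> is_armed{false};  // cross-thread: armed flag
};

static_assert(sizeof(PerShardData) == 64, "one cache line per shard");
static_assert(alignof(PerShardData) == 64, "cache-line aligned");
```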
1. Added a test that was breaking earlier.
2. Made sure that multiple awakened brpop transactions would not
   snatch items from one another.
3. Fixed watched-queues clean-up logic inside blocking_controller that caused deadlocks.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
1. pytest extensions and fixes - allows running the tests
   against an existing local server by providing its port (--existing <port>).
2. Extend "DEBUG WATCHED" command to provide more information about watched state.
3. Improve debug/vlog printouts around the code.
This noisy PR is a preparation before BRPOP fix that will follow later.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
This commit adds a scheduled job that runs at 8 AM Israel time every day,
with common build configuration flags so that we can be sure
that building from source for known configurations is possible.
Previously we set batch mode when the dispatch queue was not empty,
but the dispatch queue could contain other async messages related to pubsub or monitor.
Now we enable batching only if there are more pipelined commands in the queue.
In addition, this fixes the issue of unbounded growth of the batching buffer.
Fixes #935.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
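The new rule can be sketched like this (message kinds and names are assumptions, not Dragonfly's actual types): only a further pipelined command in the dispatch queue should turn batching on; queued pubsub/monitor pushes should not.

```cpp
#include <deque>

// Illustrative sketch: batch only when more pipelined commands follow.
enum class MsgKind { kPipeline, kPubSub, kMonitor };

bool ShouldBatch(const std::deque<MsgKind>& dispatch_q) {
  for (MsgKind m : dispatch_q)
    if (m == MsgKind::kPipeline) return true;  // more commands coming
  return false;  // only async pushes queued: flush now
}
```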
The issue happens when SendMsgVecAsync is called with a PubMessage that holds
string_view objects referencing objects on the stack. We replace the string_views
with either string or shared_ptr<string>.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
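A minimal sketch of the bug class and the fix (PubMessage here is a stand-in, not the real struct): a string_view captured for an async send can outlive the stack string it points into, so the payload must be copied into owned storage before the caller's frame goes away.

```cpp
#include <memory>
#include <string>
#include <string_view>

// Illustrative stand-in: owning fields keep the payload alive until
// the async send completes.
struct PubMessage {
  std::shared_ptr<std::string> channel;
  std::shared_ptr<std::string> payload;
};

PubMessage MakeMessage(std::string_view channel, std::string_view payload) {
  // Copy the viewed bytes into owned strings before the caller's
  // stack frame (and the viewed buffers) go away.
  return PubMessage{std::make_shared<std::string>(channel),
                    std::make_shared<std::string>(payload)};
}
```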