Mirror of https://github.com/dragonflydb/dragonfly.git, synced 2024-12-14 11:58:02 +00:00

Commit 5ad6352ad7 (parent 8bed96d20b): Update dashtable doc
3 changed files with 70 additions and 28 deletions
doc/bgsave_memusage.svg: new file, 78 KiB (file diff suppressed because one or more lines are too long)
@@ -94,23 +94,21 @@
Please note that with all the efficiency of Dashtable, it cannot drastically decrease
overall memory usage. Its primary goal is to reduce waste around dictionary management.

Having said that, by reducing metadata waste we could insert dragonfly-specific attributes
into a table's metadata in order to implement other intelligent algorithms like forkless save. This is where some of Dragonfly's most impactful, disrupting qualities [can be seen](#forkless-save).

## Benchmarks

There are many other improvements in Dragonfly that save memory besides DT. I will not be able to cover them all here. The results below show the state of things as of May 2022.

### Populate single-threaded

To compare RD vs DT I often use an internal debugging command, "debug populate", that quickly fills both datastores with data. It saves time and gives more consistent results compared to memtier_benchmark, and it also shows the raw speed at which each dictionary gets filled without intermediary factors like networking, parsing, etc. I deliberately fill the datasets with small values to show how the metadata overhead differs between the two data structures.

I run "debug populate 20000000" (20M) on both engines on my home machine, an "AMD Ryzen 5 3400G" with 8 cores.
|             | Dragonfly | Redis 6 |
|-------------|-----------|---------|
| Time        | 10.8s     | 16.0s   |

@@ -120,49 +118,53 @@
When looking at Redis 6 "info memory" stats, you can see that `used_memory_overhead` amounts
to `1.0GB`. That means that out of the 1.73GB allocated, a whopping 1.0GB is used for
the metadata. For small-data use cases the cost of metadata in Redis is larger than the data itself.
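A quick way to see this split yourself is to grep the standard Redis INFO fields after the populate run (the port is an assumption):

```bash
# Compare payload vs. bookkeeping after the populate run (port is an assumption).
redis-cli -p 6379 info memory | grep -E "used_memory_human|used_memory_overhead|used_memory_dataset"
```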
### Populate multi-threaded

Now I run Dragonfly on all 8 cores. Redis results stay the same, of course, since it is single-threaded.

|             | Dragonfly | Redis 6 |
|-------------|-----------|---------|
| Time        | 2.43s     | 16.0s   |
| Memory used | 896MB     | 1.73G   |

Due to its shared-nothing architecture, Dragonfly maintains a dashtable per thread, each holding its own slice of the data. Each thread fills the 1/8th of the 20M key range that it owns, and it is much faster, almost 8 times faster. You can see that the total memory usage is even smaller, because now we maintain smaller tables in each thread (this is not always the case, though: we could get slightly worse memory usage than in the single-threaded case, depending on where we stand relative to hash table utilization).
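As a rough reproduction sketch, assuming the `--proactor_threads` flag controls the thread count (check `dragonfly --help` on your build) and that the server listens on port 6380:

```bash
# Sketch only: binary name, flag and port are assumptions; restart/flush the server between runs.
dragonfly --port 6380 --proactor_threads=1 &     # single-threaded run
time redis-cli -p 6380 debug populate 20000000

dragonfly --port 6380 --proactor_threads=8 &     # one dashtable shard per thread
time redis-cli -p 6380 debug populate 20000000
```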
### Forkless Save

This example shows how much memory Dragonfly uses during BGSAVE under load compared to Redis. By the way, BGSAVE and SAVE in Dragonfly are the same procedure, because both are implemented using a fully asynchronous algorithm that maintains point-in-time snapshot guarantees.

The test consists of 3 steps:
1. Execute the `debug populate 5000000 key 1024` command on both servers to quickly fill them up with ~5GB of data.
2. Run `memtier_benchmark --ratio 1:0 -n 600000 --threads=2 -c 20 --distinct-client-seed --key-prefix="key:" --hide-histogram --key-maximum=5000000 -d 1024` in order to send constant update traffic. This traffic should not substantially affect the memory usage of either server.
3. Finally, run `bgsave` on both servers while measuring their memory.

It is technically very hard to measure the exact memory usage of Redis during BGSAVE because it creates a child process that partially shares memory with its parent. We chose `cgroups v2` as the tool to measure memory: we put each server into a separate cgroup and sampled the `memory.current` attribute of each cgroup. Since a forked Redis process inherits the cgroup of its parent, we get an accurate estimate of the total memory usage. Although we did not need this for Dragonfly, we applied the same approach to both servers for consistency.
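A minimal sketch of that measurement is shown below; the cgroup names, process names and sampling window are placeholders, and it assumes a cgroups v2 mount at `/sys/fs/cgroup` with a single instance of each server running:

```bash
# Sketch only: cgroup names, process names and duration are placeholders.
# You may need to enable the memory controller first:
#   echo +memory | sudo tee /sys/fs/cgroup/cgroup.subtree_control
sudo mkdir -p /sys/fs/cgroup/redis /sys/fs/cgroup/dragonfly
pgrep -x redis-server | sudo tee /sys/fs/cgroup/redis/cgroup.procs
pgrep -x dragonfly    | sudo tee /sys/fs/cgroup/dragonfly/cgroup.procs

# Forked children (the Redis BGSAVE child) stay in the parent's cgroup,
# so memory.current captures parent and child together.
for sec in $(seq 1 60); do
  printf "%s\t%s\t%s\n" "$sec" \
    "$(cat /sys/fs/cgroup/dragonfly/memory.current)" \
    "$(cat /sys/fs/cgroup/redis/memory.current)"
  sleep 1
done
```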
![BGSAVE](./bgsave_memusage.svg)

As you can see on the graph, Redis uses 50% more memory than Dragonfly even before BGSAVE starts. Around second 14, BGSAVE kicks off on both servers. You cannot visually see this event on the Dragonfly graph, but it is clearly visible on the Redis graph. It took just a few seconds for Dragonfly to finish its snapshot (again, barely visible on the graph), and by around second 20 Dragonfly already has BGSAVE behind it. You can see a distinguishable cliff at second 39, where Redis finishes its snapshot, reaching almost 3x more memory usage at peak.
### Expiry of items during writes

Efficient expiry is very important for many scenarios. See, for example, the
[Pelikan paper'21](https://twitter.github.io/pelikan/2021/segcache.html). The Twitter team says
that their memory footprint could be reduced by as much as 60% by employing a better expiry methodology. The authors of the post above show the pros and cons of expiration methods in the table below:

<img src="https://twitter.github.io/pelikan/assets/img/segcache/expiration.svg" width="400">

They argue that proactive expiration is very important for the timely deletion of expired items.
Dragonfly employs its own intelligent garbage collection procedure. By leveraging the DashTable
compartmentalized structure, it can employ a very efficient passive expiry algorithm with low CPU overhead. Our passive procedure is complemented with proactive, gradual scanning of the table in the background.

The procedure is as follows:
A dashtable grows when one of its segments becomes full during an insertion and needs to be split.
This is a convenient point to perform garbage collection, but only for that segment.
We scan its buckets for expired items. If we delete some of them, we may avoid growing the table altogether! The cost of scanning the segment before a potential split is no more than the cost of the
split itself, so it can be estimated as `O(1)`.

We use `memtier_benchmark` for this experiment to demonstrate Dragonfly vs Redis expiry efficiency.
We run the following command locally:

@@ -172,7 +174,7 @@
```bash
memtier_benchmark --ratio 1:0 -n 600000 --threads=2 -c 20 --distinct-client-seed \
  --key-prefix="key:" --hide-histogram --expiry-range=30-30 --key-maximum=100000000 -d 256
```
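While the benchmark runs, one can sample both servers to watch how expiry keeps the live dataset bounded; a small sketch, with ports and interval being assumptions:

```bash
# Sketch only: ports and interval are assumptions; INFO field sets differ slightly between the two servers.
# dbsize shows how many live keys each server keeps; used_memory* fields show the memory footprint.
while true; do
  echo "dragonfly: $(redis-cli -p 6380 dbsize) keys, $(redis-cli -p 6380 info memory | grep used_memory_human)"
  echo "redis:     $(redis-cli -p 6379 dbsize) keys, $(redis-cli -p 6379 info memory | grep used_memory_human)"
  sleep 5
done
```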
We load larger values this time (256 bytes) to reduce the impact of the metadata savings of Dragonfly.

@@ -180,8 +182,7 @@
|                   | Dragonfly | Redis 6 |
|-------------------|-----------|---------|
| Memory peak usage | 1.45GB    | 1.95GB  |
| Avg SET qps       | 131K      | 100K    |

Please note that Redis could sustain 30% less qps. That means that the optimal working sets for Dragonfly and Redis are different: the former needed to host at least `20s*131K` items
at any point in time, while the latter only needed to keep `20s*100K` items.
So for a `30%` bigger working set, Dragonfly needed `25%` less memory at peak.
doc/memory_bgsave.tsv (new file, 40 lines)

@@ -0,0 +1,40 @@
Time Dragonfly Redis
4 4738531328 6819917824
5 4738637824 6819917824
6 4738658304 6819913728
7 4738777088 6820589568
8 4738781184 6820638720
9 4738768896 6820769792
10 4738494464 6820777984
11 4738756608 6820683776
12 4740325376 6820687872
13 4740243456 6820691968
14 4740194304 6820687872
15 4740194304 7429746688
16 4740734976 7942115328
17 4740370432 8400957440
18 4740366336 8863305728
19 4740390912 9302515712
20 4740399104 9697935360
21 4740423680 10074103808
22 4748312576 10362601472
23 4750438400 10649939968
24 4750315520 10926985216
25 4750426112 11195555840
26 4750180352 11444666368
27 4750417920 11665764352
28 4750131200 11872944128
29 4750233600 12060946432
30 4750475264 12232212480
31 12379299840
32 12521598976
33 12647915520
34 12756508672
35 12848570368
36 12944240640
37 13025046528
38 13105799168
39 13181427712
40 8000053248
41 7048486912
42 7048507392