mirror of https://github.com/dragonflydb/dragonfly.git synced 2024-12-14 11:58:02 +00:00
A modern replacement for Redis and Memcached

Dragonfly


A novel, Redis and Memcached compatible memory store.

Benchmarks

TODO.

Running the server

Dragonfly runs on Linux. It uses the relatively new, Linux-specific io_uring API for I/O, hence it requires Linux kernel version 5.10 or later. Debian Bullseye and Ubuntu 20.04.4 or later meet these requirements.

With docker:

docker pull ghcr.io/dragonflydb/dragonfly:latest && \
docker tag ghcr.io/dragonflydb/dragonfly:latest dragonfly

docker run --network=host --ulimit memlock=-1 --rm dragonfly

You need --ulimit memlock=-1 because some Linux distros set the default memlock limit for containers to 64m, and Dragonfly requires more.

Building from source

Dragonfly is usually built on Ubuntu 20.04 or later.

git clone --recursive https://github.com/dragonflydb/dragonfly && cd dragonfly

# to install dependencies
sudo apt install ninja-build libunwind-dev libboost-fiber-dev libssl-dev \
     autoconf-archive libtool

# Configure the build
./helio/blaze.sh -release

# Build
cd build-opt && ninja dragonfly

# Run
./dragonfly --alsologtostderr

Configuration

Dragonfly supports Redis run-time arguments where applicable. For example, you can run: docker run --network=host --rm dragonfly --requirepass=foo --bind localhost.

Dragonfly currently supports the following Redis arguments:

  • port
  • bind
  • requirepass
  • maxmemory
  • dir - by default, the Dragonfly Docker image uses the /data folder for snapshotting. You can use the -v Docker option to map it to a folder on your host.
  • dbfilename

In addition, it has Dragonfly-specific options:

  • memcache_port - enables the Memcached-compatible API on this port. Disabled by default.
  • keys_output_limit - maximum number of keys returned by the keys command. Default is 8192. keys is a dangerous command, so we truncate its result to avoid a blowup in memory when fetching too many keys.
  • dbnum - maximum number of supported databases for select.
  • cache_mode - see Cache section below.

For more options, such as log management or TLS support, run dragonfly --help.
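As a minimal sketch of how the options above combine, the following Python helper assembles a `docker run` command line from a dictionary of flags. The helper name `dragonfly_docker_command` and the example values (including port 11211) are illustrative assumptions. The `--name=value` form, the `dragonfly` image tag and the `--cache_mode=true` spelling are taken from the examples in this README.

```python
import shlex


def dragonfly_docker_command(options, image="dragonfly"):
    """Return a `docker run` argument list with Dragonfly flags appended.

    `options` maps flag names (without leading dashes) to values.
    Booleans are rendered as true/false, as in --cache_mode=true.
    """
    cmd = ["docker", "run", "--network=host", "--ulimit", "memlock=-1", "--rm", image]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd.append(f"--{name}={value}")
    return cmd


# Example values are hypothetical; pick your own password and ports.
example = dragonfly_docker_command(
    {"requirepass": "foo", "memcache_port": 11211, "cache_mode": True}
)
print(shlex.join(example))
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.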

Roadmap and status

Currently Dragonfly supports ~130 Redis commands and all Memcached commands besides cas. We are almost on par with the Redis 2.8 API. Our first milestone will be to stabilize basic functionality and reach API parity with the Redis 2.8 and Memcached APIs. If a command you need is not implemented yet, please open an issue.

The next milestone will be implementing high availability (H/A) with Redis -> Dragonfly and Dragonfly <-> Dragonfly replication.

For Dragonfly-native replication we are planning to design a distributed log format that will support an order of magnitude higher replication speeds.

After the replication and failover features, we will continue with other Redis commands from APIs 3, 4 and 5, except for cluster mode functionality.

Initial release

API 1.0

  • String family
    • SET
    • SETNX
    • GET
    • DECR
    • INCR
    • DECRBY
    • GETSET
    • INCRBY
    • MGET
    • MSET
    • MSETNX
    • SUBSTR
  • Generic family
    • DEL
    • ECHO
    • EXISTS
    • EXPIRE
    • EXPIREAT
    • KEYS
    • PING
    • RENAME
    • RENAMENX
    • SELECT
    • TTL
    • TYPE
    • SORT
  • Server Family
    • AUTH
    • QUIT
    • DBSIZE
    • BGSAVE
    • SAVE
    • DEBUG
    • EXEC
    • FLUSHALL
    • FLUSHDB
    • INFO
    • MULTI
    • SHUTDOWN
    • LASTSAVE
    • SLAVEOF/REPLICAOF
    • SYNC
  • Set Family
    • SADD
    • SCARD
    • SDIFF
    • SDIFFSTORE
    • SINTER
    • SINTERSTORE
    • SISMEMBER
    • SMOVE
    • SPOP
    • SRANDMEMBER
    • SREM
    • SMEMBERS
    • SUNION
    • SUNIONSTORE
  • List Family
    • LINDEX
    • LLEN
    • LPOP
    • LPUSH
    • LRANGE
    • LREM
    • LSET
    • LTRIM
    • RPOP
    • RPOPLPUSH
    • RPUSH
  • SortedSet Family
    • ZADD
    • ZCARD
    • ZINCRBY
    • ZRANGE
    • ZRANGEBYSCORE
    • ZREM
    • ZREMRANGEBYSCORE
    • ZREVRANGE
    • ZSCORE
  • Not sure whether these are required for the initial release.
    • BGREWRITEAOF
    • MONITOR
    • RANDOMKEY
    • MOVE

API 2.0

  • List Family
    • BLPOP
    • BRPOP
    • BRPOPLPUSH
    • LINSERT
    • LPUSHX
    • RPUSHX
  • String Family
    • SETEX
    • APPEND
    • PREPEND (dragonfly specific)
    • BITCOUNT
    • BITFIELD
    • BITOP
    • BITPOS
    • GETBIT
    • GETRANGE
    • INCRBYFLOAT
    • PSETEX
    • SETBIT
    • SETRANGE
    • STRLEN
  • HashSet Family
    • HSET
    • HMSET
    • HDEL
    • HEXISTS
    • HGET
    • HMGET
    • HLEN
    • HINCRBY
    • HINCRBYFLOAT
    • HGETALL
    • HKEYS
    • HSETNX
    • HVALS
    • HSCAN
  • PubSub family
    • PUBLISH
    • PUBSUB
    • PUBSUB CHANNELS
    • SUBSCRIBE
    • UNSUBSCRIBE
    • PSUBSCRIBE
    • PUNSUBSCRIBE
  • Server Family
    • WATCH
    • UNWATCH
    • DISCARD
    • CLIENT LIST/SETNAME
    • CLIENT KILL/UNPAUSE/PAUSE/GETNAME/REPLY/TRACKINGINFO
    • COMMAND
    • COMMAND COUNT
    • COMMAND GETKEYS/INFO
    • CONFIG GET/REWRITE/SET/RESETSTAT
    • MIGRATE
    • ROLE
    • SLOWLOG
    • PSYNC
    • TIME
    • LATENCY...
  • Generic Family
    • SCAN
    • PEXPIREAT
    • PEXPIRE
    • DUMP
    • EVAL
    • EVALSHA
    • OBJECT
    • PERSIST
    • PTTL
    • RESTORE
    • SCRIPT LOAD/EXISTS
    • SCRIPT DEBUG/KILL/FLUSH
  • Set Family
    • SSCAN
  • Sorted Set Family
    • ZCOUNT
    • ZINTERSTORE
    • ZLEXCOUNT
    • ZRANGEBYLEX
    • ZRANK
    • ZREMRANGEBYLEX
    • ZREMRANGEBYRANK
    • ZREVRANGEBYSCORE
    • ZREVRANK
    • ZUNIONSTORE
    • ZSCAN
  • HYPERLOGLOG Family
    • PFADD
    • PFCOUNT
    • PFMERGE

Memcached API

  • set
  • get
  • replace
  • add
  • stats (partial)
  • append
  • prepend
  • delete
  • flush_all
  • incr
  • decr
  • version
  • quit

Random commands we implemented as decorators along the way:

  • ROLE (2.8) decorator as master.
  • UNLINK (4.0) decorator for DEL command
  • BGSAVE (decorator for save)
  • FUNCTION FLUSH (does nothing)

Milestone - H/A

Implement leader/follower replication (PSYNC/REPLICAOF/...).

Milestone - "Maturity"

APIs 3, 4 and 5 without cluster support, without modules and without memory introspection commands. Also without geo commands, without support for keyspace notifications and without streams. Probably design config support. Overall, a few dozen commands. Probably implement cluster-API decorators to allow cluster-configured clients to connect to a single instance.

Next milestones will be determined along the way.

Design decisions

Novel cache design

Dragonfly has a single unified adaptive caching algorithm that is very simple and memory efficient. You can enable caching mode by passing the --cache_mode=true flag. Once this mode is on, Dragonfly evicts the items least likely to be accessed in the future, but only when it is near the maxmemory limit.

Expiration deadlines with relative accuracy

Expiration ranges are limited to ~4 years. Moreover, expiration deadlines with millisecond precision (PEXPIRE/PSETEX etc.) will be rounded to the closest second for deadlines greater than 134217727 ms (approximately 37 hours). Such rounding has less than 0.001% error, which I hope is acceptable for large ranges. If it breaks your use case, talk to me or open an issue and explain your case.
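The error bound above can be checked with a few lines of arithmetic. The sketch below is not Dragonfly's code: the function names and the round-half-up rule are assumptions made for illustration. Only the 134217727 ms threshold (which equals 2^27 - 1) comes from the text.

```python
THRESHOLD_MS = 134_217_727  # 2**27 - 1, roughly 37.28 hours


def round_deadline_ms(ttl_ms):
    """Keep millisecond precision up to the threshold, else round to a second.

    Rounding half up is an assumption; the text only says "closest second".
    """
    if ttl_ms <= THRESHOLD_MS:
        return ttl_ms
    return ((ttl_ms + 500) // 1000) * 1000


def worst_case_relative_error():
    """Largest possible relative error: 500 ms off at the smallest rounded TTL."""
    return 500 / (THRESHOLD_MS + 1)


print(THRESHOLD_MS / 3_600_000)            # threshold expressed in hours
print(worst_case_relative_error() * 100)   # worst-case error in percent
```

Rounding to the nearest second shifts a deadline by at most 500 ms, and the smallest deadline that gets rounded is 134217728 ms, so the relative error stays below 500 / 134217728, or about 0.00037%.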

For more detailed differences between this and Redis implementations see here.

Native HTTP console and Prometheus-compatible metrics

By default, Dragonfly allows HTTP access via its main TCP port (6379). That's right, you can connect to Dragonfly via the Redis protocol and via the HTTP protocol: the server recognizes the protocol automatically during connection initiation. Go ahead and try it with your browser. Right now it does not have much info, but in the future we are planning to add useful debugging and management info there. If you go to the :6379/metrics URL, you will see some Prometheus-compatible metrics.

Important! The HTTP console is meant to be accessed within a safe network. If you expose Dragonfly's TCP port externally, it is advised to disable the console with --http_admin_console=false or --nohttp_admin_console.
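To illustrate how a server might tell the two protocols apart on a single port, here is a hypothetical sketch that inspects the first line a client sends. It is not Dragonfly's actual detection logic; the function name `detect_protocol` and the method list are assumptions. It relies on the fact that an HTTP request line has the form `METHOD path HTTP/x.y`, while a Redis client sends either a RESP array (starting with `*`) or an inline command.

```python
HTTP_METHODS = {b"GET", b"POST", b"HEAD", b"PUT", b"DELETE", b"OPTIONS"}


def detect_protocol(first_bytes):
    """Classify a connection as "http" or "redis" from its first bytes."""
    request_line = first_bytes.split(b"\r\n", 1)[0]
    parts = request_line.split()
    # An HTTP request line has exactly three tokens and ends with HTTP/<version>.
    if len(parts) == 3 and parts[0] in HTTP_METHODS and parts[2].startswith(b"HTTP/"):
        return "http"
    return "redis"


print(detect_protocol(b"GET /metrics HTTP/1.1\r\nHost: localhost\r\n\r\n"))
print(detect_protocol(b"*1\r\n$4\r\nPING\r\n"))
print(detect_protocol(b"GET mykey\r\n"))
```

Checking for the trailing `HTTP/` token matters because `GET` is also a Redis command, so the method name alone cannot distinguish the two.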

Background

Dragonfly started as an experiment to see what an in-memory datastore could look like if it was designed in 2022. Based on lessons learned from our experience as users of memory stores and as engineers who worked for cloud companies, we knew that we needed to preserve two key properties for Dragonfly: a) to provide atomicity guarantees for all its operations, and b) to guarantee low, sub-millisecond latency over very high throughput.

Our first challenge was how to fully utilize CPU, memory, and I/O resources using servers that are available today in public clouds. To solve this, we used a shared-nothing architecture, which allows us to partition the keyspace of the memory store between threads, so that each thread manages its own slice of dictionary data. We call these slices shards. The library that powers thread and I/O management for the shared-nothing architecture is open-sourced here.
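The partitioning idea can be sketched in a few lines of Python. This is a single-threaded model for illustration only: the class `ShardedStore`, the function `shard_for_key` and the use of CRC32 as the hash are assumptions, not Dragonfly's implementation. In a real shared-nothing design, each shard would be owned by exactly one thread and other threads would reach it only by passing messages.

```python
import zlib


def shard_for_key(key, num_shards):
    """Map a key to a shard index with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key) % num_shards


class ShardedStore:
    """Keyspace split into independent dictionaries, one per shard."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def set(self, key, value):
        self.shards[shard_for_key(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for_key(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.set(b"user:1", b"alice")
store.set(b"user:2", b"bob")
print(store.get(b"user:1"), [len(s) for s in store.shards])
```

Because a key always hashes to the same shard, single-key operations never need to coordinate across shards. Only multi-key operations that span shards need additional machinery.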

To provide atomicity guarantees for multi-key operations, we used the advancements from recent academic research. We chose the paper "VLL: a lock manager redesign for main memory database systems" to develop the transactional framework for Dragonfly. The choice of shared-nothing architecture and VLL allowed us to compose atomic multi-key operations without using mutexes or spinlocks. This was a major milestone for our PoC, and its performance stood out from other commercial and open-source solutions.
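For readers unfamiliar with VLL, the following is a heavily simplified sketch of its core idea, loosely following the paper rather than Dragonfly's transaction code. It models a single shard with exclusive locks only, and all names (`VllLockTable`, `begin`, `can_run`, `finish`) are hypothetical. Instead of per-key wait lists, each key carries a counter of lock requests; a transaction that sees every counter at 1 after incrementing may run immediately, and a blocked transaction may run once it reaches the front of the arrival queue.

```python
from collections import deque


class VllLockTable:
    """Simplified VLL-style lock table: per-key counters plus an arrival queue."""

    def __init__(self):
        self.exclusive_count = {}  # key -> number of outstanding lock requests
        self.queue = deque()       # transaction ids in arrival order
        self.keys_of = {}          # transaction id -> keys it requested
        self.free = set()          # transactions that acquired every lock at once

    def begin(self, txn_id, keys):
        """Request all locks up front; return True if the transaction can run now."""
        keys = tuple(dict.fromkeys(keys))  # drop duplicate keys, keep order
        got_all = True
        for key in keys:
            count = self.exclusive_count.get(key, 0) + 1
            self.exclusive_count[key] = count
            if count > 1:  # someone requested this key before us
                got_all = False
        self.keys_of[txn_id] = keys
        self.queue.append(txn_id)
        if got_all:
            self.free.add(txn_id)
        return got_all

    def can_run(self, txn_id):
        """A transaction may run if it was free or is the oldest one still queued."""
        return txn_id in self.free or (bool(self.queue) and self.queue[0] == txn_id)

    def finish(self, txn_id):
        """Release the transaction's locks and remove it from the queue."""
        for key in self.keys_of.pop(txn_id):
            self.exclusive_count[key] -= 1
            if self.exclusive_count[key] == 0:
                del self.exclusive_count[key]
        self.queue.remove(txn_id)
        self.free.discard(txn_id)


table = VllLockTable()
print(table.begin("t1", ["a", "b"]))  # no contention
print(table.begin("t2", ["b", "c"]))  # "b" is already requested by t1
table.finish("t1")
print(table.can_run("t2"))            # t2 is now the oldest transaction
```

The sketch runs in a single thread, which mirrors the shared-nothing setting: if each shard's lock table is touched only by its owning thread, the counters need no mutexes or atomic operations.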

Our second challenge was to engineer more efficient data structures for the new store. To achieve this goal, we based our core hashtable structure on the paper "Dash: Scalable Hashing on Persistent Memory". The paper itself is centered around the persistent memory domain and is not directly related to main-memory stores. Nevertheless, it is very much applicable to our problem. It suggested a hashtable design that allowed us to maintain two special properties that are present in the Redis dictionary: a) its incremental hashing ability during datastore growth, and b) its ability to traverse the dictionary under changes using a stateless scan operation. Besides these two properties, Dash is much more efficient in CPU and memory. By leveraging Dash's design, we were able to innovate further with the following features:

  • Efficient record expiry for TTL records.
  • A novel cache eviction algorithm that achieves higher hit rates than other caching strategies like LRU and LFU with zero memory overhead.
  • A novel fork-less snapshotting algorithm.

After we built the foundation for Dragonfly and we were happy with its performance, we went on to implement the Redis and Memcached functionality. By now, we have implemented ~130 Redis commands (equivalent to v2.8) and 13 Memcached commands.

And finally,
Our mission is to build a well-designed, ultra-fast, cost-efficient in-memory datastore for cloud workloads that takes advantage of the latest hardware advancements. We intend to address the pain points of current solutions while preserving their product APIs and propositions.

P.S. Other engineers share a similar sentiment about what makes a good memory store. See, for example, the blog posts here and here from Twitter's memcache team, or this post from the authors of KeyDB.