mirror of https://github.com/zhaofengli/attic.git
synced 2024-12-14 11:57:30 +00:00
book/faqs: Talk about compression
parent 0c1f362a62
commit ee16664046
1 changed files with 33 additions and 0 deletions
@@ -1,5 +1,7 @@
# FAQs

<!-- TODO: Write more about design decisions in a separate section -->

## Does it replace [Cachix](https://www.cachix.org)?

No, it does not.

@@ -29,6 +31,37 @@ Authentication is done via signed JWTs containing the allowed permissions.
Each instance of `atticd --mode api-server` is stateless.
This design may be revisited later, with the option of a more stateful method of authentication.

## How is compression handled?

Uploaded NARs are compressed on the server before being streamed to the storage backend.
We use the hash of the _uncompressed NAR_ to perform global deduplication.

```
                    ┌───────────────────────────────────►NAR Hash
                    │
                    │
                    ├───────────────────────────────────►NAR Size
                    │
              ┌─────┴────┐  ┌──────────┐  ┌───────────┐
 NAR Stream──►│NAR Hasher├─►│Compressor├─►│File Hasher├─►File Stream─►S3
              └──────────┘  └──────────┘  └─────┬─────┘
                                                │
                                                ├───────►File Hash
                                                │
                                                │
                                                └───────►File Size
```
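
The pipeline in the diagram can be sketched as a single streaming pass. This is only an illustration under stated assumptions, not the server's actual code: `zlib` and SHA-256 are stand-ins picked because they ship with Python, and `UploadStats` and `compress_stream` are hypothetical names.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class UploadStats:
    """Values collected while the stream passes through (hypothetical type)."""
    nar_hash: str = ""
    nar_size: int = 0
    file_hash: str = ""
    file_size: int = 0


def compress_stream(nar_chunks: Iterable[bytes], stats: UploadStats) -> Iterator[bytes]:
    """Yield compressed chunks while hashing both sides of the compressor."""
    nar_hasher = hashlib.sha256()    # sees the uncompressed NAR
    file_hasher = hashlib.sha256()   # sees the compressed file
    compressor = zlib.compressobj()  # stand-in for the real compressor

    for chunk in nar_chunks:
        nar_hasher.update(chunk)
        stats.nar_size += len(chunk)
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer and return nothing yet
            file_hasher.update(out)
            stats.file_size += len(out)
            yield out

    # Flush whatever the compressor still holds, then finalize the stats
    # before handing out the last chunk.
    tail = compressor.flush()
    file_hasher.update(tail)
    stats.file_size += len(tail)
    stats.nar_hash = nar_hasher.hexdigest()
    stats.file_hash = file_hasher.hexdigest()
    yield tail
```

A caller would forward each yielded chunk to the storage backend and read `stats` once the generator is exhausted. Nothing in this sketch needs the whole NAR in memory at once.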

At first glance, performing compression on the client and deduplicating the result may sound appealing, but this approach has problems:

1. Different compression algorithms and levels naturally lead to different results, which can't be deduplicated.
2. Even with the same compression algorithm, the results are often non-deterministic (they depend on the number of compression threads, the library version, etc.).
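
The first problem is easy to reproduce. The sketch below uses Python's `zlib` purely as a stand-in compressor, and the input is placeholder bytes rather than a real NAR. It compresses the same data at two levels and compares the results.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


nar = b"example NAR contents " * 4096  # placeholder bytes, not a real NAR

fast = zlib.compress(nar, 1)  # lowest compression level
best = zlib.compress(nar, 9)  # highest compression level

# The compressed outputs differ, so hashing them cannot identify duplicates.
compressed_match = sha256_hex(fast) == sha256_hex(best)

# Both decompress to the same bytes, so the uncompressed hash is stable.
uncompressed_match = sha256_hex(zlib.decompress(fast)) == sha256_hex(zlib.decompress(best))

print(compressed_match, uncompressed_match)  # False True
```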

When we do the compression on the server and use the hashes of uncompressed NARs for lookups, non-determinism is no longer an issue, since we compress only once.
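
A minimal sketch of that lookup follows. It assumes an in-memory dictionary in place of the real database and storage backend; `NarStore` and its fields are hypothetical names, and `zlib` again stands in for the actual compressor.

```python
import hashlib
import zlib


class NarStore:
    """Hypothetical in-memory stand-in for the server-side store."""

    def __init__(self) -> None:
        self.objects = {}      # uncompressed NAR hash -> compressed bytes
        self.compressions = 0  # how many times the compressor actually ran

    def upload(self, nar: bytes) -> str:
        # The lookup key is the hash of the uncompressed NAR.
        nar_hash = hashlib.sha256(nar).hexdigest()
        if nar_hash not in self.objects:
            # First time this NAR is seen: compress once and keep the result.
            self.objects[nar_hash] = zlib.compress(nar)
            self.compressions += 1
        return nar_hash


store = NarStore()
first = store.upload(b"same contents" * 100)
second = store.upload(b"same contents" * 100)
print(first == second, store.compressions)  # True 1
```

Because the key never depends on compressed bytes, it does not matter whether the compressor would produce different output on a second run.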

On the other hand, performing compression on the server leads to additional CPU usage, increasing compute costs and the need to scale.
Such design decisions are to be revisited later.

## On what granularity is deduplication done?

Currently, global deduplication is done on the level of NAR files.