thoughts/data/sdn.md

2024-08-05 21:33:47 +02:00
## Key Takeaways
* Matrix Dendrite can be run behind Tailscale Funnel to shape the
network logically. This way we can avoid opening our local
infrastructure directly to the Internet.
* With Traefik we can route traffic through a single point for security
monitoring and policy enforcement, and add centralised logging, for example
with LimaCharlie adapters.
* Bandwidth limits in Tailscale Funnel and general stability
have not been a problem so far for a small Dendrite-based Matrix server.
## Background
Matrix has been quite a ride for me since I first deployed a Synapse server
in 2016. I've trained others in the cyber security community in it
and also decommissioned Matrix for XMPP, not only once, but twice!
In the end, I've decided that Matrix is currently the best common
denominator between safety, security and usability. Slack has
been my go-to over the past couple of years, but having Matrix as an
option for select conversations has been good, given the inherent
vulnerability and complexity of the Slack ecosystem when used in a
cyber security context.
While the Matrix ecosystem is driven forward by the Element client
and the Dendrite homeserver in particular, other infrastructure
components evolve as well. Because of this I recently sat down to migrate my
Synapse server on OpenBSD to Docker (on Debian for now).
## Architecture Using Tailscale Funnel+Proxy, Docker, Traefik and Dendrite
When redesigning my setup, I wanted to put emphasis on a sane flow
that is re-usable across Docker setups. I also wanted something with easy
visibility and traceability. Since Tailscale recently announced public
availability of Tailscale Funnel I decided to move the entrypoint of my
infrastructure to a Funnel.
In summary, I wanted to use the following components:
1. Docker: Recent pain with upgrading Postgres on OpenBSD
convinced me to take the leap to Docker on Debian for stability
and compatibility.
2. Dendrite: The next-generation Golang-based homeserver for Matrix, backed
by a PostgreSQL database.
3. Traefik: Reverse proxy for controlling and auditing access to the
Docker network, in addition to the Tailscale ACL, which is focused on
transport rather than application security.
4. Tailscale: Software-defined infrastructure building on WireGuard.[^tailscale]
[^tailscale]: Tailscale is a coordination layer built on top of the
WireGuard VPN protocol. It makes it easy to control, on the application
layer, which accounts, devices etc. should have access to which resources.
For inventory purposes it also has the great advantage of being opt-in.
Organizing these components into the network flow chart shown below, we can see
that the only external point of contact for a Matrix device or federated homeserver will be
`matrix.{{tailnet-id}}.ts.net`. This is an ingress point that Tailscale
has made available through their so-called Funnel. For all practical purposes it is a
tunnel that connects the Tailscale Docker container with the Internet
for non-Tailscale devices and for Matrix federation.
We will open port 443 for client traffic, and port 8443 for
federation with homeservers such as `matrix.org`. Both first read .well-known files from
static hosting at Firebase to figure out where to go; this lookup is step 1 for
clients and servers alike when you'd like to connect with me at `@tommy:rodl.no`.
```
┌────────────────┐                     ┌─────┐                 ┌───────────┐
│ Matrix Client  │                     │     │                 │ Tailscale │- Proxy-mode to localhost
│     Device     │──443──────┬────────▶│     │────────────────▶│ container │- socat tcp/tls to Traefik
└────────────────┘           │         └─────┘                 └───────────┘
        │                    │ matrix.{{tailnet-id}}.ts.net          │
        │                    │    Tailscale ingress node             │
        │                    │                                       ▼
        ▼                    │                                 ┌───────────┐
┌──────────────────────────┐ │                                 │  Traefik  │- Tailscale 3.0 mode certs
│rodl.no/.well-known/client│ │                                 │ container │- Terminate TLS
├──────────────────────────┤ │                                 │           │- Route matrix.*.ts.net to Dendrite
│rodl.no/.well-known/server│ │                                 └───────────┘
└──────────────────────────┘ │                                       │
        ▲                    │                                       │
        │                    │                                       ▼
        │                    │                                 ┌───────────┐
        │                    │                                 │ Dendrite  │
┌────────────────┐           │                                 │ container │
│  Other Matrix  │           │                                 ├───────────┤
│   Homeserver   │──8443─────┘                                 │ Postgres  │
└────────────────┘                                             │ container │
                                                               └───────────┘
```
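For reference, the two `.well-known` files in the chart can be generated with a few lines of shell. This is only a sketch: the `m.homeserver` and `m.server` keys and the `/.well-known/matrix/` path come from the Matrix specification, while the function name `write_well_known` and its arguments are made up for this example, and the hostname has to be replaced with your own tailnet name.

```sh
#!/usr/bin/env sh
# Sketch: write the Matrix delegation files for static hosting.
# write_well_known is a hypothetical helper, not part of the original setup.
# Usage: write_well_known <matrix-hostname> <output-directory>
write_well_known() {
  host="$1"
  dir="$2/.well-known/matrix"
  mkdir -p "$dir"
  # Clients read this file to find the homeserver base URL (port 443).
  printf '{"m.homeserver": {"base_url": "https://%s"}}\n' "$host" > "$dir/client"
  # Other homeservers read this file to find the federation endpoint (port 8443).
  printf '{"m.server": "%s:8443"}\n' "$host" > "$dir/server"
}
```

Calling `write_well_known 'matrix.{{tailnet-id}}.ts.net' public` with the placeholder filled in would leave the two files under `public/.well-known/matrix/`, ready to be uploaded to whatever static hosting serves the domain.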
The Tailscale container is a lightly modified Tailscale image as shown in `Dockerfile.tailscale`
below. Since Tailscale only supports proxying to localhost, the Dockerfile adds socat
to forward TCP on to another container.
```docker
FROM tailscale/tailscale:stable
# socat relays the locally proxied ports on to the Traefik container
RUN apk add --no-cache socat
COPY config/tailscale/startup.sh /tmp/startup.sh
```
`startup.sh` contains the following. The serve commands enable local proxying,
the funnel commands open ports 443 and 8443 to the world, and finally socat forwards
TCP to the static IP of the Traefik container.
```sh
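#!/usr/bin/env sh
# Start the Tailscale daemon in the background and give it time to come up
tailscaled &
sleep 5
# Proxy tailnet ports 443/8443 to local ports 8000/8001
tailscale serve tcp:443 tcp://127.0.0.1:8000
tailscale serve tcp:8443 tcp://127.0.0.1:8001
# Expose both ports to the Internet through Funnel
tailscale funnel 443 on
tailscale funnel 8443 on
# Relay the local ports to Traefik's client (4443) and federation (8443) entrypoints
socat -v tcp-listen:8000,fork,reuseaddr tcp:172.25.0.10:4443 &
socat -v tcp-listen:8001,fork,reuseaddr tcp:172.25.0.10:8443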
```
Traefik is the receiver of the TLS traffic to `matrix.{{tailnet-id}}.ts.net`. When Traefik
sees traffic to this domain on port 8443 or 443, it routes it to the Dendrite container.
This makes Traefik a central vantage point where one can enforce policies and do security monitoring.
The setup is re-usable across networks with various services and can easily embed e.g. a
LimaCharlie adapter[^limacharlie] for central logging.
[^limacharlie]: [LimaCharlie](https://limacharlie.io) is a transparent security infrastructure
vendor focusing on inputs and outputs, in combination with an excellent detection and transform
engine. LimaCharlie adapters are described here:
https://doc.limacharlie.io/docs/documentation/73a613e8e43ed-lima-charlie-adapter
Finally we communicate with Dendrite, but the external devices and homeservers will only
see `matrix.{{tailnet-id}}.ts.net`.
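Once the chain is up, a quick smoke test from a machine outside the tailnet can confirm that both entrypoints actually reach Dendrite. The sketch below assumes `curl` is installed; `MATRIX_HOST` and the helper names are invented for this example, while `/_matrix/client/versions` and `/_matrix/federation/v1/version` are standard Matrix endpoints.

```sh
#!/usr/bin/env sh
# Sketch: probe the client (443) and federation (8443) entrypoints through the Funnel.
# MATRIX_HOST is a hypothetical variable; set it to your matrix.<tailnet>.ts.net name.

client_url() {
  printf 'https://%s/_matrix/client/versions\n' "$1"
}

federation_url() {
  printf 'https://%s:8443/_matrix/federation/v1/version\n' "$1"
}

probe() {
  # -f: fail on HTTP errors, -sS: quiet but still print errors, --max-time: do not hang
  if curl -fsS --max-time 10 "$1" > /dev/null; then
    echo "ok   $1"
  else
    echo "FAIL $1"
    return 1
  fi
}

# Only run the probes when a host is given, so the file can also be sourced.
if [ -n "${MATRIX_HOST:-}" ]; then
  probe "$(client_url "$MATRIX_HOST")"
  probe "$(federation_url "$MATRIX_HOST")"
fi
```

A failing client probe points at the 443 path (Funnel, socat on port 8000, the Traefik `client` entrypoint), and a failing federation probe at the 8443 path, which narrows down where to look.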
## Now in Docker Compose
An example of how to do this with docker-compose is shown below, with the following
service structure:
* tailscale
* reverse-proxy
* postgres
* dendrite
```yaml
version: '3'
networks:
  web:
    name: web
    external: true
  matrix-internal:
    name: matrix-internal
    external: true
services:
  tailscale:
    hostname: matrix
    #image: tailscale/tailscale:stable
    build:
      context: .
      dockerfile: Dockerfile.tailscale
    command: /tmp/startup.sh
    restart: unless-stopped
    env_file:
      - ./config/tailscale/.tailscale_env
    network_mode: host
    privileged: true
    cap_add: # Required for tailscale to work
      - net_admin
      - sys_module
    volumes:
      - /dev/net/tun:/dev/net/tun
      - ./data/tailscale/lib:/var/lib/tailscale
      - ./data/tailscale/run:/var/run/tailscale
  reverse-proxy:
    depends_on:
      - tailscale
    # The official Traefik docker image (v3 beta)
    image: traefik:v3.0.0-beta2
    # Enables the web UI and tells Traefik to listen to docker
    command:
      #- --log.level=DEBUG
      - --api.insecure=true
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --providers.docker.network=matrix-internal
      - --entrypoints.client.address=:4443
      - --entrypoints.federation.address=:8443
      - --entrypoints.web.address=:80
      - --certificatesresolvers.tailscaleResolver.tailscale=true
    ports:
      - "80:80"
      - "4443:4443"
      - "443:443/udp"
      - "8443:8443"
    volumes:
      # make the Tailscale socket available to Traefik
      - ./data/tailscale/run/tailscaled.sock:/var/run/tailscale/tailscaled.sock
      # Add Docker as a mounted volume, so that Traefik can read the labels of other services
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      web:
      matrix-internal:
        ipv4_address: 172.25.0.10
  postgres:
    hostname: postgres
    image: postgres:15
    restart: always
    volumes:
      - ./config/postgres/create_db.sh:/docker-entrypoint-initdb.d/20-create_db.sh
      - ./data/postgres:/var/lib/postgresql/data
    environment:
      POSTGRES_PASSWORD: {{ YOUR POSTGRES PASS. SAME AS IN dendrite.yaml }}
      POSTGRES_USER: dendrite
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U dendrite"]
      interval: 5s
      timeout: 5s
      retries: 5
    networks:
      - matrix-internal
  dendrite:
    depends_on:
      - postgres
    hostname: dendrite
    image: matrixdotorg/dendrite-monolith:latest
    command: [
      "--tls-cert=server.crt",
      "--tls-key=server.key"
    ]
    ports:
      - 8008:8008
      - 8448:8448
    volumes:
      - ./config/dendrite:/etc/dendrite
      - ./data/dendrite/media:/var/dendrite/media
    networks:
      - matrix-internal
    restart: unless-stopped
    labels:
      - traefik.enable=true
      - traefik.http.routers.web.rule=Host(`matrix.{{tailnet-id}}.ts.net`)
      - traefik.http.routers.web.tls.certresolver=tailscaleResolver
      - traefik.http.routers.web.tls.domains[0].main=matrix.{{tailnet-id}}.ts.net
      - traefik.http.routers.web.entrypoints=client
      - traefik.http.routers.federation.rule=Host(`matrix.{{tailnet-id}}.ts.net`)
      - traefik.http.routers.federation.tls.certresolver=tailscaleResolver
      - traefik.http.routers.federation.tls.domains[0].main=matrix.{{tailnet-id}}.ts.net
      - traefik.http.routers.federation.entrypoints=federation
```
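Because both networks are declared `external`, Compose expects them to exist before the stack is started. A minimal sketch of creating them follows. The `172.25.0.0/16` subnet is an assumption, picked only because it contains Traefik's static address `172.25.0.10`, and `ensure_network` is a helper name made up for this example.

```sh
#!/usr/bin/env sh
# Sketch: create the external networks referenced by the compose file.
# The subnet is an assumption; it only needs to contain 172.25.0.10.
SUBNET="${SUBNET:-172.25.0.0/16}"

# Usage: ensure_network <name> [extra "docker network create" options...]
ensure_network() {
  name="$1"
  shift
  # "docker network inspect" exits non-zero when the network does not exist
  if ! docker network inspect "$name" > /dev/null 2>&1; then
    docker network create "$@" "$name"
  fi
}

# Skip quietly on machines without the Docker CLI.
if command -v docker > /dev/null 2>&1; then
  ensure_network web
  ensure_network matrix-internal --subnet "$SUBNET"
fi
```

The inspect-before-create check keeps the script safe to re-run, since `docker network create` fails when a network with the same name already exists.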
Dendrite itself should be configured as required based on the
[Dendrite docs](https://github.com/matrix-org/dendrite/tree/main/build/docker). However,
the Tailscale environment config file was a little harder to figure out. In this one I
ended up with something along the following lines:
```
TS_HOSTNAME='matrix'
TS_AUTH_KEY={{ Authkey from the Tailscale admin console }}
TS_STATE_DIR='/var/lib/tailscale'
TS_USERSPACE=false
TS_SOCKET='/var/run/tailscale/tailscaled.sock'
TS_DEST_IP='172.25.0.10'
```
## Word of Caution
The Tailscale container is highly privileged, so make sure to put some real monitoring in place.