
(Documentation) Move documentation from ArangoDB site into this repo (#1450)

- remove duplicated docs
- update old docs with new info
- rework docs index page
- file names not changed to make sure redirects from old site will work as expected

Co-authored-by: jwierzbo <jakub.wierzbowski@arangodb.com>
Nikita Vaniasin 2023-10-19 15:47:42 +02:00 committed by GitHub
parent b9918115d9
commit fe66d98444
28 changed files with 3808 additions and 34 deletions


@@ -5,6 +5,7 @@
- (Improvement) Print assigned node name to log and condition message when pod is scheduled
- (Maintenance) Remove obsolete docs, restructure for better UX, generate index files
- (Feature) Add `spec.upgrade.debugLog` option to configure upgrade container logging
- (Documentation) Move documentation from ArangoDB into this repo, update and improve structure
## [1.2.34](https://github.com/arangodb/kube-arangodb/tree/1.2.34) (2023-10-16)
- (Bugfix) Fix make manifests-crd-file command


@@ -6,13 +6,13 @@ ArangoDB Kubernetes Operator helps to run ArangoDB deployments
on Kubernetes clusters.
To get started, follow the Installation instructions below and/or
read the [tutorial](https://www.arangodb.com/docs/stable/deployment-kubernetes-usage.html).
read the [tutorial](docs/using-the-operator.md).
## State
The ArangoDB Kubernetes Operator is Production ready.
[Documentation](https://www.arangodb.com/docs/stable/deployment-kubernetes.html)
[Documentation](docs/README.md)
### Limits


@@ -1,11 +1,57 @@
# ArangoDB Kubernetes Operator
- [Tutorial](https://www.arangodb.com/docs/stable/tutorials-kubernetes.html)
- [Documentation](https://www.arangodb.com/docs/stable/deployment-kubernetes.html)
- [Architecture](./design/README.md)
- [Features description and usage](./features/README.md)
- [Custom Resources API Reference](./api/README.md)
- [Operator Metrics & Alerts](./generated/metrics/README.md)
- [Operator Actions](./generated/actions.md)
- [Intro](#intro)
- [Using the ArangoDB Kubernetes Operator](using-the-operator.md)
- [Architecture overview](design/README.md)
- [Features description and usage](features/README.md)
- [Custom Resources API Reference](api/README.md)
- [Operator Metrics & Alerts](generated/metrics/README.md)
- [Operator Actions](generated/actions.md)
- [Authentication](authentication.md)
- Custom resources overview:
- [ArangoDeployment](deployment-resource-reference.md)
- [ArangoDeploymentReplication](deployment-replication-resource-reference.md)
- [ArangoLocalStorage](storage-resource.md)
- [Backup](backup-resource.md)
- [BackupPolicy](backuppolicy-resource.md)
- [Configuration and secrets](configuration-and-secrets.md)
- [Configuring your driver for ArangoDB access](driver-configuration.md)
- [Using Helm](helm.md)
- [Collecting metrics](metrics.md)
- [Services & Load balancer](services-and-load-balancer.md)
- [Storage configuration](storage.md)
- [Secure connections (TLS)](tls.md)
- [Upgrading ArangoDB version](upgrading.md)
- [Scaling your ArangoDB deployment](scaling.md)
- [Draining the Kubernetes nodes](draining-nodes.md)
- Known issues (TBD)
- [Troubleshooting](troubleshooting.md)
- [How-to ...](how-to/README.md)
## Intro
The ArangoDB Kubernetes Operator (`kube-arangodb`) is a set of operators
that you deploy in your Kubernetes cluster to:
- Manage deployments of the ArangoDB database
- Manage backups
- Provide `PersistentVolumes` on local storage of your nodes for optimal storage performance.
- Configure ArangoDB Datacenter-to-Datacenter Replication
Each of these uses involves a different custom resource.
- Use an [`ArangoDeployment` resource](deployment-resource-reference.md) to
create an ArangoDB database deployment.
- Use an [`ArangoBackup`](backup-resource.md) and `ArangoBackupPolicy` resources to
create ArangoDB backups.
- Use an [`ArangoLocalStorage` resource](storage-resource.md) to
provide local `PersistentVolumes` for optimal I/O performance.
- Use an [`ArangoDeploymentReplication` resource](deployment-replication-resource-reference.md) to
configure ArangoDB Datacenter-to-Datacenter Replication.
Continue with [Using the ArangoDB Kubernetes Operator](using-the-operator.md)
to learn how to install the ArangoDB Kubernetes operator and create
your first deployment.
For more information about the production readiness state, please refer to the
[ArangoDB Kubernetes Operator repository](https://github.com/arangodb/kube-arangodb#production-readiness-state).

docs/authentication.md (new file)

@@ -0,0 +1,18 @@
# Authentication
The ArangoDB Kubernetes Operator will by default create ArangoDB deployments
that require authentication to access the database.
It uses a single JWT secret (stored in a Kubernetes secret)
to provide *super-user* access between all servers of the deployment
as well as access from the ArangoDB Operator to the deployment.
To disable authentication, set `spec.auth.jwtSecretName` to `None`.
Initially, the deployment is accessible through the web user interface and
APIs, using the user `root` with an empty password.
Make sure to change this password immediately after starting the deployment!
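For example, a minimal sketch of a deployment with authentication disabled (the deployment name is only a placeholder):
```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-no-auth"  # placeholder name
spec:
  mode: Cluster
  auth:
    jwtSecretName: None    # disables authentication, as described above
```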
## See also
- [Secure connections (TLS)](tls.md)

docs/backup-resource.md (new file)

@@ -0,0 +1,554 @@
# ArangoBackup Custom Resource
The ArangoBackup Operator creates and maintains ArangoBackups
in a Kubernetes cluster, given a Backup specification.
This backup specification is a `CustomResource` following
a `CustomResourceDefinition` created by the operator.
## Examples:
### Create simple Backup
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackup"
metadata:
  name: "example-arangodb-backup"
  namespace: "arangodb"
spec:
  deployment:
    name: "my-deployment"
```
Action:
Create Backup on ArangoDeployment named `my-deployment`
### Create and upload Backup
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackup"
metadata:
  name: "example-arangodb-backup"
  namespace: "arangodb"
spec:
  deployment:
    name: "my-deployment"
  upload:
    repositoryURL: "S3://test/kube-test"
    credentialsSecretName: "my-s3-rclone-credentials"
```
Action:
Create Backup on ArangoDeployment named `my-deployment` and upload it to `S3://test/kube-test`.
### Download Backup
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackup"
metadata:
  name: "example-arangodb-backup"
  namespace: "arangodb"
spec:
  deployment:
    name: "my-deployment"
  download:
    repositoryURL: "S3://test/kube-test"
    credentialsSecretName: "my-s3-rclone-credentials"
    id: "backup-id"
```
Action:
Download the Backup with ID `backup-id` from `S3://test/kube-test` for the ArangoDeployment named `my-deployment`.
### Restore
Information about restoring can be found in [ArangoDeployment](deployment-resource-reference.md).
## Advertised fields
List of custom columns in CRD specification for Kubectl:
- `.spec.policyName` - optional name of the policy
- `.spec.deployment.name` - name of the deployment
- `.status.state` - current ArangoBackup Custom Resource state
- `.status.message` - additional message for current state
## ArangoBackup Custom Resource Spec:
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackup"
metadata:
  name: "example-arangodb-backup"
  namespace: "arangodb"
spec:
  policyName: "my-policy"
  deployment:
    name: "my-deployment"
  options:
    timeout: 3
    force: true
  download:
    repositoryURL: "s3:/..."
    credentialsSecretName: "secret-name"
    id: "backup-id"
  upload:
    repositoryURL: "s3:/..."
    credentialsSecretName: "secret-name"
status:
  state: "Ready"
  time: "time"
  message: "Message details"
  progress:
    jobID: "id"
    progress: "10%"
  backup:
    id: "id"
    version: "3.9.0-dev"
    forced: true
    uploaded: true
    downloaded: true
    createdAt: "time"
    sizeInBytes: 1
    numberOfDBServers: 3
  available: true
```
## `spec: Object`
Spec of the ArangoBackup Custom Resource.
Required: true
Default: {}
### `spec.deployment: Object`
ArangoDeployment specification.
Field is immutable.
Required: true
Default: {}
#### `spec.deployment.name: String`
Name of the ArangoDeployment Custom Resource within the same namespace as the ArangoBackup Custom Resource.
Field is immutable.
Required: true
Default: ""
#### `spec.policyName: String`
Name of the ArangoBackupPolicy which created this Custom Resource
Field is immutable.
Required: false
Default: ""
### `spec.options: Object`
Backup options.
Field is immutable.
Required: false
Default: {}
#### `spec.options.timeout: float`
Timeout for Backup creation request in seconds.
Field is immutable.
Required: false
Default: 30
#### `spec.options.allowInconsistent: bool`
AllowInconsistent flag for Backup creation request.
If this value is set to `true`, the backup is taken even if a lock cannot be acquired.
Field is immutable.
Required: false
Default: false
### `spec.download: Object`
Backup download settings.
Field is immutable.
Required: false
Default: {}
#### `spec.download.repositoryURL: string`
Field is immutable. The protocol needs to be defined in `spec.download.credentialsSecretName` if the protocol is other than local.
More protocols can be found at [rclone.org](https://rclone.org/).
Format: `<protocol>:/<path>`
Examples:
- `s3://my-bucket/test`
- `azure://test`
Required: true
Default: ""
#### `spec.download.credentialsSecretName: string`
Field is immutable. Name of the secret used while accessing the repository.
Secret structure:
```yaml
apiVersion: v1
data:
  token: <json token>
kind: Secret
metadata:
  name: <name>
type: Opaque
```
`JSON Token` options are described on the [rclone](https://rclone.org/) page.
You can define more than one protocol at the same time in one secret.
This field is defined in JSON format:
```json
{
  "<protocol>": {
    "type": "<type>",
    ...parameters
  }
}
```
AWS S3 example - based on [rclone S3](https://rclone.org/s3/) documentation and interactive process:
```json
{
  "S3": {
    "type": "s3",                 # Choose s3 type
    "provider": "AWS",            # Choose one of the providers
    "env_auth": "false",          # Define credentials in next step instead of using ENV
    "access_key_id": "xxx",
    "secret_access_key": "xxx",
    "region": "eu-west-2",        # Choose region
    "acl": "private"              # Set permissions on newly created remote object
  }
}
```
From now on, you can use `S3://bucket/path`.
Required: false
Default: ""
##### Use IAM with Amazon EKS
Instead of creating and distributing your AWS credentials to the containers or
using the Amazon EC2 instance's role, you can associate an IAM role with a
Kubernetes service account and configure pods to use the service account.
1. Create a Policy to access the S3 bucket.
```bash
aws iam create-policy \
    --policy-name S3-ACCESS_ROLE \
    --policy-document \
    '{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": "*",
                "Resource": "arn:aws:s3:::MY_BUCKET"
            },
            {
                "Effect": "Allow",
                "Action": "*",
                "Resource": "arn:aws:s3:::MY_BUCKET/*"
            }
        ]
    }'
```
2. Create an IAM role for the service account (SA).
```bash
eksctl create iamserviceaccount \
    --name SA_NAME \
    --namespace NAMESPACE \
    --cluster CLUSTER_NAME \
    --attach-policy-arn arn:aws:iam::ACCOUNT_ID:policy/S3-ACCESS_ROLE \
    --approve
```
3. Ensure that you use that SA in your ArangoDeployment for `dbservers` and
`coordinators`.
```yaml
apiVersion: database.arangodb.com/v1
kind: ArangoDeployment
metadata:
  name: cluster
spec:
  image: arangodb/enterprise
  mode: Cluster
  dbservers:
    serviceAccountName: SA_NAME
  coordinators:
    serviceAccountName: SA_NAME
```
4. Create a `Secret` Kubernetes object with a configuration for S3.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: arangodb-cluster-backup-credentials
type: Opaque
stringData:
  token: |
    {
      "s3": {
        "type": "s3",
        "provider": "AWS",
        "env_auth": "true",
        "location_constraint": "eu-central-1",
        "region": "eu-central-1",
        "acl": "private",
        "no_check_bucket": "true"
      }
    }
```
5. Create an `ArangoBackup` Kubernetes object with upload to S3.
```yaml
apiVersion: "backup.arangodb.com/v1alpha"
kind: "ArangoBackup"
metadata:
  name: backup
spec:
  deployment:
    name: MY_DEPLOYMENT
  upload:
    repositoryURL: "s3:MY_BUCKET"
    credentialsSecretName: arangodb-cluster-backup-credentials
```
#### `spec.download.id: string`
ID of the ArangoBackup to be downloaded.
Field is immutable.
Required: true
Default: ""
### `spec.upload: Object`
Backup upload settings.
This field can be removed and created again with different values; doing so triggers the upload again.
Fields within `spec.upload` are immutable.
Required: false
Default: {}
#### `spec.upload.repositoryURL: string`
Same structure as `spec.download.repositoryURL`.
Required: true
Default: ""
#### `spec.upload.credentialsSecretName: string`
Same structure as `spec.download.credentialsSecretName`.
Required: false
Default: ""
## `status: Object`
Status of the ArangoBackup Custom Resource. This field is managed by the status subresource and is modified only by the operator.
Required: true
Default: {}
### `status.state: enum`
State of the ArangoBackup object.
Required: true
Default: ""
Possible states:
- "" - default state, changed to "Pending"
- "Pending" - state in which Custom Resource is queued. If Backup is possible changed to "Scheduled"
- "Scheduled" - state which will start create/download process
- "Download" - state in which download request will be created on ArangoDB
- "DownloadError" - state when download failed
- "Downloading" - state for downloading progress
- "Create" - state for creation, field available set to true
- "Upload" - state in which upload request will be created on ArangoDB
- "Uploading" - state for uploading progress
- "UploadError" - state when uploading failed
- "Ready" - state when Backup is finished
- "Deleted" - state when Backup was once in ready, but has been deleted
- "Failed" - state for failure
- "Unavailable" - state when Backup is not available on the ArangoDB. It can happen in case of upgrades, node restarts etc.
### `status.time: timestamp`
Time in UTC when state of the ArangoBackup Custom Resource changed.
Required: true
Default: ""
### `status.message: string`
State message of the ArangoBackup Custom Resource.
Required: false
Default: ""
### `status.progress: object`
Progress info of the uploading and downloading process.
Required: false
Default: {}
#### `status.progress.jobID: string`
ArangoDB job ID for uploading or downloading.
Required: true
Default: ""
#### `status.progress.progress: string`
ArangoDeployment job progress.
Required: true
Default: "0%"
### `status.backup: object`
ArangoBackup details.
Required: true
Default: {}
#### `status.backup.id: string`
ArangoBackup ID.
Required: true
Default: ""
#### `status.backup.version: string`
ArangoBackup version.
Required: true
Default: ""
#### `status.backup.potentiallyInconsistent: bool`
ArangoBackup potentially inconsistent flag.
Required: false
Default: false
#### `status.backup.uploaded: bool`
Determines if ArangoBackup has been uploaded.
Required: false
Default: false
#### `status.backup.downloaded: bool`
Determines if ArangoBackup has been downloaded.
Required: false
Default: false
#### `status.backup.createdAt: TimeStamp`
ArangoBackup Custom Resource creation time in UTC.
Required: true
Default: now()
#### `status.backup.sizeInBytes: uint64`
Size of the Backup in ArangoDB.
Required: true
Default: 0
#### `status.backup.numberOfDBServers: uint`
Cluster size of the Backup in ArangoDB.
Required: true
Default: 0
### `status.available: bool`
Determines if we can restore from ArangoBackup.
Required: true
Default: false


@@ -0,0 +1,185 @@
# ArangoBackupPolicy Custom Resource
The ArangoBackupPolicy represents a schedule definition for creating ArangoBackup Custom Resources by the operator.
This policy specification is a `CustomResource` following a `CustomResourceDefinition` created by the operator.
## Examples
### Create schedule for all deployments
You can create an ArangoBackup Custom Resource for each ArangoDeployment every 15 minutes.
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
```
### Create schedule for selected deployments
You can create an ArangoBackup Custom Resource for selected ArangoDeployments every 15 minutes.
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
  selector:
    matchLabels:
      labelName: "labelValue"
```
### Create schedule for all deployments and upload
You can create an ArangoBackup Custom Resource for each ArangoDeployment every 15
minutes and upload it to the specified repositoryURL.
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
  template:
    upload:
      repositoryURL: "s3:/..."
      credentialsSecretName: "secret-name"
```
### Create schedule for all deployments, don't allow parallel backup runs, keep limited number of backups
You can create an ArangoBackup Custom Resource for each ArangoDeployment every 15
minutes, keep at most 10 backups per deployment at the same time (deleting the
oldest ones), and disallow a new backup run while the previous backup is not finished.
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
  maxBackups: 10
  allowConcurrent: False
```
## ArangoBackupPolicy Custom Resource Spec
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
  selector:
    matchLabels:
      labelName: "labelValue"
    matchExpressions: []
  template:
    options:
      timeout: 3
      force: true
    upload:
      repositoryURL: "s3:/..."
      credentialsSecretName: "secret-name"
status:
  scheduled: "time"
  message: "message"
```
## `spec: Object`
Spec of the ArangoBackupPolicy Custom Resource
Required: true
Default: {}
### `spec.schedule: String`
Schedule definition. Parser from https://godoc.org/github.com/robfig/cron
Required: true
Default: ""
### `spec.allowConcurrent: String`
If false, ArangoBackup will not be created when previous backups are not finished.
`ScheduleSkipped` event will be published in that case.
Required: false
Default: True
### `spec.maxBackups: Integer`
If > 0, then old healthy backups of that policy will be removed to ensure that only `maxBackups` are present at same time.
`CleanedUpOldBackups` event will be published on automatic removal of old backups.
Required: false
Default: 0
### `spec.selector: Object`
Selector definition for selecting matching ArangoDeployment Custom Resources. Parser from https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#LabelSelector
Required: false
Default: {}
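As a sketch, assuming your ArangoDeployments carry a label such as `environment` (a hypothetical label key), a selector using `matchExpressions` could look like this:
```yaml
spec:
  schedule: "*/15 * * * *"
  selector:
    matchExpressions:
      - key: environment       # hypothetical label key on the ArangoDeployments
        operator: In
        values:
          - production
          - staging
```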
### `spec.template: ArangoBackupTemplate`
Template for the ArangoBackup Custom Resource
Required: false
Default: {}
### `spec.template.options: ArangoBackup - spec.options`
ArangoBackup options
Required: false
Default: {}
### `spec.template.upload: ArangoBackup - spec.upload`
ArangoBackup upload configuration
Required: false
Default: {}
## `status: Object`
Status of the ArangoBackupPolicy Custom Resource managed by operator
Required: true
Default: {}
### `status.scheduled: TimeStamp`
Next scheduled time in UTC
Required: true
Default: ""
### `status.message: String`
Message from the operator in case of failure - schedule not valid, ArangoBackupPolicy not valid
Required: false
Default: ""


@@ -0,0 +1,36 @@
# Configuration & secrets
An ArangoDB cluster has lots of configuration options.
Some are supported directly by the ArangoDB Operator,
others have to be specified separately.
## Passing command line options
All command-line options of `arangod` (and `arangosync`) are available
by adding options to the `spec.<group>.args` list of a group
of servers.
These arguments are added to the command-line created for these servers.
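For example, a minimal sketch that passes an extra `arangod` option to all DB-Servers (the option shown is only an illustration):
```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-simple-cluster"
spec:
  mode: Cluster
  dbservers:
    args:
      - --log.level=INFO   # appended to the command line of all DB-Servers
```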
## Secrets
The ArangoDB cluster needs several secrets such as JWT tokens,
TLS certificates, and so on.
All these secrets are stored as Kubernetes Secrets and passed to
the applicable Pods as files, mapped into the Pod's filesystem.
The name of the secret is specified in the custom resource.
For example:
```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-simple-cluster"
spec:
  mode: Cluster
  image: 'arangodb/arangodb:3.10.8'
  auth:
    jwtSecretName: <name-of-JWT-token-secret>
```


@@ -1,3 +1,3 @@
# Deployment Operator Dashboard
# Deployment Operator Dashboards
### Dashboard UI is now deprecated and will be removed in the next minor version


@@ -0,0 +1,334 @@
# ArangoDeploymentReplication Custom Resource
#### Enterprise Edition only
The ArangoDB Replication Operator creates and maintains ArangoDB
`arangosync` configurations in a Kubernetes cluster, given a replication specification.
This replication specification is a `CustomResource` following
a `CustomResourceDefinition` created by the operator.
Example of a minimal replication definition for two ArangoDB clusters with
sync in the same Kubernetes cluster:
```yaml
apiVersion: "replication.database.arangodb.com/v1"
kind: "ArangoDeploymentReplication"
metadata:
  name: "replication-from-a-to-b"
spec:
  source:
    deploymentName: cluster-a
    auth:
      keyfileSecretName: cluster-a-sync-auth
  destination:
    deploymentName: cluster-b
```
This definition results in:
- the arangosync `SyncMaster` in deployment `cluster-b` is called to configure a synchronization
from the syncmasters in `cluster-a` to the syncmasters in `cluster-b`,
using the client authentication certificate stored in `Secret` `cluster-a-sync-auth`.
To access `cluster-a`, the JWT secret found in the deployment of `cluster-a` is used.
To access `cluster-b`, the JWT secret found in the deployment of `cluster-b` is used.
Example replication definition for replicating from a source that is outside the current Kubernetes cluster
to a destination that is in the same Kubernetes cluster:
```yaml
apiVersion: "replication.database.arangodb.com/v1"
kind: "ArangoDeploymentReplication"
metadata:
  name: "replication-from-a-to-b"
spec:
  source:
    masterEndpoint: ["https://163.172.149.229:31888", "https://51.15.225.110:31888", "https://51.15.229.133:31888"]
    auth:
      keyfileSecretName: cluster-a-sync-auth
    tls:
      caSecretName: cluster-a-sync-ca
  destination:
    deploymentName: cluster-b
```
This definition results in:
- the arangosync `SyncMaster` in deployment `cluster-b` is called to configure a synchronization
from the syncmasters located at the given list of endpoint URLs to the syncmasters `cluster-b`,
using the client authentication certificate stored in `Secret` `cluster-a-sync-auth`.
To access `cluster-a`, the keyfile (containing a client authentication certificate) is used.
To access `cluster-b`, the JWT secret found in the deployment of `cluster-b` is used.
## DC2DC Replication Example
The requirements for setting up Datacenter-to-Datacenter (DC2DC) Replication are:
- You need to have two ArangoDB clusters running in two different Kubernetes clusters.
- Both Kubernetes clusters are equipped with support for `Services` of type `LoadBalancer`.
- You can create (global) DNS names for configured `Services` with low propagation times. E.g. use Cloudflare.
- You have 4 DNS names available:
- One for the database in the source ArangoDB cluster, e.g. `src-db.mycompany.com`
- One for the ArangoDB syncmasters in the source ArangoDB cluster, e.g. `src-sync.mycompany.com`
- One for the database in the destination ArangoDB cluster, e.g. `dst-db.mycompany.com`
- One for the ArangoDB syncmasters in the destination ArangoDB cluster, e.g. `dst-sync.mycompany.com`
Follow these steps to configure DC2DC replication between two ArangoDB clusters
running in Kubernetes:
1. Enable DC2DC Replication support on the source ArangoDB cluster.
Set your current Kubernetes context to the Kubernetes source cluster.
Edit the `ArangoDeployment` of the source ArangoDB clusters:
- Set `spec.tls.altNames` to `["src-db.mycompany.com"]` (can include more names / IP addresses)
- Set `spec.sync.enabled` to `true`
- Set `spec.sync.externalAccess.masterEndpoint` to `["https://src-sync.mycompany.com:8629"]`
- Set `spec.sync.externalAccess.accessPackageSecretNames` to `["src-accesspackage"]`
2. Extract the access package from the source ArangoDB cluster.
```bash
kubectl get secret src-accesspackage --template='{{index .data "accessPackage.yaml"}}' | \
base64 -D > accessPackage.yaml
```
3. Configure the source DNS names.
```bash
kubectl get service
```
Find the IP address contained in the `LoadBalancer` column for the following `Services`:
- `<deployment-name>-ea` Use this IP address for the `src-db.mycompany.com` DNS name.
- `<deployment-name>-sync` Use this IP address for the `src-sync.mycompany.com` DNS name.
The process for configuring DNS names is specific to each DNS provider.
4. Enable DC2DC Replication support on the destination ArangoDB cluster.
Set your current Kubernetes context to the Kubernetes destination cluster.
Edit the `ArangoDeployment` of the destination ArangoDB cluster:
- Set `spec.tls.altNames` to `["dst-db.mycompany.com"]` (can include more names / IP addresses)
- Set `spec.sync.enabled` to `true`
- Set `spec.sync.externalAccess.masterEndpoint` to `["https://dst-sync.mycompany.com:8629"]`
5. Import the access package in the destination cluster.
```bash
kubectl apply -f accessPackage.yaml
```
Note: This imports two `Secrets`, containing TLS information about the source
cluster, into the destination cluster.
6. Configure the destination DNS names.
```bash
kubectl get service
```
Find the IP address contained in the `LoadBalancer` column for the following `Services`:
- `<deployment-name>-ea` Use this IP address for the `dst-db.mycompany.com` DNS name.
- `<deployment-name>-sync` Use this IP address for the `dst-sync.mycompany.com` DNS name.
The process for configuring DNS names is specific to each DNS provider.
7. Create an `ArangoDeploymentReplication` resource.
Create a yaml file (e.g. called `src-to-dst-repl.yaml`) with the following content:
```yaml
apiVersion: "replication.database.arangodb.com/v1"
kind: "ArangoDeploymentReplication"
metadata:
  name: "replication-src-to-dst"
spec:
  source:
    masterEndpoint: ["https://src-sync.mycompany.com:8629"]
    auth:
      keyfileSecretName: src-accesspackage-auth
    tls:
      caSecretName: src-accesspackage-ca
  destination:
    deploymentName: <dst-deployment-name>
```
8. Wait for the DNS names to propagate.
Wait until the DNS names configured in step 3 and 6 resolve to their configured
IP addresses.
Depending on your DNS provider, this can take from a few minutes up to 24 hours.
9. Activate the replication.
```bash
kubectl apply -f src-to-dst-repl.yaml
```
Replication from the source cluster to the destination cluster will now be configured.
Check the status of the replication by inspecting the status of the
`ArangoDeploymentReplication` resource using:
```bash
kubectl describe ArangoDeploymentReplication replication-src-to-dst
```
As soon as the replication is configured, the `Add collection` button in the `Collections`
page of the web interface (of the destination cluster) will be grayed out.
## Specification reference
Below you'll find all settings of the `ArangoDeploymentReplication` custom resource.
### `spec.source.deploymentName: string`
This setting specifies the name of an `ArangoDeployment` resource that runs a cluster
with sync enabled.
This cluster is configured as the replication source.
### `spec.source.masterEndpoint: []string`
This setting specifies zero or more master endpoint URLs of the source cluster.
Use this setting if the source cluster is not running inside a Kubernetes cluster
that is reachable from the Kubernetes cluster the `ArangoDeploymentReplication` resource is deployed in.
Specifying this setting and `spec.source.deploymentName` at the same time is not allowed.
### `spec.source.auth.keyfileSecretName: string`
This setting specifies the name of a `Secret` containing a client authentication certificate called `tls.keyfile` used to authenticate
with the SyncMaster at the specified source.
If `spec.source.auth.userSecretName` has not been set,
the client authentication certificate found in the secret with this name is also used to configure
the synchronization and fetch the synchronization status.
This setting is required.
### `spec.source.auth.userSecretName: string`
This setting specifies the name of a `Secret` containing a `username` & `password` used to authenticate
with the SyncMaster at the specified source in order to configure synchronization and fetch synchronization status.
The user identified by the username must have write access in the `_system` database of the source ArangoDB cluster.
### `spec.source.tls.caSecretName: string`
This setting specifies the name of a `Secret` containing a TLS CA certificate `ca.crt` used to verify
the TLS connection created by the SyncMaster at the specified source.
This setting is required, unless `spec.source.deploymentName` has been set.
### `spec.destination.deploymentName: string`
This setting specifies the name of an `ArangoDeployment` resource that runs a cluster
with sync enabled.
This cluster is configured as the replication destination.
### `spec.destination.masterEndpoint: []string`
This setting specifies zero or more master endpoint URLs of the destination cluster.
Use this setting if the destination cluster is not running inside a Kubernetes cluster
that is reachable from the Kubernetes cluster the `ArangoDeploymentReplication` resource is deployed in.
Specifying this setting and `spec.destination.deploymentName` at the same time is not allowed.
### `spec.destination.auth.keyfileSecretName: string`
This setting specifies the name of a `Secret` containing a client authentication certificate called `tls.keyfile` used to authenticate
with the SyncMaster at the specified destination.
If `spec.destination.auth.userSecretName` has not been set,
the client authentication certificate found in the secret with this name is also used to configure
the synchronization and fetch the synchronization status.
This setting is required, unless `spec.destination.deploymentName` or `spec.destination.auth.userSecretName` has been set.
Specifying this setting and `spec.destination.userSecretName` at the same time is not allowed.
### `spec.destination.auth.userSecretName: string`
This setting specifies the name of a `Secret` containing a `username` & `password` used to authenticate
with the SyncMaster at the specified destination in order to configure synchronization and fetch synchronization status.
The user identified by the username must have write access in the `_system` database of the destination ArangoDB cluster.
Specifying this setting and `spec.destination.keyfileSecretName` at the same time is not allowed.
### `spec.destination.tls.caSecretName: string`
This setting specifies the name of a `Secret` containing a TLS CA certificate `ca.crt` used to verify
the TLS connection created by the SyncMaster at the specified destination.
This setting is required, unless `spec.destination.deploymentName` has been set.
## Authentication details
The authentication settings in a `ArangoDeploymentReplication` resource are used for two distinct purposes.
The first use is the authentication of the syncmasters at the destination with the syncmasters at the source.
This is always done using a client authentication certificate which is found in a `tls.keyfile` field
in a secret identified by `spec.source.auth.keyfileSecretName`.
The second use is the authentication of the ArangoDB Replication operator with the syncmasters at the source
or destination. These connections are made to configure synchronization, stop configuration and fetch the status
of the configuration.
The method used for this authentication is derived as follows (where `X` is either `source` or `destination`):
- If `spec.X.userSecretName` is set, the username + password found in the `Secret` identified by this name is used.
- If `spec.X.keyfileSecretName` is set, the client authentication certificate (keyfile) found in the `Secret` identified by this name is used.
- If `spec.X.deploymentName` is set, the JWT secret found in the deployment is used.
## Creating client authentication certificate keyfiles
The client authentication certificates needed for the `Secrets` identified by `spec.source.auth.keyfileSecretName` & `spec.destination.auth.keyfileSecretName`
are normal ArangoDB keyfiles that can be created by the `arangosync create client-auth keyfile` command.
In order to do so, you must have access to the client authentication CA of the source/destination.
If the client authentication CA at the source/destination also contains a private key (`ca.key`), the ArangoDeployment operator
can be used to create such a keyfile for you, without the need to have `arangosync` installed locally.
Read the following paragraphs for instructions on how to do that.
## Creating and using access packages
An access package is a YAML file that contains:
- A client authentication certificate, wrapped in a `Secret` in a `tls.keyfile` data field.
- A TLS certificate authority public key, wrapped in a `Secret` in a `ca.crt` data field.
The format of the access package is such that it can be inserted into a Kubernetes cluster using the standard `kubectl` tool.
To create an access package that can be used to authenticate with the ArangoDB SyncMasters of an `ArangoDeployment`,
add a name of a non-existing `Secret` to the `spec.sync.externalAccess.accessPackageSecretNames` field of the `ArangoDeployment`.
In response, a `Secret` is created in that Kubernetes cluster, with the given name, that contains an `accessPackage.yaml` data field
that contains a Kubernetes resource specification that can be inserted into the other Kubernetes cluster.
The process for creating and using an access package for authentication at the source cluster is as follows:
- Edit the `ArangoDeployment` resource of the source cluster, set `spec.sync.externalAccess.accessPackageSecretNames` to `["my-access-package"]`
- Wait for the `ArangoDeployment` operator to create a `Secret` named `my-access-package`.
- Extract the access package from the Kubernetes source cluster using:
```bash
kubectl get secret my-access-package --template='{{index .data "accessPackage.yaml"}}' | base64 -D > accessPackage.yaml
```
- Insert the secrets found in the access package in the Kubernetes destination cluster using:
```bash
kubectl apply -f accessPackage.yaml
```
As a result, the destination Kubernetes cluster will have 2 additional `Secrets`. One contains a client authentication certificate
formatted as a keyfile. Another contains the public key of the TLS CA certificate of the source cluster.


@@ -0,0 +1,840 @@
# ArangoDeployment Custom Resource
The ArangoDB Deployment Operator creates and maintains ArangoDB deployments
in a Kubernetes cluster, given a deployment specification.
This deployment specification is a `CustomResource` following
a `CustomResourceDefinition` created by the operator.
Example minimal deployment definition of an ArangoDB database cluster:
```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
spec:
  mode: Cluster
```
Example more elaborate deployment definition:
```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
spec:
  mode: Cluster
  environment: Production
  agents:
    count: 3
    args:
      - --log.level=debug
    resources:
      requests:
        storage: 8Gi
    storageClassName: ssd
  dbservers:
    count: 5
    resources:
      requests:
        storage: 80Gi
    storageClassName: ssd
  coordinators:
    count: 3
  image: "arangodb/arangodb:3.9.3"
```
## Specification reference
Below you'll find all settings of the `ArangoDeployment` custom resource.
Several settings are for various groups of servers. These are indicated
with `<group>` where `<group>` can be any of:
- `agents` for all Agents of a `Cluster` or `ActiveFailover` pair.
- `dbservers` for all DB-Servers of a `Cluster`.
- `coordinators` for all Coordinators of a `Cluster`.
- `single` for all single servers of a `Single` instance or `ActiveFailover` pair.
- `syncmasters` for all syncmasters of a `Cluster`.
- `syncworkers` for all syncworkers of a `Cluster`.
Special group `id` can be used for image discovery and testing affinity/toleration settings.
### `spec.architecture: []string`
This setting specifies a CPU architecture for the deployment.
Possible values are:
- `amd64` (default): Use processors with the x86-64 architecture.
- `arm64`: Use processors with the 64-bit ARM architecture.
The setting expects a list of strings, but you should only specify a single
list item for the architecture, except when you want to migrate from one
architecture to the other. The first list item defines the new default
architecture for the deployment that you want to migrate to.
_Tip:_
To use the ARM architecture, you need to enable it in the operator first using
`--set "operator.architectures={amd64,arm64}"`. See
[Installation with Helm](using-the-operator.md#installation-with-helm).
To create a new deployment with `arm64` nodes, specify the architecture in the
deployment specification as follows:
```yaml
spec:
  architecture:
    - arm64
```
To migrate nodes of an existing deployment from `amd64` to `arm64`, modify the
deployment specification so that both architectures are listed:
```diff
 spec:
   architecture:
+    - arm64
     - amd64
```
This lets new members as well as recreated members use `arm64` nodes.
Then run the following command:
```bash
kubectl annotate pod $POD "deployment.arangodb.com/replace=true"
```
To change an existing member to `arm64`, annotate the pod as follows:
```bash
kubectl annotate pod $POD "deployment.arangodb.com/arch=arm64"
```
An `ArchitectureMismatch` condition occurs in the deployment:
```yaml
members:
  single:
    - arango-version: 3.10.0
      architecture: arm64
      conditions:
        reason: Member has a different architecture than the deployment
        status: "True"
        type: ArchitectureMismatch
```
Restart the pod using this command:
```bash
kubectl annotate pod $POD "deployment.arangodb.com/rotate=true"
```
### `spec.mode: string`
This setting specifies the type of deployment you want to create.
Possible values are:
- `Cluster` (default) Full cluster. Defaults to 3 Agents, 3 DB-Servers & 3 Coordinators.
- `ActiveFailover` Active-failover single pair. Defaults to 3 Agents and 2 single servers.
- `Single` Single server only (note this does not provide high availability or reliability).
This setting cannot be changed after the deployment has been created.
### `spec.environment: string`
This setting specifies the type of environment in which the deployment is created.
Possible values are:
- `Development` (default) This value optimizes the deployment for development
use. It is possible to run a deployment on a small number of nodes (e.g. minikube).
- `Production` This value optimizes the deployment for production use.
It puts required affinity constraints on all pods to avoid Agents & DB-Servers
from running on the same machine.
### `spec.image: string`
This setting specifies the docker image to use for all ArangoDB servers.
In a `development` environment this setting defaults to `arangodb/arangodb:latest`.
For `production` environments this is a required setting without a default value.
It is highly recommended to use an explicit version (not `latest`) for production
environments.
### `spec.imagePullPolicy: string`
This setting specifies the pull policy for the docker image to use for all ArangoDB servers.
Possible values are:
- `IfNotPresent` (default) to pull only when the image is not found on the node.
- `Always` to always pull the image before using it.
### `spec.imagePullSecrets: []string`
This setting specifies the list of image pull secrets for the docker image to use for all ArangoDB servers.
### `spec.annotations: map[string]string`
This setting adds the specified annotations to all ArangoDeployment-owned resources (pods, services, PVCs, PDBs).
### `spec.storageEngine: string`
This setting specifies the type of storage engine used for all servers
in the cluster.
Possible values are:
- `MMFiles` To use the MMFiles storage engine.
- `RocksDB` (default) To use the RocksDB storage engine.
This setting cannot be changed after the cluster has been created.
### `spec.downtimeAllowed: bool`
This setting is used to allow automatic reconciliation actions that yield
some downtime of the ArangoDB deployment.
When this setting is set to `false` (the default), no automatic action that
may result in downtime is allowed.
If the need for such an action is detected, an event is added to the `ArangoDeployment`.
Once this setting is set to `true`, the automatic action is executed.
Operations that may result in downtime are:
- Rotating TLS CA certificate
Note: It is still possible that there is some downtime when the Kubernetes
cluster is down, or in a bad state, irrespective of the value of this setting.
### `spec.memberPropagationMode`
Changes to a pod's configuration require a restart of that pod in almost all
cases. Pods are restarted eagerly by default, which can cause more restarts than
desired, especially when updating _arangod_ as well as the operator.
The propagation of the configuration changes can be deferred to the next restart,
either triggered manually by the user or by another operation like an upgrade.
This reduces the number of restarts for upgrading both the server and the
operator from two to one.
- `always`: Restart the member as soon as a configuration change is discovered
- `on-restart`: Wait until the next restart to change the member configuration
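A minimal sketch of deferring configuration changes to the next restart (other fields omitted):
```yaml
spec:
  memberPropagationMode: on-restart  # apply configuration changes at the next restart
```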
### `spec.rocksdb.encryption.keySecretName`
This setting specifies the name of a Kubernetes `Secret` that contains
an encryption key used for encrypting all data stored by ArangoDB servers.
When an encryption key is used, encryption of the data in the cluster is enabled,
without it encryption is disabled.
The default value is empty.
This requires the Enterprise Edition.
The encryption key cannot be changed after the cluster has been created.
The secret specified by this setting, must have a data field named 'key' containing
an encryption key that is exactly 32 bytes long.
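As a sketch, such a `Secret` could look like the following; the name is a placeholder and the `key` field must hold exactly 32 bytes (base64-encoded, as usual for Kubernetes `data` fields):
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: arango-encryption-key          # placeholder, referenced by spec.rocksdb.encryption.keySecretName
type: Opaque
data:
  key: <base64-encoded 32-byte key>    # e.g. the output of: head -c 32 /dev/urandom | base64
```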
### `spec.networkAttachedVolumes: bool`
The default of this option is `false`. If set to `true`, a `ResignLeaderShip`
operation will be triggered when a DB-Server pod is evicted (rather than a
`CleanOutServer` operation). Furthermore, the pod will simply be
redeployed on a different node, rather than cleaned and retired and
replaced by a new member. You must only set this option to `true` if
your persistent volumes are "movable" in the sense that they can be
mounted from a different k8s node, like in the case of network attached
volumes. If your persistent volumes are tied to a specific pod, you
must leave this option on `false`.
### `spec.externalAccess.type: string`
This setting specifies the type of `Service` that will be created to provide
access to the ArangoDB deployment from outside the Kubernetes cluster.
Possible values are:
- `None` To limit access to application running inside the Kubernetes cluster.
- `LoadBalancer` To create a `Service` of type `LoadBalancer` for the ArangoDB deployment.
- `NodePort` To create a `Service` of type `NodePort` for the ArangoDB deployment.
- `Auto` (default) To create a `Service` of type `LoadBalancer` and fall back to a `Service` of type `NodePort` when the
`LoadBalancer` is not assigned an IP address.
### `spec.externalAccess.loadBalancerIP: string`
This setting specifies the IP used for the LoadBalancer to expose the ArangoDB deployment on.
This setting is used when `spec.externalAccess.type` is set to `LoadBalancer` or `Auto`.
If you do not specify this setting, an IP will be chosen automatically by the load-balancer provisioner.
### `spec.externalAccess.loadBalancerSourceRanges: []string`
If specified and supported by the platform (cloud provider), traffic through the cloud-provider
load-balancer is restricted to the specified client IPs. This field is ignored if the
cloud-provider does not support the feature.
More info: https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/
### `spec.externalAccess.nodePort: int`
This setting specifies the port used to expose the ArangoDB deployment on.
This setting is used when `spec.externalAccess.type` is set to `NodePort` or `Auto`.
If you do not specify this setting, a random port will be chosen automatically.
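A sketch combining the external access fields described above (the IP and CIDR values are only placeholders):
```yaml
spec:
  externalAccess:
    type: LoadBalancer
    loadBalancerIP: 10.0.0.100       # placeholder IP requested from the load-balancer provisioner
    loadBalancerSourceRanges:
      - 192.168.0.0/16               # placeholder client IP range
```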
### `spec.externalAccess.advertisedEndpoint: string`
This setting specifies the advertised endpoint for all Coordinators.
### `spec.auth.jwtSecretName: string`
This setting specifies the name of a kubernetes `Secret` that contains
the JWT token used for accessing all ArangoDB servers.
When no name is specified, it defaults to `<deployment-name>-jwt`.
To disable authentication, set this value to `None`.
If you specify a name of a `Secret`, that secret must have the token
in a data field named `token`.
If you specify a name of a `Secret` that does not exist, a random token is created
and stored in a `Secret` with given name.
Changing a JWT token results in stopping the entire cluster
and restarting it.
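As a sketch, a pre-created JWT `Secret` could look like this (the name and token value are placeholders; the data field must be named `token`, as stated above):
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-arangodb-cluster-jwt   # placeholder, referenced by spec.auth.jwtSecretName
type: Opaque
stringData:
  token: <your-jwt-signing-token>      # the operator generates a random one if the Secret does not exist
```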
### `spec.tls.caSecretName: string`
This setting specifies the name of a kubernetes `Secret` that contains
a standard CA certificate + private key used to sign certificates for individual
ArangoDB servers.
When no name is specified, it defaults to `<deployment-name>-ca`.
To disable TLS, set this value to `None`.
If you specify a name of a `Secret` that does not exist, a self-signed CA certificate + key is created
and stored in a `Secret` with given name.
The specified `Secret`, must contain the following data fields:
- `ca.crt` PEM encoded public key of the CA certificate
- `ca.key` PEM encoded private key of the CA certificate
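A sketch of a user-supplied CA `Secret` with the two required data fields (the name and certificate contents are placeholders):
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-arangodb-cluster-ca   # placeholder, referenced by spec.tls.caSecretName
type: Opaque
data:
  ca.crt: <base64-encoded PEM public key of the CA certificate>
  ca.key: <base64-encoded PEM private key of the CA certificate>
```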
### `spec.tls.altNames: []string`
This setting specifies a list of alternate names that will be added to all generated
certificates. These names can be DNS names or email addresses.
The default value is empty.
### `spec.tls.ttl: duration`
This setting specifies the time to live of all generated
server certificates.
The default value is `2160h` (about 3 months).
When the server certificate is about to expire, it will be automatically replaced
by a new one and the affected server will be restarted.
Note: The time to live of the CA certificate (when created automatically)
will be set to 10 years.
### `spec.sync.enabled: bool`
This setting enables/disables support for data center 2 data center
replication in the cluster. When enabled, the cluster will contain
a number of `syncmaster` & `syncworker` servers.
The default value is `false`.
### `spec.sync.externalAccess.type: string`
This setting specifies the type of `Service` that will be created to provide
access to the ArangoSync syncMasters from outside the Kubernetes cluster.
Possible values are:
- `None` To limit access to applications running inside the Kubernetes cluster.
- `LoadBalancer` To create a `Service` of type `LoadBalancer` for the ArangoSync SyncMasters.
- `NodePort` To create a `Service` of type `NodePort` for the ArangoSync SyncMasters.
- `Auto` (default) To create a `Service` of type `LoadBalancer` and fall back to a `Service` of type `NodePort` when the
`LoadBalancer` is not assigned an IP address.
Note that when you specify a value of `None`, a `Service` will still be created, but of type `ClusterIP`.
### `spec.sync.externalAccess.loadBalancerIP: string`
This setting specifies the IP used for the LoadBalancer to expose the ArangoSync SyncMasters on.
This setting is used when `spec.sync.externalAccess.type` is set to `LoadBalancer` or `Auto`.
If you do not specify this setting, an IP will be chosen automatically by the load-balancer provisioner.
### `spec.sync.externalAccess.nodePort: int`
This setting specifies the port used to expose the ArangoSync SyncMasters on.
This setting is used when `spec.sync.externalAccess.type` is set to `NodePort` or `Auto`.
If you do not specify this setting, a random port will be chosen automatically.
### `spec.sync.externalAccess.loadBalancerSourceRanges: []string`
If specified and supported by the platform (cloud provider), traffic through the cloud-provider
load-balancer is restricted to the specified client IPs. This field is ignored if the
cloud-provider does not support the feature.
More info: https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/
### `spec.sync.externalAccess.masterEndpoint: []string`
This setting specifies the master endpoint(s) advertised by the ArangoSync SyncMasters.
If not set, this setting defaults to:
- If `spec.sync.externalAccess.loadBalancerIP` is set, it defaults to `https://<load-balancer-ip>:<8629>`.
- Otherwise it defaults to `https://<sync-service-dns-name>:<8629>`.
### `spec.sync.externalAccess.accessPackageSecretNames: []string`
This setting specifies the names of zero or more `Secrets` that will be created by the deployment
operator containing "access packages". An access package contains those `Secrets` that are needed
to access the SyncMasters of this `ArangoDeployment`.
By removing a name from this setting, the corresponding `Secret` is also deleted.
Note that to remove all access packages, leave an empty array in place (`[]`).
Completely removing the setting results in not modifying the list.
See [the `ArangoDeploymentReplication` specification](deployment-replication-resource-reference.md) for more information
on access packages.
### `spec.sync.auth.jwtSecretName: string`
This setting specifies the name of a kubernetes `Secret` that contains
the JWT token used for accessing all ArangoSync master servers.
When not specified, the `spec.auth.jwtSecretName` value is used.
If you specify a name of a `Secret` that does not exist, a random token is created
and stored in a `Secret` with given name.
### `spec.sync.auth.clientCASecretName: string`
This setting specifies the name of a kubernetes `Secret` that contains
a PEM encoded CA certificate used for client certificate verification
in all ArangoSync master servers.
This is a required setting when `spec.sync.enabled` is `true`.
The default value is empty.
### `spec.sync.mq.type: string`
This setting sets the type of message queue used by ArangoSync.
Possible values are:
- `Direct` (default) for direct HTTP connections between the 2 data centers.
### `spec.sync.tls.caSecretName: string`
This setting specifies the name of a kubernetes `Secret` that contains
a standard CA certificate + private key used to sign certificates for individual
ArangoSync master servers.
When no name is specified, it defaults to `<deployment-name>-sync-ca`.
If you specify a name of a `Secret` that does not exist, a self-signed CA certificate + key is created
and stored in a `Secret` with given name.
The specified `Secret`, must contain the following data fields:
- `ca.crt` PEM encoded public key of the CA certificate
- `ca.key` PEM encoded private key of the CA certificate
### `spec.sync.tls.altNames: []string`
This setting specifies a list of alternate names that will be added to all generated
certificates. These names can be DNS names or email addresses.
The default value is empty.
### `spec.sync.monitoring.tokenSecretName: string`
This setting specifies the name of a kubernetes `Secret` that contains
the bearer token used for accessing all monitoring endpoints of all ArangoSync
servers.
When not specified, no monitoring token is used.
The default value is empty.
### `spec.disableIPv6: bool`
This setting prevents the use of IPv6 addresses by ArangoDB servers.
The default is `false`.
This setting cannot be changed after the deployment has been created.
### `spec.restoreFrom: string`
This setting specifies a `ArangoBackup` resource name the cluster should be restored from.
After a restore or failure to do so, the status of the deployment contains information about the
restore operation in the `restore` key.
It will contain some of the following fields:
- _requestedFrom_: name of the `ArangoBackup` used to restore from.
- _message_: optional message explaining why the restore failed.
- _state_: state indicating if the restore was successful or not. Possible values: `Restoring`, `Restored`, `RestoreFailed`
If the `restoreFrom` key is removed from the spec, the `restore` key is deleted as well.
A new restore attempt is made if and only if either `restore` is not set in the status, or `spec.restoreFrom` and `status.requestedFrom` differ.
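For example, a minimal sketch of restoring a deployment from an existing `ArangoBackup` (the backup name is a placeholder):
```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
spec:
  mode: Cluster
  restoreFrom: "example-arangodb-backup"   # name of an ArangoBackup resource in the same namespace
```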
### `spec.license.secretName: string`
This setting specifies the name of a kubernetes `Secret` that contains
the license key token used for enterprise images. This value is not used for
the Community Edition.
### `spec.bootstrap.passwordSecretNames.root: string`
This setting specifies a secret name for the credentials of the root user.
When a deployment is created, the operator sets up the root user account
according to the credentials given by the secret. If the secret doesn't exist,
the operator creates a secret with a random password.
There are two magic values for the secret name:
- `None` specifies no action. This disables root password randomization. This is the default value. (Thus the root password is empty - not recommended)
- `Auto` specifies automatic name generation, which is `<deploymentname>-root-password`.
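As a sketch, assuming the referenced `Secret` follows the usual Kubernetes basic-auth layout with `username` and `password` fields (an assumption; check a secret generated by the operator for the exact layout), it could look like this:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: arangodb-root-password    # placeholder, referenced by spec.bootstrap.passwordSecretNames.root
type: Opaque
stringData:
  username: root                  # assumed field name
  password: <strong-password>     # assumed field name
```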
### `spec.metrics.enabled: bool`
If this is set to `true`, the operator runs a sidecar container for
every Agent, DB-Server, Coordinator and Single server.
In addition to the sidecar containers the operator will deploy a service
to access the exporter ports (from within the k8s cluster), and a
resource of type `ServiceMonitor`, provided the corresponding custom
resource definition is deployed in the k8s cluster. If you are running
Prometheus in the same k8s cluster with the Prometheus operator, this
will be the case. The `ServiceMonitor` will have the following labels
set:
- `app: arangodb`
- `arango_deployment: YOUR_DEPLOYMENT_NAME`
- `context: metrics`
- `metrics: prometheus`
This makes it possible to configure your Prometheus deployment to
automatically start monitoring the available Prometheus feeds. To
this end, you must configure the `serviceMonitorSelector` in the specs
of your Prometheus deployment to match these labels. For example:
```yaml
serviceMonitorSelector:
  matchLabels:
    metrics: prometheus
```
would automatically select all pods of all ArangoDB cluster deployments
which have metrics enabled.
### `spec.metrics.image: string`
<small>Deprecated in: v1.2.0 (kube-arangodb)</small>
See above, this is the name of the Docker image for the ArangoDB
exporter to expose metrics. If empty, the same image as for the main
deployment is used.
### `spec.metrics.resources: ResourceRequirements`
<small>Introduced in: v0.4.3 (kube-arangodb)</small>
This setting specifies the resources required by the metrics container.
This includes requests and limits.
See [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container).
### `spec.metrics.mode: string`
<small>Introduced in: v1.0.2 (kube-arangodb)</small>
Defines metrics exporter mode.
Possible values:
- `exporter` (default): add sidecar to pods (except Agency pods) and exposes
metrics collected by exporter from ArangoDB Container. Exporter in this mode
exposes metrics which are accessible without authentication.
- `sidecar`: add sidecar to all pods and expose metrics from ArangoDB metrics
endpoint. Exporter in this mode exposes metrics which are accessible without
authentication.
- `internal`: configure ServiceMonitor to use internal ArangoDB metrics endpoint
(proper JWT token is generated for this endpoint).
### `spec.metrics.tls: bool`
<small>Introduced in: v1.1.0 (kube-arangodb)</small>
Defines if TLS should be enabled on Metrics exporter endpoint.
The default is `true`.
This option will enable TLS only if TLS is enabled on ArangoDeployment,
otherwise `true` value will not take any effect.
### `spec.lifecycle.resources: ResourceRequirements`
<small>Introduced in: v0.4.3 (kube-arangodb)</small>
This setting specifies the resources required by the lifecycle init container.
This includes requests and limits.
See [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container).
### `spec.<group>.count: number`
This setting specifies the number of servers to start for the given group.
For the Agent group, this value must be a positive, odd number.
The default value is `3` for all groups except `single` (there the default is `1`
for `spec.mode: Single` and `2` for `spec.mode: ActiveFailover`).
For the `syncworkers` group, it is highly recommended to use the same number
as for the `dbservers` group.
### `spec.<group>.minCount: number`
Specifies a minimum for the count of servers. If set, a specification is invalid if `count < minCount`.
### `spec.<group>.maxCount: number`
Specifies a maximum for the count of servers. If set, a specification is invalid if `count > maxCount`.
### `spec.<group>.args: []string`
This setting specifies additional command-line arguments passed to all servers of this group.
The default value is an empty array.
### `spec.<group>.resources: ResourceRequirements`
This setting specifies the resources required by pods of this group. This includes requests and limits.
See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ for details.
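As an illustration, the following sketch combines the group-level settings described above for the `dbservers` group (the concrete values are examples only):
```yaml
spec:
  dbservers:
    count: 5
    args:
      - --log.level=INFO
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        memory: 4Gi
```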
### `spec.<group>.overrideDetectedTotalMemory: bool`
<small>Introduced in: v1.0.1 (kube-arangodb)</small>
Sets an additional flag in ArangoDeployment pods to propagate memory resource limits to ArangoDB.
### `spec.<group>.volumeClaimTemplate.Spec: PersistentVolumeClaimSpec`
Specifies a volumeClaimTemplate used by the operator to create volume claims for pods of this group.
This setting is not available for group `coordinators`, `syncmasters` & `syncworkers`.
The default value describes a volume with `8Gi` storage, `ReadWriteOnce` access mode and volume mode set to `PersistentVolumeFilesystem`.
If this field is not set and `spec.<group>.resources.requests.storage` is set, then a default volume claim
with the size specified by `spec.<group>.resources.requests.storage` is created. In that case, `storage`
and `iops` are not forwarded to the pod's resource requirements.
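A volume claim template for the `dbservers` group might look like the following sketch (the storage class name and size are assumptions):
```yaml
spec:
  dbservers:
    volumeClaimTemplate:
      spec:
        storageClassName: my-local-ssd
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 80Gi
        volumeMode: Filesystem
```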
### `spec.<group>.pvcResizeMode: string`
Specifies the resize mode used by the operator to resize PVCs and PVs.
Supported modes:
- `runtime` (default): the PVC is resized while the Pod keeps running (e.g. EKS, GKE)
- `rotate`: the Pod is shut down and the PVC is resized (e.g. AKS)
### `spec.<group>.serviceAccountName: string`
This setting specifies the `serviceAccountName` for the `Pods` created
for each server of this group. If empty, it defaults to using the
`default` service account.
Using an alternative `ServiceAccount` is typically done to separate access rights.
The ArangoDB deployments need some very minimal access rights. With the
deployment of the operator, we grant the following rights for the `default`
service account:
```yaml
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- get
```
If you are using a different service account, please grant these rights
to that service account.
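For example, a minimal `Role` and `RoleBinding` granting these rights to a custom service account could look like this sketch (the names `arangodb-pod-reader`, `my-arangodb-sa` and the `default` namespace are placeholders):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: arangodb-pod-reader
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: arangodb-pod-reader
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: arangodb-pod-reader
subjects:
  - kind: ServiceAccount
    name: my-arangodb-sa
    namespace: default
```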
### `spec.<group>.annotations: map[string]string`
This setting sets annotation overrides for pods in this group. Annotations are merged with `spec.annotations`.
### `spec.<group>.priorityClassName: string`
Priority class name for pods of this group. Will be forwarded to the pod spec. [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/)
### `spec.<group>.probes.livenessProbeDisabled: bool`
If set to true, the operator does not generate a liveness probe for new pods belonging to this group.
### `spec.<group>.probes.livenessProbeSpec.initialDelaySeconds: int`
Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 2 seconds. Minimum value is 0.
### `spec.<group>.probes.livenessProbeSpec.periodSeconds: int`
How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
### `spec.<group>.probes.livenessProbeSpec.timeoutSeconds: int`
Number of seconds after which the probe times out. Defaults to 2 seconds. Minimum value is 1.
### `spec.<group>.probes.livenessProbeSpec.failureThreshold: int`
When a Pod starts and the probe fails, Kubernetes will try failureThreshold times before giving up.
Giving up means restarting the container. Defaults to 3. Minimum value is 1.
### `spec.<group>.probes.readinessProbeDisabled: bool`
If set to true, the operator does not generate a readiness probe for new pods belonging to this group.
### `spec.<group>.probes.readinessProbeSpec.initialDelaySeconds: int`
Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 2 seconds. Minimum value is 0.
### `spec.<group>.probes.readinessProbeSpec.periodSeconds: int`
How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
### `spec.<group>.probes.readinessProbeSpec.timeoutSeconds: int`
Number of seconds after which the probe times out. Defaults to 2 seconds. Minimum value is 1.
### `spec.<group>.probes.readinessProbeSpec.successThreshold: int`
Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.
### `spec.<group>.probes.readinessProbeSpec.failureThreshold: int`
When a Pod starts and the probe fails, Kubernetes will try failureThreshold times before giving up.
Giving up means the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
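As an example, the probe settings above can be tuned per group like this (the numbers are illustrative only):
```yaml
spec:
  dbservers:
    probes:
      livenessProbeDisabled: false
      livenessProbeSpec:
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 5
      readinessProbeSpec:
        initialDelaySeconds: 15
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
```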
### `spec.<group>.allowMemberRecreation: bool`
<small>Introduced in: v1.2.1 (kube-arangodb)</small>
This setting changes the member recreation logic based on group:
- For Sync Masters, Sync Workers, Coordinators and DB-Servers, it determines whether a member can be recreated in case of failure (default `true`)
- For Agents and Single servers, this value is hardcoded to `false` and the value provided in the spec is ignored.
### `spec.<group>.tolerations: []Toleration`
This setting specifies the `tolerations` for the `Pod`s created
for each server of this group.
By default, suitable tolerations are set for the following keys with the `NoExecute` effect:
- `node.kubernetes.io/not-ready`
- `node.kubernetes.io/unreachable`
- `node.alpha.kubernetes.io/unreachable` (will be removed in future version)
For more information on tolerations, consult the
[Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/).
### `spec.<group>.nodeSelector: map[string]string`
This setting specifies a set of labels to be used as `nodeSelector` for Pods of this group.
For more information on node selectors, consult the
[Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/).
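For instance, the two scheduling-related settings above could be combined as follows (the label key/value and the toleration are assumptions for illustration):
```yaml
spec:
  dbservers:
    nodeSelector:
      disktype: ssd
    tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "arangodb"
        effect: "NoSchedule"
```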
### `spec.<group>.entrypoint: string`
Entrypoint overrides container executable.
### `spec.<group>.antiAffinity: PodAntiAffinity`
Specifies additional `antiAffinity` settings in ArangoDB Pod definitions.
For more information on `antiAffinity`, consult the
[Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).
### `spec.<group>.affinity: PodAffinity`
Specifies additional `affinity` settings in ArangoDB Pod definitions.
For more information on `affinity`, consult the
[Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).
### `spec.<group>.nodeAffinity: NodeAffinity`
Specifies additional `nodeAffinity` settings in ArangoDB Pod definitions.
For more information on `nodeAffinity`, consult the
[Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/).
### `spec.<group>.securityContext: ServerGroupSpecSecurityContext`
Specifies additional `securityContext` settings in ArangoDB Pod definitions.
This is similar to (but not fully compatible with) the k8s SecurityContext definition.
For more information on `securityContext`, consult the
[Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/).
### `spec.<group>.securityContext.addCapabilities: []Capability`
Adds new capabilities to containers.
### `spec.<group>.securityContext.allowPrivilegeEscalation: bool`
Controls whether a process can gain more privileges than its parent process.
### `spec.<group>.securityContext.privileged: bool`
Runs container in privileged mode. Processes in privileged containers are
essentially equivalent to root on the host.
### `spec.<group>.securityContext.readOnlyRootFilesystem: bool`
Mounts the container's root filesystem as read-only.
### `spec.<group>.securityContext.runAsNonRoot: bool`
Indicates that the container must run as a non-root user.
### `spec.<group>.securityContext.runAsUser: integer`
The UID to run the entrypoint of the container process.
### `spec.<group>.securityContext.runAsGroup: integer`
The GID to run the entrypoint of the container process.
### `spec.<group>.securityContext.supplementalGroups: []integer`
A list of groups applied to the first process run in each container, in addition to the container's primary GID,
the fsGroup (if specified), and group memberships defined in the container image for the uid of the container process.
### `spec.<group>.securityContext.fsGroup: integer`
A special supplemental group that applies to all containers in a pod.
### `spec.<group>.securityContext.seccompProfile: SeccompProfile`
The seccomp options to use by the containers in this pod.
### `spec.<group>.securityContext.seLinuxOptions: SELinuxOptions`
The SELinux context to be applied to all containers.
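A hardened security context for a server group could look like the following sketch; which values are appropriate depends entirely on your environment:
```yaml
spec:
  dbservers:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      fsGroup: 1000
      allowPrivilegeEscalation: false
```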
## Image discovery group `spec.id` fields
The image discovery (`id`) group supports only the following subset of fields.
Refer to the corresponding field documentation in the `spec.<group>` description.
- `spec.id.entrypoint: string`
- `spec.id.tolerations: []Toleration`
- `spec.id.nodeSelector: map[string]string`
- `spec.id.priorityClassName: string`
- `spec.id.antiAffinity: PodAntiAffinity`
- `spec.id.affinity: PodAffinity`
- `spec.id.nodeAffinity: NodeAffinity`
- `spec.id.serviceAccountName: string`
- `spec.id.securityContext: ServerGroupSpecSecurityContext`
- `spec.id.resources: ResourceRequirements`
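For example, scheduling of the image discovery pod can be restricted like this (the node label and resource values are assumptions):
```yaml
spec:
  id:
    nodeSelector:
      kubernetes.io/arch: amd64
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
```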
## Deprecated Fields
### `spec.<group>.resources.requests.storage: storageUnit`
This setting specifies the amount of storage required for each server of this group.
The default value is `8Gi`.
This setting is not available for group `coordinators`, `syncmasters` & `syncworkers`
because servers in these groups do not need persistent storage.
Please use VolumeClaimTemplate from now on. This field is not considered if
VolumeClaimTemplate is set. Note, however, that the information in requests
is completely handed over to the pod in this case.
### `spec.<group>.storageClassName: string`
This setting specifies the `storageClass` for the `PersistentVolume`s created
for each server of this group.
This setting is not available for group `coordinators`, `syncmasters` & `syncworkers`
because servers in these groups do not need persistent storage.
Please use VolumeClaimTemplate from now on. This field is not considered if
VolumeClaimTemplate is set. Note, however, that the information in requests
is completely handed over to the pod in this case.


@ -1,4 +1,4 @@
# ArangoDB operator architecture details
# ArangoDB operator architecture overview
- [Operator API](./api.md)
- [Backups](./backup.md)
@ -9,5 +9,4 @@
- [Pod eviction and replacement](./pod_eviction_and_replacement.md)
- [Kubernetes Pod name versus cluster ID](./pod_name_versus_cluster_id.md)
- [Resources & labels](./resources_and_labels.md)
- [Scaling](./scaling.md)
- [Topology awareness](./topology_awareness.md)


@ -1,21 +0,0 @@
# Scaling
Number of running servers is controlled through `spec.<server_group>.count` field.
### Scale-up
When increasing the `count`, operator will try to create missing pods.
When scaling up make sure that you have enough computational resources / nodes, otherwise pod will stuck in Pending state.
### Scale-down
Scaling down is always done 1 server at a time.
Scale down is possible only when all other actions on ArangoDeployment are finished.
The internal process followed by the ArangoDB operator when scaling up is as follows:
- It chooses a member to be evicted. First it will try to remove unhealthy members or fall-back to the member with highest deletion_priority.
- Making an internal calls, it forces the server to resign leadership.
In case of DB servers it means that all shard leaders will be switched to other servers.
- Wait until server is cleaned out from cluster
- Pod finalized

docs/draining-nodes.md Normal file

@ -0,0 +1,453 @@
# Draining Kubernetes nodes
**If Kubernetes nodes with ArangoDB pods on them are drained without care,
data loss can occur!**
The recommended procedure is described below.
For maintenance work in k8s it is sometimes necessary to drain a k8s node,
which means removing all pods from it. Kubernetes offers a standard API
for this and our operator supports this - to the best of its ability.
Draining nodes is easy enough for stateless services, which can simply be
re-launched on any other node. However, for a stateful service this
operation is more difficult and, as a consequence, more costly, and there
are certain risks involved if the operation is not done carefully
enough. To put it simply, the operator must first move all the data
stored on the node (which could be in a locally attached disk) to
another machine, before it can shut down the pod gracefully. Moving data
takes time, and even after the move, the distributed system ArangoDB has
to recover from this change, for example by ensuring data synchronicity
between the replicas in their new location.
Therefore, a systematic drain of all k8s nodes in sequence has to follow
a careful procedure, in particular to ensure that ArangoDB is ready to
move to the next step. This is necessary to avoid catastrophic data
loss, and is simply the price one pays for running a stateful service.
## Anatomy of a drain procedure in k8s: the grace period
When a `kubectl drain` operation is triggered for a node, k8s first
checks if there are any pods with local data on disk. Our ArangoDB pods have
this property (the _Coordinators_ do use `EmptyDir` volumes, and _Agents_
and _DB-Servers_ could have persistent volumes which are actually stored on
a locally attached disk), so one has to override this with the
`--delete-local-data=true` option.
Furthermore, quite often, the node will contain pods which are managed
by a `DaemonSet` (which is not the case for ArangoDB), which makes it
necessary to override this check with the `--ignore-daemonsets=true`
option.
Finally, it is checked if the node has any pods which are not managed by
anything, either by k8s itself (`ReplicationController`, `ReplicaSet`,
`Job`, `DaemonSet` or `StatefulSet`) or by an operator. If this is the
case, the drain operation will be refused, unless one uses the option
`--force=true`. Since the ArangoDB operator manages our pods, we do not
have to use this option for ArangoDB, but you might have to use it for
other pods.
If all these checks have been overcome, k8s proceeds as follows: All
pods are notified about this event and are put into a `Terminating`
state. During this time, they have a chance to take action, or indeed
the operator managing them has. In particular, although the pods get
termination notices, they can keep running until the operator has
removed all _finalizers_. This gives the operator a chance to sort out
things, for example in our case to move data away from the pod.
However, there is a limit to this tolerance by k8s, and that is the
grace period. If the grace period has passed but the pod has not
actually terminated, then it is killed the hard way. If this happens,
the operator has no choice but to remove the pod and drop its persistent
volume claim and persistent volume. This will obviously lead to a
failure incident in ArangoDB and must be handled by fail-over management.
Therefore, **this event should be avoided**.
## Things to check in ArangoDB before a node drain
There are basically two things one should check in an ArangoDB cluster
before a node drain operation can be started:
1. All cluster nodes are up and running and healthy.
2. For all collections and shards all configured replicas are in sync.
#### Attention:
1) If any cluster node is unhealthy, there is an increased risk that the
system does not have enough resources to cope with a failure situation.
2) If any shard replicas are not currently in sync, then there is a serious
risk that the cluster is currently not as resilient as expected.
One possibility to verify these two things is via the ArangoDB web interface.
Node health can be monitored in the _Overview_ tab under _NODES_:
![Cluster Health Screen](images/HealthyCluster.png)
**Check that all nodes are green** and that there is **no node error** in the
top right corner.
As to the shards being in sync, see the _Shards_ tab under _NODES_:
![Shard Screen](images/ShardsInSync.png)
**Check that all collections have a green check mark** on the right side.
If any collection does not have such a check mark, you can click on the
collection and see the details about shards. Please keep in
mind that this has to be done **for each database** separately!
Obviously, this might be tedious and calls for automation. Therefore, there
are APIs for this. The first one is [Cluster Health](https://docs.arangodb.com/stable/develop/http/cluster/#get-the-cluster-health):
```
GET /_admin/cluster/health
```
… which returns a JSON document looking like this:
```json
{
"Health": {
"CRDN-rxtu5pku": {
"Endpoint": "ssl://my-arangodb-cluster-coordinator-rxtu5pku.my-arangodb-cluster-int.default.svc:8529",
"LastAckedTime": "2019-02-20T08:09:22Z",
"SyncTime": "2019-02-20T08:09:21Z",
"Version": "3.4.2-1",
"Engine": "rocksdb",
"ShortName": "Coordinator0002",
"Timestamp": "2019-02-20T08:09:22Z",
"Status": "GOOD",
"SyncStatus": "SERVING",
"Host": "my-arangodb-cluster-coordinator-rxtu5pku.my-arangodb-cluster-int.default.svc",
"Role": "Coordinator",
"CanBeDeleted": false
},
"PRMR-wbsq47rz": {
"LastAckedTime": "2019-02-21T09:14:24Z",
"Endpoint": "ssl://my-arangodb-cluster-dbserver-wbsq47rz.my-arangodb-cluster-int.default.svc:8529",
"SyncTime": "2019-02-21T09:14:24Z",
"Version": "3.4.2-1",
"Host": "my-arangodb-cluster-dbserver-wbsq47rz.my-arangodb-cluster-int.default.svc",
"Timestamp": "2019-02-21T09:14:24Z",
"Status": "GOOD",
"SyncStatus": "SERVING",
"Engine": "rocksdb",
"ShortName": "DBServer0006",
"Role": "DBServer",
"CanBeDeleted": false
},
"AGNT-wrqmwpuw": {
"Endpoint": "ssl://my-arangodb-cluster-agent-wrqmwpuw.my-arangodb-cluster-int.default.svc:8529",
"Role": "Agent",
"CanBeDeleted": false,
"Version": "3.4.2-1",
"Engine": "rocksdb",
"Leader": "AGNT-oqohp3od",
"Status": "GOOD",
"LastAckedTime": 0.312
},
... [some more entries, one for each instance]
},
"ClusterId": "210a0536-fd28-46de-b77f-e8882d6d7078",
"error": false,
"code": 200
}
```
Check that each instance has a `Status` field with the value `"GOOD"`.
Here is a shell command which makes this check easy, using the
[`jq` JSON pretty printer](https://stedolan.github.io/jq/):
```bash
curl -k https://arangodb.9hoeffer.de:8529/_admin/cluster/health --user root: | jq . | grep '"Status"' | grep -v '"GOOD"'
```
For the shards being in sync there is the
[Cluster Inventory](https://docs.arangodb.com/stable/develop/http/replication/replication-dump#get-the-cluster-collections-and-indexes)
API call:
```
GET /_db/_system/_api/replication/clusterInventory
```
… which returns a JSON body like this:
```json
{
"collections": [
{
"parameters": {
"cacheEnabled": false,
"deleted": false,
"globallyUniqueId": "c2010061/",
"id": "2010061",
"isSmart": false,
"isSystem": false,
"keyOptions": {
"allowUserKeys": true,
"type": "traditional"
},
"name": "c",
"numberOfShards": 6,
"planId": "2010061",
"replicationFactor": 2,
"shardKeys": [
"_key"
],
"shardingStrategy": "hash",
"shards": {
"s2010066": [
"PRMR-vzeebvwf",
"PRMR-e6hbjob1"
],
"s2010062": [
"PRMR-e6hbjob1",
"PRMR-vzeebvwf"
],
"s2010065": [
"PRMR-e6hbjob1",
"PRMR-vzeebvwf"
],
"s2010067": [
"PRMR-vzeebvwf",
"PRMR-e6hbjob1"
],
"s2010064": [
"PRMR-vzeebvwf",
"PRMR-e6hbjob1"
],
"s2010063": [
"PRMR-e6hbjob1",
"PRMR-vzeebvwf"
]
},
"status": 3,
"type": 2,
"waitForSync": false
},
"indexes": [],
"planVersion": 132,
"isReady": true,
"allInSync": true
},
... [more collections following]
],
"views": [],
"tick": "38139421",
"state": "unused"
}
```
Check that for all collections the attributes `"isReady"` and `"allInSync"`
both have the value `true`. Note that it is necessary to do this for all
databases!
Here is a shell command which makes this check easy:
```bash
curl -k https://arangodb.9hoeffer.de:8529/_db/_system/_api/replication/clusterInventory --user root: | jq . | grep '"isReady"\|"allInSync"' | sort | uniq -c
```
If all these checks are performed and are okay, then it is safe to
continue with the clean out and drain procedure as described below.
#### Attention:
If there are some collections with `replicationFactor` set to
1, the system is not resilient and cannot tolerate the failure of even a
single server! One can still perform a drain operation in this case, but
if anything goes wrong, in particular if the grace period is chosen too
short and a pod is killed the hard way, data loss can happen.
If all `replicationFactor`s of all collections are at least 2, then the
system can tolerate the failure of a single _DB-Server_. If you have set
the `Environment` to `Production` in the specs of the ArangoDB
deployment, you will only ever have one _DB-Server_ on each k8s node and
therefore the drain operation is relatively safe, even if the grace
period is chosen too small.
Furthermore, we recommend having one more k8s node than _DB-Servers_ in
your cluster, such that the deployment of a replacement _DB-Server_ can
happen quickly and not only after the maintenance work on the drained
node has been completed. However, with the necessary care described
below, the procedure should also work without this.
Finally, one should **not run a rolling upgrade or restart operation**
at the time of a node drain.
## Clean out a DB-Server manually
In this step we clean out a _DB-Server_ manually, **before issuing the
`kubectl drain` command**. Previously, we have denoted this step as optional,
but for safety reasons, we consider it mandatory now, since it is nearly
impossible to reliably choose a sufficiently long grace period.
Furthermore, if this step is not performed, we must choose
the grace period long enough to avoid any risk, as explained in the
previous section. However, this has a disadvantage which has nothing to
do with ArangoDB: We have observed that some k8s internal services like
`fluentd` and some DNS services will always wait for the full grace
period to finish a node drain. Therefore, the node drain operation will
always take as long as the grace period. Since we have to choose this
grace period long enough for ArangoDB to move all data on the _DB-Server_
pod away to some other node, this can take a considerable amount of
time, depending on the size of the data you keep in ArangoDB.
Therefore, it is more time-efficient to perform the clean-out operation
beforehand. One can observe its completion and, as soon as it has completed
successfully, issue the drain command with a relatively
small grace period and still have a nearly risk-free procedure.
To clean out a _DB-Server_ manually, we have to use this API:
```
POST /_admin/cluster/cleanOutServer
```
… and send as body a JSON document like this:
```json
{"server":"DBServer0006"}
```
The value of the `"server"` attribute should be the name of the DB-Server
which is the one in the pod which resides on the node that shall be
drained next. This uses the UI short name (`ShortName` in the
`/_admin/cluster/health` API), alternatively one can use the
internal name, which corresponds to the pod name. In our example, the
pod name is:
```
my-arangodb-cluster-prmr-wbsq47rz-5676ed
```
… where `my-arangodb-cluster` is the ArangoDB deployment name, therefore
the internal name of the _DB-Server_ is `PRMR-wbsq47rz`. Note that `PRMR`
must be all capitals since pod names are always all lower case. So, we
could use the body:
```json
{"server":"PRMR-wbsq47rz"}
```
You can use this command line to achieve this:
```bash
curl -k https://arangodb.9hoeffer.de:8529/_admin/cluster/cleanOutServer --user root: -d '{"server":"PRMR-wbsq47rz"}'
```
The API call will return immediately with a body like this:
```json
{"error":false,"id":"38029195","code":202}
```
The given `id` in this response can be used to query the outcome or
completion status of the clean out server job with this API:
```
GET /_admin/cluster/queryAgencyJob?id=38029195
```
… which will return a body like this:
```json
{
"error": false,
"id": "38029195",
"status": "Pending",
"job": {
"timeCreated": "2019-02-21T10:42:14.727Z",
"server": "PRMR-wbsq47rz",
"timeStarted": "2019-02-21T10:42:15Z",
"type": "cleanOutServer",
"creator": "CRDN-rxtu5pku",
"jobId": "38029195"
},
"code": 200
}
```
Use this command line to check progress:
```bash
curl -k https://arangodb.9hoeffer.de:8529/_admin/cluster/queryAgencyJob?id=38029195 --user root:
```
It indicates that the job is still ongoing (`"Pending"`). As soon as
the job has completed, the answer will be:
```json
{
"error": false,
"id": "38029195",
"status": "Finished",
"job": {
"timeCreated": "2019-02-21T10:42:14.727Z",
"server": "PRMR-e6hbjob1",
"jobId": "38029195",
"timeStarted": "2019-02-21T10:42:15Z",
"timeFinished": "2019-02-21T10:45:39Z",
"type": "cleanOutServer",
"creator": "CRDN-rxtu5pku"
},
"code": 200
}
```
From this moment on the _DB-Server_ can no longer be used to move
shards to. At the same time, it will no longer hold any data of the
cluster.
Now the drain operation involving a node with this pod on it is
completely risk-free, even with a small grace period.
## Performing the drain
After all above [checks before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
and the [manual clean out of the DB-Server](#clean-out-a-db-server-manually)
have been done successfully, it is safe to perform the drain operation, similar to this command:
```bash
kubectl drain gke-draintest-default-pool-394fe601-glts --delete-local-data --ignore-daemonsets --grace-period=300
```
As described above, the options `--delete-local-data` for ArangoDB and
`--ignore-daemonsets` for other services have been added. A `--grace-period` of
300 seconds has been chosen because for this example we are confident that all the data on our _DB-Server_ pod
can be moved to a different server within 5 minutes. Note that this is
**not saying** that 300 seconds will always be enough. Depending on how
much data is stored in the pod, your mileage may vary; moving a terabyte
of data can take considerably longer!
If the highly recommended step of
[cleaning out a DB-Server manually](#clean-out-a-db-server-manually)
has been performed beforehand, the grace period can easily be reduced to 60
seconds - at least from the perspective of ArangoDB, since the server is already
cleaned out, so it can be dropped readily and there is still no risk.
At the same time, this guarantees now that the drain is completed
approximately within a minute.
## Things to check after a node drain
After a node has been drained, there will usually be one of the
_DB-Servers_ gone from the cluster. As a replacement, another _DB-Server_ has
been deployed on a different node, if there is a different node
available. If not, the replacement can only be deployed when the
maintenance work on the drained node has been completed and it is
uncordoned again. In this latter case, one should wait until the node is
back up and the replacement pod has been deployed there.
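Once the maintenance work is finished, make the node schedulable again by uncordoning it, for example (using the node name from the drain example above):
```bash
kubectl uncordon gke-draintest-default-pool-394fe601-glts
```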
After that, one should perform the same checks as described in
[things to check before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
above.
Finally, it is likely that the shard distribution in the "new" cluster
is not balanced out. In particular, the new _DB-Server_ is not automatically
used to store shards. We recommend
[re-balance](https://docs.arangodb.com/stable/deploy/deployment/cluster/administration/#movingrebalancing-_shards_) the shard distribution,
either manually by moving shards or by using the _Rebalance Shards_
button in the _Shards_ tab under _NODES_ in the web interface. This redistribution can take
some time again and progress can be monitored in the UI.
After all this has been done, **another round of checks should be done**
before proceeding to drain the next node.


@ -0,0 +1,132 @@
# Configuring your driver for ArangoDB access
In this chapter you'll learn how to configure a driver for accessing
an ArangoDB deployment in Kubernetes.
The exact methods to configure a driver are specific to that driver.
## Database endpoint(s)
The endpoint(s) (or URLs) to communicate with is the most important
parameter you need to configure in your driver.
Finding the right endpoints depends on whether your client application is running in
the same Kubernetes cluster as the ArangoDB deployment or not.
### Client application in same Kubernetes cluster
If your client application is running in the same Kubernetes cluster as
the ArangoDB deployment, you should configure your driver to use the
following endpoint:
```
https://<deployment-name>.<namespace>.svc:8529
```
Only if your deployment has set `spec.tls.caSecretName` to `None`, should
you use `http` instead of `https`.
### Client application outside Kubernetes cluster
If your client application is running outside the Kubernetes cluster in which
the ArangoDB deployment is running, your driver endpoint depends on the
external-access configuration of your ArangoDB deployment.
If the external-access of the ArangoDB deployment is of type `LoadBalancer`,
then use the IP address of that `LoadBalancer` like this:
```
https://<load-balancer-ip>:8529
```
If the external-access of the ArangoDB deployment is of type `NodePort`,
then use the IP address(es) of the `Nodes` of the Kubernetes cluster,
combined with the `NodePort` that is used by the external-access service.
For example:
```
https://<kubernetes-node-1-ip>:30123
```
You can find the type of external-access by inspecting the external-access `Service`.
To do so, run the following command:
```bash
kubectl get service -n <namespace-of-deployment> <deployment-name>-ea
```
The output looks like this:
```bash
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
example-simple-cluster-ea LoadBalancer 10.106.175.38 192.168.10.208 8529:31890/TCP 1s app=arangodb,arango_deployment=example-simple-cluster,role=coordinator
```
In this case the external-access is of type `LoadBalancer` with a load-balancer IP address
of `192.168.10.208`.
This results in an endpoint of `https://192.168.10.208:8529`.
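As a quick sanity check of the endpoint (before configuring the driver itself), you can, for example, query the ArangoDB version API with `curl`; the `-k` flag skips certificate verification, and the `root` user with an empty password is an assumption taken from the examples elsewhere in these docs:
```bash
curl -k https://192.168.10.208:8529/_api/version --user root:
```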
## TLS settings
As mentioned before the ArangoDB deployment managed by the ArangoDB operator
will use a secure (TLS) connection unless you set `spec.tls.caSecretName` to `None`
in your `ArangoDeployment`.
When using a secure connection, you can choose whether or not to verify the server certificates
provided by the ArangoDB servers.
If you want to verify these certificates, configure your driver with the CA certificate
stored in a Kubernetes `Secret` in the same namespace as the `ArangoDeployment`.
The name of this `Secret` is stored in the `spec.tls.caSecretName` setting of
the `ArangoDeployment`. If you don't set this setting explicitly, it will be
set automatically.
Then fetch the CA secret using the following command (or use a Kubernetes client library to fetch it):
```bash
kubectl get secret -n <namespace> <secret-name> --template='{{index .data "ca.crt"}}' | base64 -D > ca.crt
```
This results in a file called `ca.crt` containing a PEM encoded, x509 CA certificate.
## Query requests
For most client requests made by a driver, it does not matter if there is any
kind of load-balancer between your client application and the ArangoDB
deployment.
#### Note:
Even a simple `Service` of type `ClusterIP` already behaves as a load-balancer.
The exception to this is cursor-related requests made to an ArangoDB `Cluster`
deployment. The Coordinator that handles an initial query request (that results
in a `Cursor`) keeps some in-memory state on that Coordinator, if the result
of the query is too big to be transferred back in the response of the initial
request.
Follow-up requests have to be made to fetch the remaining data. These follow-up
requests must be handled by the same Coordinator to which the initial request
was made. As soon as there is a load-balancer between your client application
and the ArangoDB cluster, it is uncertain which Coordinator will receive the
follow-up request.
ArangoDB will transparently forward any mismatched requests to the correct
Coordinator, so the requests can be answered correctly without any additional
configuration. However, this incurs a small latency penalty due to the extra
request across the internal network.
To prevent this uncertainty client-side, make sure to run your client
application in the same Kubernetes cluster and synchronize your endpoints before
making the initial query request. This will result in the use (by the driver) of
internal DNS names of all Coordinators. A follow-up request can then be sent to
exactly the same Coordinator.
If your client application is running outside the Kubernetes cluster the easiest
way to work around it is by making sure that the query results are small enough
to be returned by a single request. When that is not feasible, it is also
possible to resolve this when the internal DNS names of your Kubernetes cluster
are exposed to your client application and the resulting IP addresses are
routable from your client application. To expose internal DNS names of your
Kubernetes cluster, you can use [CoreDNS](https://coredns.io).

docs/helm.md Normal file

@ -0,0 +1,156 @@
# Using the ArangoDB Kubernetes Operator with Helm
[`Helm`](https://www.helm.sh/) is a package manager for Kubernetes, which enables
you to install various packages (including the ArangoDB Kubernetes Operator)
into your Kubernetes cluster.
The benefit of `helm` (in the context of the ArangoDB Kubernetes Operator)
is that it allows for a lot of flexibility in how you install the operator.
For example you can install the operator in a namespace other than
`default`.
## Charts
The ArangoDB Kubernetes Operator is shipped in the `helm` chart `kube-arangodb`, which contains the operators for the
`ArangoDeployment`, `ArangoLocalStorage` and `ArangoDeploymentReplication` resource types.
## Configurable values for ArangoDB Kubernetes Operator
The following values can be configured when installing the
ArangoDB Kubernetes Operator with `helm`.
Values are passed to `helm` using a `--set=<key>=<value>` argument passed
to the `helm install` or `helm upgrade` command.
### `operator.image`
Image used for the ArangoDB Operator.
Default: `arangodb/kube-arangodb:latest`
### `operator.imagePullPolicy`
Image pull policy for Operator images.
Default: `IfNotPresent`
### `operator.imagePullSecrets`
List of the Image Pull Secrets for Operator images.
Default: `[]string`
### `operator.service.type`
Type of the Operator service.
Default: `ClusterIP`
### `operator.annotations`
Annotations passed to the Operator Deployment definition.
Default: `[]string`
### `operator.resources.limits.cpu`
CPU limits for operator pods.
Default: `1`
### `operator.resources.limits.memory`
Memory limits for operator pods.
Default: `256Mi`
### `operator.resources.requested.cpu`
Requested CPU for Operator pods.
Default: `250m`
### `operator.resources.requested.memory`
Requested memory for operator pods.
Default: `256Mi`
### `operator.replicaCount`
Replication count for Operator deployment.
Default: `2`
### `operator.updateStrategy`
Update strategy for operator pod.
Default: `Recreate`
### `operator.features.deployment`
Define if ArangoDeployment Operator should be enabled.
Default: `true`
### `operator.features.deploymentReplications`
Define if ArangoDeploymentReplications Operator should be enabled.
Default: `true`
### `operator.features.storage`
Define if ArangoLocalStorage Operator should be enabled.
Default: `false`
### `operator.features.backup`
Define if ArangoBackup Operator should be enabled.
Default: `false`
### `operator.enableCRDManagement`
If true and operator has enough access permissions, it will try to install missing CRDs.
Default: `true`
### `rbac.enabled`
Define if RBAC should be enabled.
Default: `true`
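Putting it together, a `helm install` invocation that overrides some of the values above could look like this sketch (replace `<version>` with the operator release you want to install; the chosen values are examples only):
```bash
helm install kube-arangodb \
  https://github.com/arangodb/kube-arangodb/releases/download/<version>/kube-arangodb-<version>.tgz \
  --set operator.replicaCount=1 \
  --set operator.features.backup=true
```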
## Alternate namespaces
The `kube-arangodb` chart supports deployment into a non-default namespace.
To install the `kube-arangodb` chart in a non-default namespace, use the `--namespace`
argument like this:
```bash
helm install --namespace=mynamespace kube-arangodb.tgz
```
Note that since the operators claim exclusive access to a namespace, you can
install the `kube-arangodb` chart only once per namespace.
You can, however, install the `kube-arangodb` chart in multiple different namespaces. To do so, run:
```bash
helm install --namespace=namespace1 kube-arangodb.tgz
helm install --namespace=namespace2 kube-arangodb.tgz
```
The `kube-arangodb-storage` chart is always installed in the `kube-system` namespace.
## Common problems
### Error: no available release name found
This error is given by `helm install ...` in some cases where it has
insufficient permissions to install charts.
For various ways to work around this problem go to [this Stackoverflow article](https://stackoverflow.com/questions/43499971/helm-error-no-available-release-name-found).


@ -1,12 +1,12 @@
## How-to...
- [How to set a license key](./set_license.md)
- [Pass additional params to operator](additional_configuration.md)
- [Change architecture / enable ARM support](arch_change.md)
- [Configure timezone for cluster](configuring_tz.md)
- [Collect debug data for support case](debugging.md)
- [Configure logging](logging.md)
- [Enable maintenance mode](maintenance.md)
- [Start metrics collection and monitoring](metrics.md)
- [Override detected total memory](override_detected_memory.md)
- [Manually recover cluster if you still have volumes with data](recovery.md)
- [How to rotate Pod](rotate-pod.md)


@ -0,0 +1,17 @@
# How to set a license key
After deploying the ArangoDB Kubernetes operator, use the command below to deploy your [license key](https://docs.arangodb.com/stable/operations/administration/license-management/)
as a secret which is required for the Enterprise Edition starting with version 3.9:
```bash
kubectl create secret generic arango-license-key --from-literal=token-v2="<license-string>"
```
Then specify the newly created secret in the ArangoDeploymentSpec:
```yaml
spec:
# [...]
license:
secretName: arango-license-key
```

Binary file not shown.

Binary file not shown.

docs/metrics.md Normal file

@ -0,0 +1,152 @@
# Metrics collection
The operator provides metrics about its operations in a format supported by [Prometheus](https://prometheus.io/).
The metrics are exposed through HTTPS on port `8528` under path `/metrics`.
For a full list of available metrics, see [here](generated/metrics/README.md).
Check out examples directory [examples/metrics](https://github.com/arangodb/kube-arangodb/tree/master/examples/metrics)
for `Services` and `ServiceMonitors` definitions you can use to integrate
with Prometheus through the [Prometheus-Operator by CoreOS](https://github.com/coreos/prometheus-operator).
#### Contents
- [Integration with standard Prometheus installation (no TLS)](#Integration-with-standard-Prometheus-installation-no-TLS)
- [Integration with standard Prometheus installation (TLS)](#Integration-with-standard-Prometheus-installation-TLS)
- [Integration with Prometheus Operator](#Integration-with-Prometheus-Operator)
- [Exposing ArangoDB metrics](#ArangoDB-metrics)
## Integration with standard Prometheus installation (no TLS)
After creating the operator deployment, you must configure Prometheus using a configuration file that instructs it
about which targets to scrape.
To do so, add a new scrape job to your prometheus.yaml config:
```yaml
scrape_configs:
- job_name: 'arangodb-operator'
scrape_interval: 10s # scrape every 10 seconds.
scheme: 'https'
tls_config:
insecure_skip_verify: true
static_configs:
- targets:
- "<operator-endpoint-ip>:8528"
```
## Integration with standard Prometheus installation (TLS)
By default, the operator uses a self-signed certificate for its server API.
To use your own certificate, you need to create a k8s secret containing the certificate and provide the secret name to the operator.
Create the k8s secret (in the same namespace where the operator is running):
```shell
kubectl create secret tls my-own-certificate --cert ./cert.crt --key ./cert.key
```
Then edit the operator deployment definition (`kubectl edit deployments.apps`) to use your secret for its server API:
```
spec:
# ...
containers:
# ...
args:
- --server.tls-secret-name=my-own-certificate
# ...
```
Wait for operator pods to restart.
Now update Prometheus config to use your certificate for operator scrape job:
```yaml
tls_config:
# if you are using self-signed certificate, just specify CA certificate:
ca_file: /etc/prometheus/rootCA.crt
# otherwise, specify the generated client certificate and key:
cert_file: /etc/prometheus/cert.crt
key_file: /etc/prometheus/cert.key
```
## Integration with Prometheus Operator
Assuming that you have [Prometheus Operator](https://prometheus-operator.dev/) installed in your cluster (`monitoring` namespace),
and kube-arangodb installed in `default` namespace, you can easily configure the integration with ArangoDB operator.
The easiest way to do that is to create a new ServiceMonitor:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: arango-deployment-operator
namespace: monitoring
labels:
prometheus: kube-prometheus
spec:
selector:
matchLabels:
app.kubernetes.io/name: kube-arangodb
namespaceSelector:
matchNames:
- default
endpoints:
- port: server
scheme: https
tlsConfig:
insecureSkipVerify: true
```
You can also find an example Grafana dashboard in the `examples/metrics` folder of this repo.
## ArangoDB metrics
The operator can run [sidecar containers](./design/exporter.md) for ArangoDB deployments of type `Cluster` which expose metrics in Prometheus format.
Edit your `ArangoDeployment` resource, setting `spec.metrics.enabled` to true to enable ArangoDB metrics:
```yaml
spec:
metrics:
enabled: true
```
The operator will run a sidecar container for every cluster component.
In addition to the sidecar containers the operator will deploy a `Service` to access the exporter ports (from within the k8s cluster),
and a resource of type `ServiceMonitor`, provided the corresponding custom resource definition is deployed in the k8s cluster.
If you are running Prometheus in the same k8s cluster with the Prometheus operator, this will be the case.
The ServiceMonitor will have the following labels set:
```yaml
app: arangodb
arango_deployment: YOUR_DEPLOYMENT_NAME
context: metrics
metrics: prometheus
```
This makes it possible to configure your Prometheus deployment to automatically start monitoring on the available Prometheus feeds.
To this end, you must configure the `serviceMonitorSelector` in the specs of your Prometheus deployment to match these labels. For example:
```yaml
serviceMonitorSelector:
matchLabels:
metrics: prometheus
```
would automatically select all pods of all ArangoDB cluster deployments which have metrics enabled.
By default, the sidecar metrics exporters use TLS for all connections. You can disable TLS by specifying:
```yaml
spec:
metrics:
enabled: true
tls: false
```
You can fine-tune the monitored metrics by specifying `ArangoDeployment` annotations. Example:
```yaml
spec:
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9101'
prometheus.io/scrape_interval: '5s'
```
See the [Metrics HTTP API documentation](https://docs.arangodb.com/stable/develop/http/monitoring/#metrics)
for the metrics exposed by ArangoDB deployments.

docs/scaling.md Normal file

@ -0,0 +1,42 @@
# Scaling your ArangoDB deployment
The ArangoDB Kubernetes Operator supports up and down scaling of
the number of DB-Servers & Coordinators.
To scale up or down, change the number of servers in the custom
resource.
E.g. change `spec.dbservers.count` from `3` to `4`.
Then apply the updated resource using:
```bash
kubectl apply -f yourCustomResourceFile.yaml
```
Inspect the status of the custom resource to monitor the progress of the scaling operation.
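For example (with `<deployment-name>` as a placeholder for the name of your deployment):
```bash
kubectl get arangodeployment <deployment-name> -o yaml
# or, for a condensed view including recent events:
kubectl describe arangodeployment <deployment-name>
```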
**Note: It is not possible to change the number of Agency servers after creating a cluster**.
Make sure to specify the desired number when creating the custom resource for the first time.
## Overview
### Scale-up
When increasing the `count`, the operator tries to create the missing pods.
When scaling up, make sure that you have enough computational resources / nodes, otherwise the pods will be stuck in the `Pending` state.
### Scale-down
Scaling down is always done one server at a time.
Scaling down is possible only when all other actions on the ArangoDeployment are finished.
The internal process followed by the ArangoDB operator when scaling down is as follows:
- It chooses a member to be evicted. First, it tries to remove unhealthy members, or it falls back to the member with the highest `deletion_priority`.
- Using internal calls, it forces the server to resign leadership.
  In case of DB-Servers, this means that all shard leaders are switched to other servers.
- It waits until the server is cleaned out from the cluster.
- The Pod is finalized.


@ -0,0 +1,125 @@
# Services & Load balancer
The ArangoDB Kubernetes Operator will create services that can be used to
reach the ArangoDB servers from inside the Kubernetes cluster.
By default, the ArangoDB Kubernetes Operator will also create an additional
service to reach the ArangoDB deployment from outside the Kubernetes cluster.
For exposing the ArangoDB deployment to the outside, there are 2 options:
- Using a `NodePort` service. This will expose the deployment on a specific port (above 30,000)
on all nodes of the Kubernetes cluster.
- Using a `LoadBalancer` service. This will expose the deployment on a load-balancer
that is provisioned by the Kubernetes cluster.
The `LoadBalancer` option is the most convenient, but not all Kubernetes clusters
are able to provision a load-balancer. Therefore we offer a third (and default) option: `Auto`.
In this option, the ArangoDB Kubernetes Operator tries to create a `LoadBalancer`
service. It then waits for up to a minute for the Kubernetes cluster to provision
a load-balancer for it. If that has not happened after a minute, the service
is replaced by a service of type `NodePort`.
To inspect the created service, run:
```bash
kubectl get services <deployment-name>-ea
```
To use the ArangoDB servers from outside the Kubernetes cluster
you have to add another service as explained below.
## Services
If you do not want the ArangoDB Kubernetes Operator to create an external-access
service for you, set `spec.externalAccess.type` to `None`.
If you want to create external access services manually, follow the instructions below.
### Single server
For a single server deployment, the operator creates a single
`Service` named `<deployment-name>`. This service has a normal cluster IP
address.
### Full cluster
For a full cluster deployment, the operator creates two `Services`.
- `<deployment-name>-int` a headless `Service` intended to provide
DNS names for all pods created by the operator.
It selects all ArangoDB & ArangoSync servers in the cluster.
- `<deployment-name>` a normal `Service` that selects only the Coordinators
of the cluster. This `Service` is configured with `ClientIP` session
affinity. This is needed for cursor requests, since they are bound to
a specific Coordinator.
When the Coordinators are asked to provide endpoints of the cluster
(e.g. when calling `client.SynchronizeEndpoints()` in the go driver)
the DNS names of the individual `Pods` will be returned
(`<pod>.<deployment-name>-int.<namespace>.svc`).
### Full cluster with DC2DC
For a full cluster with datacenter replication deployment,
the same `Services` are created as for a Full cluster, with the following
additions:
- `<deployment-name>-sync` a normal `Service` that selects only the syncmasters
of the cluster.
## Load balancer
If you want full control of the `Services` needed to access the ArangoDB deployment
from outside your Kubernetes cluster, set `spec.externalAccess.type` of the `ArangoDeployment` to `None`
and create a `Service` as specified below.
Create a `Service` of type `LoadBalancer` or `NodePort`, depending on your
Kubernetes deployment.
This service should select:
- `arango_deployment: <deployment-name>`
- `role: coordinator`
The following example yields a service of type `LoadBalancer` with a specific
load balancer IP address.
With this service, the ArangoDB cluster can now be reached on `https://1.2.3.4:8529`.
```yaml
kind: Service
apiVersion: v1
metadata:
name: arangodb-cluster-exposed
spec:
selector:
arango_deployment: arangodb-cluster
role: coordinator
type: LoadBalancer
loadBalancerIP: 1.2.3.4
ports:
- protocol: TCP
port: 8529
targetPort: 8529
```
The following example yields a service of type `NodePort` with the ArangoDB
cluster exposed on port 30529 of all nodes of the Kubernetes cluster.
```yaml
kind: Service
apiVersion: v1
metadata:
name: arangodb-cluster-exposed
spec:
selector:
arango_deployment: arangodb-cluster
role: coordinator
type: NodePort
ports:
- protocol: TCP
port: 8529
targetPort: 8529
nodePort: 30529
```

docs/storage-resource.md Normal file

@ -0,0 +1,63 @@
# ArangoLocalStorage Custom Resource
The ArangoDB Storage Operator creates and maintains ArangoDB
storage resources in a Kubernetes cluster, given a storage specification.
This storage specification is a `CustomResource` following
a `CustomResourceDefinition` created by the operator. It is not enabled by
default in the operator.
Example minimal storage definition:
```yaml
apiVersion: "storage.arangodb.com/v1alpha"
kind: "ArangoLocalStorage"
metadata:
name: "example-arangodb-storage"
spec:
storageClass:
name: my-local-ssd
localPath:
- /mnt/big-ssd-disk
```
This definition results in:
- a `StorageClass` called `my-local-ssd`
- the dynamic provisioning of `PersistentVolume`s with
  a local volume on a node, where the local volume is created
  in a sub-directory of `/mnt/big-ssd-disk`.
- the dynamic cleanup of `PersistentVolume`s (created by
  the operator) after one is released.
The provisioned volumes will have a capacity that matches
the requested capacity of volume claims.
## Specification reference
Below you'll find all settings of the `ArangoLocalStorage` custom resource.
### `spec.storageClass.name: string`
This setting specifies the name of the storage class that
the created `PersistentVolume`s will use.
If empty, this field defaults to the name of the `ArangoLocalStorage`
object.
If a `StorageClass` with the given name does not yet exist, it
will be created.
### `spec.storageClass.isDefault: bool`
This setting specifies whether the created `StorageClass` will
be marked as the default storage class (default is `false`).
### `spec.localPath: stringList`
This setting specifies one or more local directories
(on the nodes) used to create persistent volumes in.
### `spec.nodeSelector: nodeSelector`
This setting specifies which nodes the operator will
provision persistent volumes on.
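For example, the minimal definition from above could be extended with these optional settings (the `storage: local-ssd` node label is an assumption for illustration):
```yaml
apiVersion: "storage.arangodb.com/v1alpha"
kind: "ArangoLocalStorage"
metadata:
  name: "example-arangodb-storage"
spec:
  storageClass:
    name: my-local-ssd
    isDefault: false
  localPath:
    - /mnt/big-ssd-disk
  nodeSelector:
    storage: local-ssd
```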

docs/storage.md Normal file

@ -0,0 +1,144 @@
# Storage configuration
An ArangoDB cluster relies heavily on fast persistent storage.
The ArangoDB Kubernetes Operator uses `PersistentVolumeClaims` to deliver
the storage to Pods that need them.
## Requirements
To use `ArangoLocalStorage` resources, it has to be enabled in the operator
(replace `<version>` with the
[version of the operator](https://github.com/arangodb/kube-arangodb/releases)):
```bash
helm upgrade --install kube-arangodb \
https://github.com/arangodb/kube-arangodb/releases/download/<version>/kube-arangodb-<version>.tgz \
--set operator.features.storage=true
```
## Storage configuration
In the `ArangoDeployment` resource, one can specify the type of storage
used by groups of servers using the `spec.<group>.volumeClaimTemplate`
setting.
Below is an example of a `Cluster` deployment that stores its Agent & DB-Server
data on `PersistentVolumes` that use the `my-local-ssd` `StorageClass`.
The amount of storage needed is configured using the
`spec.<group>.resources.requests.storage` setting.
```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
name: "cluster-using-local-ssh"
spec:
mode: Cluster
agents:
volumeClaimTemplate:
spec:
storageClassName: my-local-ssd
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
volumeMode: Filesystem
dbservers:
volumeClaimTemplate:
spec:
storageClassName: my-local-ssd
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 80Gi
volumeMode: Filesystem
```
Note that configuring storage is done per group of servers.
It is not possible to configure storage per individual
server.
The example above requests volumes of 80GB
for every DB-Server, resulting in a total storage capacity of 240GB (with 3 DB-Servers).
## Local storage
For optimal performance, ArangoDB should be configured with locally attached
SSD storage.
The easiest way to accomplish this is to deploy an
[`ArangoLocalStorage` resource](storage-resource.md).
The ArangoDB Storage Operator will use it to provide `PersistentVolumes` for you.
This is an example of an `ArangoLocalStorage` resource that will result in
`PersistentVolumes` created on any node of the Kubernetes cluster
under the directory `/mnt/big-ssd-disk`.
```yaml
apiVersion: "storage.arangodb.com/v1alpha"
kind: "ArangoLocalStorage"
metadata:
name: "example-arangodb-storage"
spec:
storageClass:
name: my-local-ssd
localPath:
- /mnt/big-ssd-disk
```
Note that using local storage requires `VolumeScheduling` to be enabled in your
Kubernetes cluster. On Kubernetes 1.10 this is enabled by default; on version
1.9 you have to enable it with a `--feature-gates` setting.
### Manually creating `PersistentVolumes`
The alternative is to create `PersistentVolumes` manually, for all servers that
need persistent storage (single, Agents & DB-Servers).
E.g. for a `Cluster` with 3 Agents and 5 DB-Servers, you must create 8 volumes.
Note that each volume must have a capacity that is equal to or higher than the
capacity needed for each server.
To select the correct node, add a required node affinity as shown
in the example below.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: volume-agent-1
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Delete
storageClassName: local-ssd
local:
path: /mnt/disks/ssd1
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- "node-1
```
For Kubernetes 1.9 and up, you should create a `StorageClass` which is configured
to bind volumes on their first use as shown in the example below.
This ensures that the Kubernetes scheduler takes all constraints on a `Pod`
into consideration before binding the volume to a claim.
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: local-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```

docs/tls.md Normal file

@ -0,0 +1,54 @@
# Secure connections (TLS)
The ArangoDB Kubernetes Operator will by default create ArangoDB deployments
that use secure TLS connections.
It uses a single CA certificate (stored in a Kubernetes secret) and
one certificate per ArangoDB server (stored in a Kubernetes secret per server).
To disable TLS, set `spec.tls.caSecretName` to `None`.
## Install CA certificate
If the CA certificate is self-signed, it will not be trusted by browsers
until you install it in the local operating system or browser.
This process differs per operating system.
To do so, you first have to fetch the CA certificate from its Kubernetes
secret.
```bash
kubectl get secret <deploy-name>-ca --template='{{index .data "ca.crt"}}' | base64 -D > ca.crt
```
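If you want to inspect the fetched certificate before installing it, you can, for example, print its subject and expiry date with `openssl`:
```bash
openssl x509 -in ca.crt -noout -subject -enddate
```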
### Windows
To install a CA certificate in Windows, follow the
[procedure described here](http://wiki.cacert.org/HowTo/InstallCAcertRoots).
### macOS
To install a CA certificate in macOS, run:
```bash
sudo /usr/bin/security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ca.crt
```
To uninstall a CA certificate in macOS, run:
```bash
sudo /usr/bin/security remove-trusted-cert -d ca.crt
```
### Linux
To install a CA certificate in Linux, on Ubuntu, run:
```bash
sudo cp ca.crt /usr/local/share/ca-certificates/<some-name>.crt
sudo update-ca-certificates
```
## See also
- [Authentication](authentication.md)

docs/troubleshooting.md Normal file

@ -0,0 +1,115 @@
# Troubleshooting
While Kubernetes and the ArangoDB Kubernetes operator automatically
resolve a lot of issues, there are always cases where human attention
is needed.
This chapter gives you tips & tricks to help you troubleshoot deployments.
## Where to look
In Kubernetes all resources can be inspected using `kubectl` using either
the `get` or `describe` command.
To get all details of the resource (both specification & status),
run the following command:
```bash
kubectl get <resource-type> <resource-name> -n <namespace> -o yaml
```
For example, to get the entire specification and status
of an `ArangoDeployment` resource named `my-arango` in the `default` namespace,
run:
```bash
kubectl get ArangoDeployment my-arango -n default -o yaml
# or shorter
kubectl get arango my-arango -o yaml
```
Several types of resources (including all ArangoDB custom resources) support
events. These events show what happened to the resource over time.
To show the events (and most important resource data) of a resource,
run the following command:
```bash
kubectl describe <resource-type> <resource-name> -n <namespace>
```
## Getting logs
Another invaluable source of information is the log of containers being run
in Kubernetes.
These logs are accessible through the `Pods` that group these containers.
To fetch the logs of the default container running in a `Pod`, run:
```bash
kubectl logs <pod-name> -n <namespace>
# or with follow option to keep inspecting logs while they are written
kubectl logs <pod-name> -n <namespace> -f
```
To inspect the logs of a specific container in a `Pod`, add `-c <container-name>`.
You can find the names of the containers in the `Pod`, using `kubectl describe pod ...`.
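For example, to follow the logs of the ArangoDB server container of a pod (in
pods created by the operator this container is typically named `server`; verify
the name with `kubectl describe pod` first):
```bash
kubectl logs <pod-name> -n <namespace> -c server -f
```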
## What if
### The `Pods` of a deployment stay in `Pending` state
There are two common causes for this.
- The `Pods` cannot be scheduled because there are not enough nodes available.
This is usually only the case with a `spec.environment` setting that has a value of `Production`.
Solution: Add more nodes.
- There are no `PersistentVolumes` available to be bound to the `PersistentVolumeClaims`
created by the operator.
Solution:
Use `kubectl get persistentvolumes` to inspect the available `PersistentVolumes`
and, if needed, use the [`ArangoLocalStorage` operator](storage-resource.md)
to provision `PersistentVolumes` (see the commands below).
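A quick way to check both sides of the volume binding is to list the claims of
the deployment next to the available volumes (the namespace is illustrative):
```bash
# List the PersistentVolumeClaims created by the operator; Pending claims point at the problem
kubectl get pvc -n <namespace>
# List the available PersistentVolumes and their storage classes
kubectl get persistentvolumes
```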
### When restarting a `Node`, the `Pods` scheduled on that node remain in `Terminating` state
When a `Node` no longer makes regular calls to the Kubernetes API server, it is
marked as not available. Depending on specific settings in your `Pods`, Kubernetes
will at some point decide to terminate the `Pod`. As long as the `Node` is not
completely removed from the Kubernetes API server, Kubernetes tries to use
the `Node` itself to terminate the `Pod`.
The `ArangoDeployment` operator recognizes this condition and tries to replace those
`Pods` with `Pods` on different nodes. The exact behavior differs per type of server.
### What happens when a `Node` with local data is broken
When a `Node` with `PersistentVolumes` hosted on that `Node` is broken and
cannot be repaired, the data in those `PersistentVolumes` is lost.
If an `ArangoDeployment` of type `Single` was using one of those `PersistentVolumes`
the database is lost and must be restored from a backup.
If an `ArangoDeployment` of type `ActiveFailover` or `Cluster` was using one of
those `PersistentVolumes`, it depends on the type of server that was using the volume.
- If an `Agent` was using the volume, it can be repaired as long as 2 other
Agents are still healthy.
- If a `DBServer` was using the volume, and the replication factor of all database
collections is 2 or higher, and the remaining DB-Servers are still healthy,
the cluster duplicates the remaining replicas to
bring the number of replicas back to the original number.
- If a `DBServer` was using the volume, and the replication factor of a database
collection is 1 and happens to be stored on that DB-Server, the data is lost.
- If a single server of an `ActiveFailover` deployment was using the volume, and the
other single server is still healthy, the other single server becomes leader.
After replacing the failed single server, the new follower synchronizes with
the leader.
### See also
- [Collecting debug data](./how-to/debugging.md)

31
docs/upgrading.md Normal file
View file

@ -0,0 +1,31 @@
# Upgrading ArangoDB version
The ArangoDB Kubernetes Operator supports upgrading an ArangoDB from
one version to the next.
**Warning!**
It is highly recommended to take a backup of your data before upgrading ArangoDB
using [arangodump](https://docs.arangodb.com/stable/components/tools/arangodump/) or [ArangoBackup CR](backup-resource.md).
## Upgrade an ArangoDB deployment
To upgrade a cluster, change the version by changing
the `spec.image` setting and then apply the updated
custom resource using:
```bash
kubectl apply -f yourCustomResourceFile.yaml
```
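For example, to move a deployment to a newer release, you only change the image
line in your resource file before applying it (the versions below are
illustrative):
```yaml
spec:
  # [...]
  image: arangodb/arangodb:3.11.4  # was: arangodb/arangodb:3.11.3
```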
The ArangoDB operator will perform a sequential upgrade
of all servers in your deployment. Only one server is upgraded
at a time.
For patch level upgrades (e.g. 3.9.2 to 3.9.3) each server
is stopped and restarted with the new version.
For minor level upgrades (e.g. 3.9.2 to 3.10.0) each server
is stopped, then the new version is started with `--database.auto-upgrade`
and once that has finished, the new version is started with the normal arguments.
The process for major level upgrades depends on the specific version.

298
docs/using-the-operator.md Normal file
View file

@ -0,0 +1,298 @@
# Using the ArangoDB Kubernetes Operator
## Installation
The ArangoDB Kubernetes Operator needs to be installed in your Kubernetes
cluster first. Make sure you have access to this cluster and the rights to
deploy resources at cluster level.
The following cloud provider Kubernetes offerings are officially supported:
- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Microsoft Azure Kubernetes Service (AKS)
If you have `Helm` available, use it for the installation as it is the
recommended installation method.
### Installation with Helm
To install the ArangoDB Kubernetes Operator with [`helm`](https://www.helm.sh/),
run the following commands (replace `<version>` with the
[version of the operator](https://github.com/arangodb/kube-arangodb/releases)
that you want to install):
```bash
export URLPREFIX=https://github.com/arangodb/kube-arangodb/releases/download/<version>
helm install --generate-name $URLPREFIX/kube-arangodb-<version>.tgz
```
This installs operators for the `ArangoDeployment` and `ArangoDeploymentReplication`
resource types, which are used to deploy ArangoDB and ArangoDB Datacenter-to-Datacenter Replication respectively.
If you want to avoid the installation of the operator for the `ArangoDeploymentReplication`
resource type, add `--set=DeploymentReplication.Create=false` to the `helm install`
command.
To use `ArangoLocalStorage` resources, also run:
```bash
helm install --generate-name $URLPREFIX/kube-arangodb-<version>.tgz --set "operator.features.storage=true"
```
The default CPU architecture of the operator is `amd64` (x86-64). To enable ARM
support (`arm64`) in the operator, overwrite the following setting:
```bash
helm install --generate-name $URLPREFIX/kube-arangodb-<version>.tgz --set "operator.architectures={amd64,arm64}"
```
Note that you need to set [`spec.architecture`](deployment-resource-reference.md#specarchitecture-string)
in the deployment specification, too, in order to create a deployment that runs
on ARM chips.
For more information on installing with `Helm` and how to customize an installation,
see [Using the ArangoDB Kubernetes Operator with Helm](helm.md).
### Installation with Kubectl
To install the ArangoDB Kubernetes Operator without `Helm`,
run (replace `<version>` with the version of the operator that you want to install):
```bash
export URLPREFIX=https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests
kubectl apply -f $URLPREFIX/arango-crd.yaml
kubectl apply -f $URLPREFIX/arango-deployment.yaml
```
To use `ArangoLocalStorage` resources to provision `PersistentVolumes` on local
storage, also run:
```bash
kubectl apply -f $URLPREFIX/arango-storage.yaml
```
Use this when running on bare-metal or if there is no provisioner for fast
storage in your Kubernetes cluster.
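As a rough sketch of what such a resource looks like (the storage class name and
path are illustrative; see [ArangoLocalStorage](storage-resource.md) for the full
reference):
```yaml
apiVersion: "storage.arangodb.com/v1alpha"
kind: "ArangoLocalStorage"
metadata:
  name: "example-arangodb-storage"
spec:
  storageClass:
    name: my-local-ssd
  localPath:
    - /mnt/big-ssd-disk
```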
To use `ArangoDeploymentReplication` resources for ArangoDB
Datacenter-to-Datacenter Replication, also run:
```bash
kubectl apply -f $URLPREFIX/arango-deployment-replication.yaml
```
See [ArangoDeploymentReplication Custom Resource](deployment-replication-resource-reference.md)
for details and an example.
You can find the latest release of the ArangoDB Kubernetes Operator
in the [kube-arangodb repository](https://github.com/arangodb/kube-arangodb/releases/latest).
## ArangoDB deployment creation
After deploying the latest ArangoDB Kubernetes operator, use the command below to store your [license key](https://docs.arangodb.com/stable/operations/administration/license-management/) as a Kubernetes secret, which is required for the Enterprise Edition starting with version 3.9:
```bash
kubectl create secret generic arango-license-key --from-literal=token-v2="<license-string>"
```
Once the operator is running, you can create your ArangoDB database deployment
by creating a `ArangoDeployment` custom resource and deploying it into your
Kubernetes cluster.
For example (all examples can be found in the [kube-arangodb repository](https://github.com/arangodb/kube-arangodb/tree/master/examples)):
```bash
kubectl apply -f examples/simple-cluster.yaml
```
Additionally, you can specify the license key required for the Enterprise Edition starting with version 3.9 as seen below:
```yaml
spec:
  # [...]
  image: arangodb/enterprise:3.9.1
  license:
    secretName: arango-license-key
```
## Connecting to your database
Access to ArangoDB deployments from outside the Kubernetes cluster is provided
using an external-access service. By default, this service is of type
`LoadBalancer`. If this type of service is not supported by your Kubernetes
cluster, it is replaced by a service of type `NodePort` after a minute.
To see the type of service that has been created, run (replace `<service-name>`
with the `metadata.name` you set in the deployment configuration, e.g.
`example-simple-cluster`):
```bash
kubectl get service <service-name>-ea
```
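For a `LoadBalancer` service, the output looks roughly like this (the addresses
and ports are illustrative):
```
NAME                        TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
example-simple-cluster-ea   LoadBalancer   10.106.175.2   35.93.75.97   8529:31890/TCP   1m
```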
When the service is of the `LoadBalancer` type, use the IP address
listed in the `EXTERNAL-IP` column with port 8529.
When the service is of the `NodePort` type, use the IP address
of any of the nodes of the cluster, combined with the high (>30000) port listed
in the `PORT(S)` column.
Point your browser to `https://<ip>:<port>/` (note the `https` protocol).
Your browser shows a warning about an unknown certificate. Accept the
certificate for now. Then log in using the username `root` and an empty password.
## Deployment removal
To remove an existing ArangoDB deployment, delete the custom resource.
The operator deletes all created resources.
For example:
```bash
kubectl delete -f examples/simple-cluster.yaml
```
**Note that this will also delete all data in your ArangoDB deployment!**
If you want to keep your data, make sure to create a backup before removing the deployment.
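If the backup operator is installed, one way to do this is to create an
`ArangoBackup` resource for the deployment before deleting it; a minimal sketch
(the names are illustrative, see [Backup](backup-resource.md) for details):
```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackup"
metadata:
  name: "backup-before-removal"
spec:
  deployment:
    name: "example-simple-cluster"
```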
## Operator removal
To remove the entire ArangoDB Kubernetes Operator, remove all
clusters first and then remove the operator by running:
```bash
helm delete <release-name-of-kube-arangodb-chart>
# If `ArangoLocalStorage` operator is installed
helm delete <release-name-of-kube-arangodb-storage-chart>
```
or when you used `kubectl` to install the operator, run:
```bash
kubectl delete deployment arango-deployment-operator
# If `ArangoLocalStorage` operator is installed
kubectl delete deployment -n kube-system arango-storage-operator
# If `ArangoDeploymentReplication` operator is installed
kubectl delete deployment arango-deployment-replication-operator
```
## Example deployment using `minikube`
If you want to get your feet wet with ArangoDB and Kubernetes, you can deploy
your first ArangoDB instance with `minikube`, which lets you easily set up a
local Kubernetes cluster.
Visit the [`minikube` website](https://minikube.sigs.k8s.io/),
follow the installation instructions, and start the cluster with
`minikube start`.
Next, go to <https://github.com/arangodb/kube-arangodb/releases>
to find out the latest version of the ArangoDB Kubernetes Operator. Then run the
following commands, with `<version>` replaced by the version you looked up:
```bash
minikube kubectl -- apply -f https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests/arango-crd.yaml
minikube kubectl -- apply -f https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests/arango-deployment.yaml
minikube kubectl -- apply -f https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests/arango-storage.yaml
```
To deploy a single server, create a file called `single-server.yaml` with the
following content:
```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "single-server"
spec:
  mode: Single
```
Insert this resource in your Kubernetes cluster using:
```bash
minikube kubectl -- apply -f single-server.yaml
```
To deploy an ArangoDB cluster instead, create a file called `cluster.yaml` with
the following content:
```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "cluster"
spec:
  mode: Cluster
```
The same commands shown below for the single server deployment can be used to
inspect your cluster. Just use the correct deployment name (`cluster` instead of
`single-server`).
The `ArangoDeployment` operator in `kube-arangodb` inspects the resource you
just deployed and starts the process to run ArangoDB.
To inspect the current status of your deployment, run:
```bash
minikube kubectl -- describe ArangoDeployment single-server
# or shorter
minikube kubectl -- describe arango single-server
```
To inspect the pods created for this deployment, run:
```bash
minikube kubectl -- get pods --selector=arango_deployment=single-server
```
The result looks similar to this:
```
NAME READY STATUS RESTARTS AGE
single-server-sngl-cjtdxrgl-fe06f0 1/1 Running 0 1m
```
Once the pod reports that it has a `Running` status and is ready,
your ArangoDB instance is available.
To access ArangoDB, run:
```bash
minikube service single-server-ea
```
This creates a temporary tunnel for the `single-server-ea` service and opens
your browser. You need to change the URL to start with `https://`. By default,
it is `http://`, but the deployment uses TLS encryption for the connection.
For example, if the address is `http://127.0.0.1:59050`, you need to change it
to `https://127.0.0.1:59050`.
Your browser warns about an unknown certificate. This is because a self-signed
certificate is used. Continue anyway. The exact steps for this depend on your
browser.
You should see the login screen of ArangoDB's web interface. Enter `root` as the
username, leave the password field empty, and log in. Select the default
`_system` database. You should see the dashboard and be able to interact with
ArangoDB.
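You can also check that the server responds from the command line by querying
its version endpoint through the tunnel (replace `<port>` with the port printed
by `minikube service`; `-k` skips verification of the self-signed certificate and
the `root` password is empty by default):
```bash
curl -k -u root: https://127.0.0.1:<port>/_api/version
```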
If you want to delete your single server ArangoDB database, just run:
```bash
minikube kubectl -- delete ArangoDeployment single-server
```
To shut down `minikube`, run:
```bash
minikube stop
```
## See also
- [Driver configuration](driver-configuration.md)
- [Scaling](scaling.md)
- [Upgrading](upgrading.md)
- [Using the ArangoDB Kubernetes Operator with Helm](helm.md)