(Documentation) Move documentation from ArangoDB site into this repo (#1450)

- remove duplicated docs
- update old docs with new info
- rework docs index page
- file names not changed to make sure redirects from old site will work as expected

Co-authored-by: jwierzbo <jakub.wierzbowski@arangodb.com>

parent b9918115d9, commit fe66d98444
28 changed files with 3808 additions and 34 deletions

@@ -5,6 +5,7 @@

- (Improvement) Print assigned node name to log and condition message when pod is scheduled
- (Maintenance) Remove obsolete docs, restructure for better UX, generate index files
- (Feature) Add `spec.upgrade.debugLog` option to configure upgrade container logging
- (Documentation) Move documentation from ArangoDB into this repo, update and improve structure

## [1.2.34](https://github.com/arangodb/kube-arangodb/tree/1.2.34) (2023-10-16)

- (Bugfix) Fix make manifests-crd-file command

@@ -6,13 +6,13 @@ ArangoDB Kubernetes Operator helps to run ArangoDB deployments
on Kubernetes clusters.

To get started, follow the Installation instructions below and/or
-read the [tutorial](https://www.arangodb.com/docs/stable/deployment-kubernetes-usage.html).
+read the [tutorial](docs/using-the-operator.md).

## State

The ArangoDB Kubernetes Operator is Production ready.

-[Documentation](https://www.arangodb.com/docs/stable/deployment-kubernetes.html)
+[Documentation](docs/README.md)

### Limits

@@ -1,11 +1,57 @@
# ArangoDB Kubernetes Operator

-- [Tutorial](https://www.arangodb.com/docs/stable/tutorials-kubernetes.html)
-- [Documentation](https://www.arangodb.com/docs/stable/deployment-kubernetes.html)
-- [Architecture](./design/README.md)
-- [Features description and usage](./features/README.md)
-- [Custom Resources API Reference](./api/README.md)
-- [Operator Metrics & Alerts](./generated/metrics/README.md)
-- [Operator Actions](./generated/actions.md)

- [Intro](#intro)
- [Using the ArangoDB Kubernetes Operator](using-the-operator.md)
- [Architecture overview](design/README.md)
- [Features description and usage](features/README.md)
- [Custom Resources API Reference](api/README.md)
- [Operator Metrics & Alerts](generated/metrics/README.md)
- [Operator Actions](generated/actions.md)
- [Authentication](authentication.md)
- Custom resources overview:
  - [ArangoDeployment](deployment-resource-reference.md)
  - [ArangoDeploymentReplication](deployment-replication-resource-reference.md)
  - [ArangoLocalStorage](storage-resource.md)
  - [Backup](backup-resource.md)
  - [BackupPolicy](backuppolicy-resource.md)
- [Configuration and secrets](configuration-and-secrets.md)
- [Configuring your driver for ArangoDB access](driver-configuration.md)
- [Using Helm](helm.md)
- [Collecting metrics](metrics.md)
- [Services & Load balancer](services-and-load-balancer.md)
- [Storage configuration](storage.md)
- [Secure connections (TLS)](tls.md)
- [Upgrading ArangoDB version](upgrading.md)
- [Scaling your ArangoDB deployment](scaling.md)
- [Draining the Kubernetes nodes](draining-nodes.md)
- Known issues (TBD)
- [Troubleshooting](troubleshooting.md)
- [How-to ...](how-to/README.md)

## Intro

The ArangoDB Kubernetes Operator (`kube-arangodb`) is a set of operators
that you deploy in your Kubernetes cluster to:

- Manage deployments of the ArangoDB database
- Manage backups
- Provide `PersistentVolumes` on local storage of your nodes for optimal storage performance
- Configure ArangoDB Datacenter-to-Datacenter Replication

Each of these uses involves a different custom resource:

- Use an [`ArangoDeployment` resource](deployment-resource-reference.md) to
  create an ArangoDB database deployment.
- Use [`ArangoBackup`](backup-resource.md) and `ArangoBackupPolicy` resources to
  create ArangoDB backups.
- Use an [`ArangoLocalStorage` resource](storage-resource.md) to
  provide local `PersistentVolumes` for optimal I/O performance.
- Use an [`ArangoDeploymentReplication` resource](deployment-replication-resource-reference.md) to
  configure ArangoDB Datacenter-to-Datacenter Replication.

Continue with [Using the ArangoDB Kubernetes Operator](using-the-operator.md)
to learn how to install the ArangoDB Kubernetes operator and create
your first deployment.

For more information about the production readiness state, please refer to the
[ArangoDB Kubernetes Operator repository](https://github.com/arangodb/kube-arangodb#production-readiness-state).
docs/authentication.md (new file, 18 lines)

# Authentication

The ArangoDB Kubernetes Operator will by default create ArangoDB deployments
that require authentication to access the database.

It uses a single JWT secret (stored in a Kubernetes secret)
to provide *super-user* access between all servers of the deployment
as well as access from the ArangoDB Operator to the deployment.

To disable authentication, set `spec.auth.jwtSecretName` to `None`.
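
For illustration, a minimal sketch of a deployment with authentication
disabled (the deployment name is hypothetical):

```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-no-auth"
spec:
  mode: Cluster
  auth:
    jwtSecretName: None
```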

Initially the deployment is accessible through the web user interface and
APIs, using the user `root` with an empty password.
Make sure to change this password immediately after starting the deployment!

## See also

- [Secure connections (TLS)](tls.md)
docs/backup-resource.md (new file, 554 lines)

# ArangoBackup Custom Resource

The ArangoBackup Operator creates and maintains ArangoBackups
in a Kubernetes cluster, given a Backup specification.
This Backup specification is a `CustomResource` following
a `CustomResourceDefinition` created by the operator.

## Examples

### Create simple Backup

```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackup"
metadata:
  name: "example-arangodb-backup"
  namespace: "arangodb"
spec:
  deployment:
    name: "my-deployment"
```

Action:

Creates a Backup on the ArangoDeployment named `my-deployment`.

### Create and upload Backup

```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackup"
metadata:
  name: "example-arangodb-backup"
  namespace: "arangodb"
spec:
  deployment:
    name: "my-deployment"
  upload:
    repositoryURL: "S3://test/kube-test"
    credentialsSecretName: "my-s3-rclone-credentials"
```

Action:

Creates a Backup on the ArangoDeployment named `my-deployment` and uploads it to `S3://test/kube-test`.

### Download Backup

```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackup"
metadata:
  name: "example-arangodb-backup"
  namespace: "arangodb"
spec:
  deployment:
    name: "my-deployment"
  download:
    repositoryURL: "S3://test/kube-test"
    credentialsSecretName: "my-s3-rclone-credentials"
    id: "backup-id"
```

Action:

Downloads the Backup with id `backup-id` from `S3://test/kube-test` onto the ArangoDeployment named `my-deployment`.

### Restore

Information about restoring can be found in [ArangoDeployment](deployment-resource-reference.md).

## Advertised fields

List of custom columns in the CRD specification for `kubectl`:

- `.spec.policyName` - optional name of the policy
- `.spec.deployment.name` - name of the deployment
- `.status.state` - current ArangoBackup Custom Resource state
- `.status.message` - additional message for the current state
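
These columns show up in `kubectl get` output; a usage sketch (the fully
qualified resource name and namespace are assumptions, adjust to your cluster):

```bash
# List ArangoBackup resources and their advertised columns
kubectl get arangobackups.backup.arangodb.com -n arangodb
```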

## ArangoBackup Custom Resource Spec

```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackup"
metadata:
  name: "example-arangodb-backup"
  namespace: "arangodb"
spec:
  policyName: "my-policy"
  deployment:
    name: "my-deployment"
  options:
    timeout: 3
    force: true
  download:
    repositoryURL: "s3:/..."
    credentialsSecretName: "secret-name"
    id: "backup-id"
  upload:
    repositoryURL: "s3:/..."
    credentialsSecretName: "secret-name"
status:
  state: "Ready"
  time: "time"
  message: "Message details"
  progress:
    jobID: "id"
    progress: "10%"
  backup:
    id: "id"
    version: "3.9.0-dev"
    forced: true
    uploaded: true
    downloaded: true
    createdAt: "time"
    sizeInBytes: 1
    numberOfDBServers: 3
  available: true
```

## `spec: Object`

Spec of the ArangoBackup Custom Resource.

Required: true

Default: {}

### `spec.deployment: Object`

ArangoDeployment specification.

Field is immutable.

Required: true

Default: {}

#### `spec.deployment.name: String`

Name of the ArangoDeployment Custom Resource within the same namespace as the ArangoBackup Custom Resource.

Field is immutable.

Required: true

Default: ""

#### `spec.policyName: String`

Name of the ArangoBackupPolicy which created this Custom Resource.

Field is immutable.

Required: false

Default: ""

### `spec.options: Object`

Backup options.

Field is immutable.

Required: false

Default: {}

#### `spec.options.timeout: float`

Timeout for the Backup creation request in seconds.

Field is immutable.

Required: false

Default: 30

#### `spec.options.allowInconsistent: bool`

AllowInconsistent flag for the Backup creation request.
If this value is set to true, the backup is taken even if the lock cannot be acquired.

Field is immutable.

Required: false

Default: false

### `spec.download: Object`

Backup download settings.

Field is immutable.

Required: false

Default: {}

#### `spec.download.repositoryURL: string`

Field is immutable. The protocol needs to be defined in `spec.download.credentialsSecretName` if the protocol is other than local.

More protocols can be found at [rclone.org](https://rclone.org/).

Format: `<protocol>:/<path>`

Examples:

- `s3://my-bucket/test`
- `azure://test`

Required: true

Default: ""

#### `spec.download.credentialsSecretName: string`

Field is immutable. Name of the secret used while accessing the repository.

Secret structure:

```yaml
apiVersion: v1
data:
  token: <json token>
kind: Secret
metadata:
  name: <name>
type: Opaque
```

`JSON Token` options are described on the [rclone](https://rclone.org/) page.
More than one protocol can be defined at the same time in one secret.

The token is defined in JSON format:

```json
{
  "<protocol>": {
    "type": "<type>",
    ...parameters
  }
}
```

AWS S3 example - based on the [rclone S3](https://rclone.org/s3/) documentation and interactive process:

```json
{
  "S3": {
    "type": "s3",                # Choose s3 type
    "provider": "AWS",           # Choose one of the providers
    "env_auth": "false",         # Define credentials in next step instead of using ENV
    "access_key_id": "xxx",
    "secret_access_key": "xxx",
    "region": "eu-west-2",       # Choose region
    "acl": "private"             # Set permissions on newly created remote object
  }
}
```

From now on you can use `S3://bucket/path`.

Required: false

Default: ""
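
One way to create such a secret is a sketch like the following (the secret
and file names are hypothetical; the file contains the JSON token shown above):

```bash
kubectl create secret generic my-s3-rclone-credentials \
  --namespace arangodb \
  --from-file=token=rclone-token.json
```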

##### Use IAM with Amazon EKS

Instead of creating and distributing your AWS credentials to the containers or
using the Amazon EC2 instance's role, you can associate an IAM role with a
Kubernetes service account and configure pods to use the service account.

1. Create a Policy to access the S3 bucket.

   ```bash
   aws iam create-policy \
       --policy-name S3-ACCESS_ROLE \
       --policy-document \
       '{
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": "s3:ListAllMyBuckets",
              "Resource": "*"
            },
            {
              "Effect": "Allow",
              "Action": "*",
              "Resource": "arn:aws:s3:::MY_BUCKET"
            },
            {
              "Effect": "Allow",
              "Action": "*",
              "Resource": "arn:aws:s3:::MY_BUCKET/*"
            }
          ]
        }'
   ```

2. Create an IAM role for the service account (SA).

   ```bash
   eksctl create iamserviceaccount \
       --name SA_NAME \
       --namespace NAMESPACE \
       --cluster CLUSTER_NAME \
       --attach-policy-arn arn:aws:iam::ACCOUNT_ID:policy/S3-ACCESS_ROLE \
       --approve
   ```

3. Ensure that you use that SA in your ArangoDeployment for `dbservers` and
   `coordinators`.

   ```yaml
   apiVersion: database.arangodb.com/v1
   kind: ArangoDeployment
   metadata:
     name: cluster
   spec:
     image: arangodb/enterprise
     mode: Cluster

     dbservers:
       serviceAccountName: SA_NAME
     coordinators:
       serviceAccountName: SA_NAME
   ```

4. Create a `Secret` Kubernetes object with a configuration for S3.

   ```yaml
   apiVersion: v1
   kind: Secret
   metadata:
     name: arangodb-cluster-backup-credentials
   type: Opaque
   stringData:
     token: |
       {
         "s3": {
           "type": "s3",
           "provider": "AWS",
           "env_auth": "true",
           "location_constraint": "eu-central-1",
           "region": "eu-central-1",
           "acl": "private",
           "no_check_bucket": "true"
         }
       }
   ```

5. Create an `ArangoBackup` Kubernetes object with upload to S3.

   ```yaml
   apiVersion: "backup.arangodb.com/v1alpha"
   kind: "ArangoBackup"
   metadata:
     name: backup
   spec:
     deployment:
       name: MY_DEPLOYMENT
     upload:
       repositoryURL: "s3:MY_BUCKET"
       credentialsSecretName: arangodb-cluster-backup-credentials
   ```

#### `spec.download.id: string`

ID of the ArangoBackup to be downloaded.

Field is immutable.

Required: true

Default: ""

### `spec.upload: Object`

Backup upload settings.

This field can be removed and created again with different values. This operation triggers the upload again.
Fields in the Custom Resource Spec Upload are immutable.

Required: false

Default: {}

#### `spec.upload.repositoryURL: string`

Same structure as `spec.download.repositoryURL`.

Required: true

Default: ""

#### `spec.upload.credentialsSecretName: string`

Same structure as `spec.download.credentialsSecretName`.

Required: false

Default: ""

## `status: Object`

Status of the ArangoBackup Custom Resource. This field is managed by the subresource and only by the operator.

Required: true

Default: {}

### `status.state: enum`

State of the ArangoBackup object.

Required: true

Default: ""

Possible states:

- "" - default state, changed to "Pending"
- "Pending" - the Custom Resource is queued. If a Backup is possible, it is changed to "Scheduled"
- "Scheduled" - state which starts the create/download process
- "Download" - state in which a download request will be created on ArangoDB
- "DownloadError" - state when the download failed
- "Downloading" - state for download progress
- "Create" - state for creation; the `available` field is set to true
- "Upload" - state in which an upload request will be created on ArangoDB
- "Uploading" - state for upload progress
- "UploadError" - state when the upload failed
- "Ready" - state when the Backup is finished
- "Deleted" - state when the Backup was once ready, but has been deleted
- "Failed" - state for failure
- "Unavailable" - state when the Backup is not available on ArangoDB. It can happen in case of upgrades, node restarts etc.

### `status.time: timestamp`

Time in UTC when the state of the ArangoBackup Custom Resource changed.

Required: true

Default: ""

### `status.message: string`

State message of the ArangoBackup Custom Resource.

Required: false

Default: ""

### `status.progress: object`

Progress info of the uploading and downloading process.

Required: false

Default: {}

#### `status.progress.jobID: string`

ArangoDB job ID for uploading or downloading.

Required: true

Default: ""

#### `status.progress.progress: string`

ArangoDeployment job progress.

Required: true

Default: "0%"

### `status.backup: object`

ArangoBackup details.

Required: true

Default: {}

#### `status.backup.id: string`

ArangoBackup ID.

Required: true

Default: ""

#### `status.backup.version: string`

ArangoBackup version.

Required: true

Default: ""

#### `status.backup.potentiallyInconsistent: bool`

ArangoBackup potentially inconsistent flag.

Required: false

Default: false

#### `status.backup.uploaded: bool`

Determines if the ArangoBackup has been uploaded.

Required: false

Default: false

#### `status.backup.downloaded: bool`

Determines if the ArangoBackup has been downloaded.

Required: false

Default: false

#### `status.backup.createdAt: TimeStamp`

ArangoBackup Custom Resource creation time in UTC.

Required: true

Default: now()

#### `status.backup.sizeInBytes: uint64`

Size of the Backup in ArangoDB.

Required: true

Default: 0

#### `status.backup.numberOfDBServers: uint`

Cluster size of the Backup in ArangoDB.

Required: true

Default: 0

### `status.available: bool`

Determines if we can restore from the ArangoBackup.

Required: true

Default: false

docs/backuppolicy-resource.md (new file, 185 lines)

# ArangoBackupPolicy Custom Resource

The ArangoBackupPolicy represents a schedule definition for creating ArangoBackup Custom Resources by the operator.
This policy specification is a `CustomResource` following a `CustomResourceDefinition` created by the operator.

## Examples

### Create schedule for all deployments

You can create an ArangoBackup Custom Resource for each ArangoDeployment every 15 minutes.

```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
```

### Create schedule for selected deployments

You can create an ArangoBackup Custom Resource for selected ArangoDeployments every 15 minutes.

```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
  selector:
    matchLabels:
      labelName: "labelValue"
```

### Create schedule for all deployments and upload

You can create an ArangoBackup Custom Resource for each ArangoDeployment every 15
minutes and upload it to the specified repositoryURL.

```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
  template:
    upload:
      repositoryURL: "s3:/..."
      credentialsSecretName: "secret-name"
```

### Create schedule for all deployments, don't allow parallel backup runs, keep limited number of backups

You can create an ArangoBackup Custom Resource for each ArangoDeployment every 15
minutes. You can keep 10 backups per deployment at the same time, and delete the
oldest ones. Do not run a backup if the previous backup is not finished.

```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
  maxBackups: 10
  allowConcurrent: False
```

## ArangoBackupPolicy Custom Resource Spec

```yaml
apiVersion: "backup.arangodb.com/v1"
kind: "ArangoBackupPolicy"
metadata:
  name: "example-arangodb-backup-policy"
spec:
  schedule: "*/15 * * * *"
  selector:
    matchLabels:
      labelName: "labelValue"
    matchExpressions: []
  template:
    options:
      timeout: 3
      force: true
    upload:
      repositoryURL: "s3:/..."
      credentialsSecretName: "secret-name"
status:
  scheduled: "time"
  message: "message"
```

## `spec: Object`

Spec of the ArangoBackupPolicy Custom Resource.

Required: true

Default: {}

### `spec.schedule: String`

Schedule definition. Parsed by https://godoc.org/github.com/robfig/cron

Required: true

Default: ""
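
For instance, a sketch of a policy that runs once a day (the schedule value is
illustrative and follows the cron format linked above):

```yaml
spec:
  schedule: "0 2 * * *"   # every day at 02:00
```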

### `spec.allowConcurrent: Boolean`

If false, an ArangoBackup is not created when previous backups are not finished.
A `ScheduleSkipped` event is published in that case.

Required: false

Default: True

### `spec.maxBackups: Integer`

If > 0, old healthy backups of that policy are removed to ensure that only `maxBackups` are present at the same time.
A `CleanedUpOldBackups` event is published on automatic removal of old backups.

Required: false

Default: 0

### `spec.selector: Object`

Selector definition for selecting matching ArangoDeployment Custom Resources. Parsed by https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#LabelSelector

Required: false

Default: {}
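
For the selector example above to match, the ArangoDeployment must carry the
corresponding label; a minimal sketch (the deployment name is hypothetical):

```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-deployment"
  labels:
    labelName: "labelValue"
spec:
  mode: Cluster
```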

### `spec.template: ArangoBackupTemplate`

Template for the ArangoBackup Custom Resource.

Required: false

Default: {}

### `spec.template.options: ArangoBackup - spec.options`

ArangoBackup options.

Required: false

Default: {}

### `spec.template.upload: ArangoBackup - spec.upload`

ArangoBackup upload configuration.

Required: false

Default: {}

## `status: Object`

Status of the ArangoBackupPolicy Custom Resource, managed by the operator.

Required: true

Default: {}

### `status.scheduled: TimeStamp`

Next scheduled time in UTC.

Required: true

Default: ""

### `status.message: String`

Message from the operator in case of failure, e.g. the schedule is not valid or the ArangoBackupPolicy is not valid.

Required: false

Default: ""

docs/configuration-and-secrets.md (new file, 36 lines)

# Configuration & secrets

An ArangoDB cluster has lots of configuration options.
Some will be supported directly in the ArangoDB Operator,
others will have to be specified separately.

## Passing command line options

All command-line options of `arangod` (and `arangosync`) are available
by adding options to the `spec.<group>.args` list of a group
of servers.

These arguments are added to the command line created for these servers.
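
For example, a sketch that passes an extra `arangod` option to all DB-Servers
(the deployment name is hypothetical, the option mirrors the example used
elsewhere in these docs):

```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-with-args"
spec:
  mode: Cluster
  dbservers:
    args:
      - --log.level=debug
```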

## Secrets

The ArangoDB cluster needs several secrets such as JWT tokens,
TLS certificates and so on.

All these secrets are stored as Kubernetes Secrets and passed to
the applicable Pods as files, mapped into the Pods' filesystem.

The name of the secret is specified in the custom resource.
For example:

```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-simple-cluster"
spec:
  mode: Cluster
  image: 'arangodb/arangodb:3.10.8'
  auth:
    jwtSecretName: <name-of-JWT-token-secret>
```

@@ -1,3 +1,3 @@
-# Deployment Operator Dashboard
+# Deployment Operator Dashboards

### Dashboard UI now is deprecated and will be removed in next minor version

docs/deployment-replication-resource-reference.md (new file, 334 lines)

# ArangoDeploymentReplication Custom Resource

#### Enterprise Edition only

The ArangoDB Replication Operator creates and maintains ArangoDB
`arangosync` configurations in a Kubernetes cluster, given a replication specification.
This replication specification is a `CustomResource` following
a `CustomResourceDefinition` created by the operator.

Example of a minimal replication definition for two ArangoDB clusters with
sync in the same Kubernetes cluster:

```yaml
apiVersion: "replication.database.arangodb.com/v1"
kind: "ArangoDeploymentReplication"
metadata:
  name: "replication-from-a-to-b"
spec:
  source:
    deploymentName: cluster-a
    auth:
      keyfileSecretName: cluster-a-sync-auth
  destination:
    deploymentName: cluster-b
```

This definition results in:

- the arangosync `SyncMaster` in deployment `cluster-b` is called to configure a synchronization
  from the syncmasters in `cluster-a` to the syncmasters in `cluster-b`,
  using the client authentication certificate stored in `Secret` `cluster-a-sync-auth`.
  To access `cluster-a`, the JWT secret found in the deployment of `cluster-a` is used.
  To access `cluster-b`, the JWT secret found in the deployment of `cluster-b` is used.

Example replication definition for replicating from a source that is outside the current Kubernetes cluster
to a destination that is in the same Kubernetes cluster:

```yaml
apiVersion: "replication.database.arangodb.com/v1"
kind: "ArangoDeploymentReplication"
metadata:
  name: "replication-from-a-to-b"
spec:
  source:
    masterEndpoint: ["https://163.172.149.229:31888", "https://51.15.225.110:31888", "https://51.15.229.133:31888"]
    auth:
      keyfileSecretName: cluster-a-sync-auth
    tls:
      caSecretName: cluster-a-sync-ca
  destination:
    deploymentName: cluster-b
```

This definition results in:

- the arangosync `SyncMaster` in deployment `cluster-b` is called to configure a synchronization
  from the syncmasters located at the given list of endpoint URLs to the syncmasters in `cluster-b`,
  using the client authentication certificate stored in `Secret` `cluster-a-sync-auth`.
  To access `cluster-a`, the keyfile (containing a client authentication certificate) is used.
  To access `cluster-b`, the JWT secret found in the deployment of `cluster-b` is used.

## DC2DC Replication Example

The requirements for setting up Datacenter-to-Datacenter (DC2DC) Replication are:

- You need to have two ArangoDB clusters running in two different Kubernetes clusters.
- Both Kubernetes clusters are equipped with support for `Services` of type `LoadBalancer`.
- You can create (global) DNS names for configured `Services` with low propagation times. E.g. use Cloudflare.
- You have 4 DNS names available:
  - One for the database in the source ArangoDB cluster, e.g. `src-db.mycompany.com`
  - One for the ArangoDB syncmasters in the source ArangoDB cluster, e.g. `src-sync.mycompany.com`
  - One for the database in the destination ArangoDB cluster, e.g. `dst-db.mycompany.com`
  - One for the ArangoDB syncmasters in the destination ArangoDB cluster, e.g. `dst-sync.mycompany.com`

Follow these steps to configure DC2DC replication between two ArangoDB clusters
running in Kubernetes:

1. Enable DC2DC Replication support on the source ArangoDB cluster.

   Set your current Kubernetes context to the Kubernetes source cluster.

   Edit the `ArangoDeployment` of the source ArangoDB cluster:

   - Set `spec.tls.altNames` to `["src-db.mycompany.com"]` (can include more names / IP addresses)
   - Set `spec.sync.enabled` to `true`
   - Set `spec.sync.externalAccess.masterEndpoint` to `["https://src-sync.mycompany.com:8629"]`
   - Set `spec.sync.externalAccess.accessPackageSecretNames` to `["src-accesspackage"]`

2. Extract the access package from the source ArangoDB cluster.

   ```bash
   kubectl get secret src-accesspackage --template='{{index .data "accessPackage.yaml"}}' | \
     base64 -D > accessPackage.yaml
   ```

3. Configure the source DNS names.

   ```bash
   kubectl get service
   ```

   Find the IP address contained in the `LoadBalancer` column for the following `Services`:

   - `<deployment-name>-ea` Use this IP address for the `src-db.mycompany.com` DNS name.
   - `<deployment-name>-sync` Use this IP address for the `src-sync.mycompany.com` DNS name.

   The process for configuring DNS names is specific to each DNS provider.

4. Enable DC2DC Replication support on the destination ArangoDB cluster.

   Set your current Kubernetes context to the Kubernetes destination cluster.

   Edit the `ArangoDeployment` of the destination ArangoDB cluster:

   - Set `spec.tls.altNames` to `["dst-db.mycompany.com"]` (can include more names / IP addresses)
   - Set `spec.sync.enabled` to `true`
   - Set `spec.sync.externalAccess.masterEndpoint` to `["https://dst-sync.mycompany.com:8629"]`

5. Import the access package in the destination cluster.

   ```bash
   kubectl apply -f accessPackage.yaml
   ```

   Note: This imports two `Secrets`, containing TLS information about the source
   cluster, into the destination cluster.

6. Configure the destination DNS names.

   ```bash
   kubectl get service
   ```

   Find the IP address contained in the `LoadBalancer` column for the following `Services`:

   - `<deployment-name>-ea` Use this IP address for the `dst-db.mycompany.com` DNS name.
   - `<deployment-name>-sync` Use this IP address for the `dst-sync.mycompany.com` DNS name.

   The process for configuring DNS names is specific to each DNS provider.

7. Create an `ArangoDeploymentReplication` resource.

   Create a yaml file (e.g. called `src-to-dst-repl.yaml`) with the following content:

   ```yaml
   apiVersion: "replication.database.arangodb.com/v1"
   kind: "ArangoDeploymentReplication"
   metadata:
     name: "replication-src-to-dst"
   spec:
     source:
       masterEndpoint: ["https://src-sync.mycompany.com:8629"]
       auth:
         keyfileSecretName: src-accesspackage-auth
       tls:
         caSecretName: src-accesspackage-ca
     destination:
       deploymentName: <dst-deployment-name>
   ```

8. Wait for the DNS names to propagate.

   Wait until the DNS names configured in steps 3 and 6 resolve to their configured
   IP addresses.

   Depending on your DNS provider, this can take from a few minutes up to 24 hours.

9. Activate the replication.

   ```bash
   kubectl apply -f src-to-dst-repl.yaml
   ```

   Replication from the source cluster to the destination cluster will now be configured.

   Check the status of the replication by inspecting the status of the
   `ArangoDeploymentReplication` resource using:

   ```bash
   kubectl describe ArangoDeploymentReplication replication-src-to-dst
   ```

   As soon as the replication is configured, the `Add collection` button in the `Collections`
   page of the web interface (of the destination cluster) will be grayed out.

## Specification reference

Below you'll find all settings of the `ArangoDeploymentReplication` custom resource.

### `spec.source.deploymentName: string`

This setting specifies the name of an `ArangoDeployment` resource that runs a cluster
with sync enabled.

This cluster is configured as the replication source.

### `spec.source.masterEndpoint: []string`

This setting specifies zero or more master endpoint URLs of the source cluster.

Use this setting if the source cluster is not running inside a Kubernetes cluster
that is reachable from the Kubernetes cluster the `ArangoDeploymentReplication` resource is deployed in.

Specifying this setting and `spec.source.deploymentName` at the same time is not allowed.

### `spec.source.auth.keyfileSecretName: string`

This setting specifies the name of a `Secret` containing a client authentication certificate called `tls.keyfile` used to authenticate
with the SyncMaster at the specified source.

If `spec.source.auth.userSecretName` has not been set,
the client authentication certificate found in the secret with this name is also used to configure
the synchronization and fetch the synchronization status.

This setting is required.
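
A sketch of creating such a secret from an existing keyfile (the secret and
file names are hypothetical; the data key must be `tls.keyfile`):

```bash
kubectl create secret generic cluster-a-sync-auth \
  --from-file=tls.keyfile=client-auth.keyfile
```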

### `spec.source.auth.userSecretName: string`

This setting specifies the name of a `Secret` containing a `username` & `password` used to authenticate
with the SyncMaster at the specified source in order to configure synchronization and fetch synchronization status.

The user identified by the username must have write access in the `_system` database of the source ArangoDB cluster.
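
A sketch of creating such a secret (the secret name and credentials are
hypothetical; the data keys must be `username` and `password`):

```bash
kubectl create secret generic src-sync-user \
  --from-literal=username=replication-user \
  --from-literal=password='my-secret-password'
```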

### `spec.source.tls.caSecretName: string`

This setting specifies the name of a `Secret` containing a TLS CA certificate `ca.crt` used to verify
the TLS connection created by the SyncMaster at the specified source.

This setting is required, unless `spec.source.deploymentName` has been set.

### `spec.destination.deploymentName: string`

This setting specifies the name of an `ArangoDeployment` resource that runs a cluster
with sync enabled.

This cluster is configured as the replication destination.

### `spec.destination.masterEndpoint: []string`

This setting specifies zero or more master endpoint URLs of the destination cluster.

Use this setting if the destination cluster is not running inside a Kubernetes cluster
that is reachable from the Kubernetes cluster the `ArangoDeploymentReplication` resource is deployed in.

Specifying this setting and `spec.destination.deploymentName` at the same time is not allowed.

### `spec.destination.auth.keyfileSecretName: string`

This setting specifies the name of a `Secret` containing a client authentication certificate called `tls.keyfile` used to authenticate
with the SyncMaster at the specified destination.

If `spec.destination.auth.userSecretName` has not been set,
the client authentication certificate found in the secret with this name is also used to configure
the synchronization and fetch the synchronization status.

This setting is required, unless `spec.destination.deploymentName` or `spec.destination.auth.userSecretName` has been set.

Specifying this setting and `spec.destination.userSecretName` at the same time is not allowed.

### `spec.destination.auth.userSecretName: string`

This setting specifies the name of a `Secret` containing a `username` & `password` used to authenticate
with the SyncMaster at the specified destination in order to configure synchronization and fetch synchronization status.

The user identified by the username must have write access in the `_system` database of the destination ArangoDB cluster.

Specifying this setting and `spec.destination.keyfileSecretName` at the same time is not allowed.

### `spec.destination.tls.caSecretName: string`

This setting specifies the name of a `Secret` containing a TLS CA certificate `ca.crt` used to verify
the TLS connection created by the SyncMaster at the specified destination.

This setting is required, unless `spec.destination.deploymentName` has been set.

## Authentication details

The authentication settings in an `ArangoDeploymentReplication` resource are used for two distinct purposes.

The first use is the authentication of the syncmasters at the destination with the syncmasters at the source.
This is always done using a client authentication certificate which is found in a `tls.keyfile` field
in a secret identified by `spec.source.auth.keyfileSecretName`.

The second use is the authentication of the ArangoDB Replication operator with the syncmasters at the source
or destination. These connections are made to configure synchronization, stop configuration and fetch the status
of the configuration.
The method used for this authentication is derived as follows (where `X` is either `source` or `destination`):

- If `spec.X.userSecretName` is set, the username + password found in the `Secret` identified by this name is used.
- If `spec.X.keyfileSecretName` is set, the client authentication certificate (keyfile) found in the `Secret` identified by this name is used.
- If `spec.X.deploymentName` is set, the JWT secret found in the deployment is used.

## Creating client authentication certificate keyfiles

The client authentication certificates needed for the `Secrets` identified by `spec.source.auth.keyfileSecretName` & `spec.destination.auth.keyfileSecretName`
are normal ArangoDB keyfiles that can be created by the `arangosync create client-auth keyfile` command.
In order to do so, you must have access to the client authentication CA of the source/destination.

If the client authentication CA at the source/destination also contains a private key (`ca.key`), the ArangoDeployment operator
can be used to create such a keyfile for you, without the need to have `arangosync` installed locally.
Read the following paragraphs for instructions on how to do that.

## Creating and using access packages

An access package is a YAML file that contains:

- A client authentication certificate, wrapped in a `Secret` in a `tls.keyfile` data field.
- A TLS certificate authority public key, wrapped in a `Secret` in a `ca.crt` data field.

The format of the access package is such that it can be inserted into a Kubernetes cluster using the standard `kubectl` tool.

To create an access package that can be used to authenticate with the ArangoDB SyncMasters of an `ArangoDeployment`,
add a name of a non-existing `Secret` to the `spec.sync.externalAccess.accessPackageSecretNames` field of the `ArangoDeployment`.
In response, a `Secret` is created in that Kubernetes cluster, with the given name, that contains an `accessPackage.yaml` data field
that contains a Kubernetes resource specification that can be inserted into the other Kubernetes cluster.

The process for creating and using an access package for authentication at the source cluster is as follows:

- Edit the `ArangoDeployment` resource of the source cluster, set `spec.sync.externalAccess.accessPackageSecretNames` to `["my-access-package"]`
- Wait for the `ArangoDeployment` operator to create a `Secret` named `my-access-package`.
- Extract the access package from the Kubernetes source cluster using:

  ```bash
  kubectl get secret my-access-package --template='{{index .data "accessPackage.yaml"}}' | base64 -D > accessPackage.yaml
  ```

- Insert the secrets found in the access package in the Kubernetes destination cluster using:

  ```bash
  kubectl apply -f accessPackage.yaml
  ```

As a result, the destination Kubernetes cluster will have 2 additional `Secrets`. One contains a client authentication certificate
formatted as a keyfile. Another contains the public key of the TLS CA certificate of the source cluster.

docs/deployment-resource-reference.md (new file, 840 lines)

# ArangoDeployment Custom Resource

The ArangoDB Deployment Operator creates and maintains ArangoDB deployments
in a Kubernetes cluster, given a deployment specification.
This deployment specification is a `CustomResource` following
a `CustomResourceDefinition` created by the operator.

Example minimal deployment definition of an ArangoDB database cluster:

```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
spec:
  mode: Cluster
```

Example of a more elaborate deployment definition:

```yaml
apiVersion: "database.arangodb.com/v1"
kind: "ArangoDeployment"
metadata:
  name: "example-arangodb-cluster"
spec:
  mode: Cluster
  environment: Production
  agents:
    count: 3
    args:
      - --log.level=debug
    resources:
      requests:
        storage: 8Gi
    storageClassName: ssd
  dbservers:
    count: 5
    resources:
      requests:
        storage: 80Gi
    storageClassName: ssd
  coordinators:
    count: 3
  image: "arangodb/arangodb:3.9.3"
```

## Specification reference

Below you'll find all settings of the `ArangoDeployment` custom resource.
Several settings are for various groups of servers. These are indicated
with `<group>` where `<group>` can be any of:

- `agents` for all Agents of a `Cluster` or `ActiveFailover` pair.
- `dbservers` for all DB-Servers of a `Cluster`.
- `coordinators` for all Coordinators of a `Cluster`.
- `single` for all single servers of a `Single` instance or `ActiveFailover` pair.
- `syncmasters` for all syncmasters of a `Cluster`.
- `syncworkers` for all syncworkers of a `Cluster`.

The special group `id` can be used for image discovery and testing affinity/toleration settings.

### `spec.architecture: []string`

This setting specifies a CPU architecture for the deployment.
Possible values are:

- `amd64` (default): Use processors with the x86-64 architecture.
- `arm64`: Use processors with the 64-bit ARM architecture.

The setting expects a list of strings, but you should only specify a single
list item for the architecture, except when you want to migrate from one
architecture to the other. The first list item defines the new default
architecture for the deployment that you want to migrate to.

_Tip:_
To use the ARM architecture, you need to enable it in the operator first using
`--set "operator.architectures={amd64,arm64}"`. See
[Installation with Helm](using-the-operator.md#installation-with-helm).

To create a new deployment with `arm64` nodes, specify the architecture in the
deployment specification as follows:

```yaml
spec:
  architecture:
    - arm64
```

To migrate nodes of an existing deployment from `amd64` to `arm64`, modify the
deployment specification so that both architectures are listed:

```diff
 spec:
   architecture:
+    - arm64
     - amd64
```

This lets new members as well as recreated members use `arm64` nodes.

Then run the following command:

```bash
kubectl annotate pod $POD "deployment.arangodb.com/replace=true"
```

To change an existing member to `arm64`, annotate the pod as follows:

```bash
kubectl annotate pod $POD "deployment.arangodb.com/arch=arm64"
```

An `ArchitectureMismatch` condition occurs in the deployment:

```yaml
members:
  single:
    - arango-version: 3.10.0
      architecture: arm64
      conditions:
        reason: Member has a different architecture than the deployment
        status: "True"
        type: ArchitectureMismatch
```

Restart the pod using this command:

```bash
kubectl annotate pod $POD "deployment.arangodb.com/rotate=true"
```

### `spec.mode: string`

This setting specifies the type of deployment you want to create.
Possible values are:

- `Cluster` (default) Full cluster. Defaults to 3 Agents, 3 DB-Servers & 3 Coordinators.
- `ActiveFailover` Active-failover single pair. Defaults to 3 Agents and 2 single servers.
- `Single` Single server only (note this does not provide high availability or reliability).

This setting cannot be changed after the deployment has been created.

### `spec.environment: string`

This setting specifies the type of environment in which the deployment is created.
Possible values are:

- `Development` (default) This value optimizes the deployment for development
  use. It is possible to run a deployment on a small number of nodes (e.g. minikube).
- `Production` This value optimizes the deployment for production use.
  It puts required affinity constraints on all pods to prevent Agents & DB-Servers
  from running on the same machine.

### `spec.image: string`

This setting specifies the docker image to use for all ArangoDB servers.
In a `development` environment this setting defaults to `arangodb/arangodb:latest`.
For `production` environments this is a required setting without a default value.
It is highly recommended to use an explicit version (not `latest`) for production
environments.

### `spec.imagePullPolicy: string`

This setting specifies the pull policy for the docker image to use for all ArangoDB servers.
Possible values are:

- `IfNotPresent` (default) to pull only when the image is not found on the node.
- `Always` to always pull the image before using it.

### `spec.imagePullSecrets: []string`

This setting specifies the list of image pull secrets for the docker image to use for all ArangoDB servers.

### `spec.annotations: map[string]string`

This setting sets the specified annotations on all ArangoDeployment-owned resources (pods, services, PVCs, PDBs).

### `spec.storageEngine: string`

This setting specifies the type of storage engine used for all servers
in the cluster.
Possible values are:

- `MMFiles` To use the MMFiles storage engine.
- `RocksDB` (default) To use the RocksDB storage engine.

This setting cannot be changed after the cluster has been created.

### `spec.downtimeAllowed: bool`

This setting is used to allow automatic reconciliation actions that yield
some downtime of the ArangoDB deployment.
When this setting is set to `false` (the default), no automatic action that
may result in downtime is allowed.
If the need for such an action is detected, an event is added to the `ArangoDeployment`.

Once this setting is set to `true`, the automatic action is executed.

Operations that may result in downtime are:

- Rotating TLS CA certificate

Note: It is still possible that there is some downtime when the Kubernetes
cluster is down, or in a bad state, irrespective of the value of this setting.

### `spec.memberPropagationMode`

Changes to a pod's configuration require a restart of that pod in almost all
cases. Pods are restarted eagerly by default, which can cause more restarts than
desired, especially when updating _arangod_ as well as the operator.
The propagation of the configuration changes can be deferred to the next restart,
either triggered manually by the user or by another operation like an upgrade.
This reduces the number of restarts for upgrading both the server and the
operator from two to one. A minimal example follows the list below.

- `always`: Restart the member as soon as a configuration change is discovered
- `on-restart`: Wait until the next restart to change the member configuration
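
A minimal sketch of deferring configuration changes to the next restart:

```yaml
spec:
  memberPropagationMode: on-restart
```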

### `spec.rocksdb.encryption.keySecretName`

This setting specifies the name of a Kubernetes `Secret` that contains
an encryption key used for encrypting all data stored by ArangoDB servers.
When an encryption key is used, encryption of the data in the cluster is enabled;
without it, encryption is disabled.
The default value is empty.

This requires the Enterprise Edition.

The encryption key cannot be changed after the cluster has been created.

The secret specified by this setting must have a data field named 'key' containing
an encryption key that is exactly 32 bytes long.
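
A sketch of creating such a secret (the secret and file names are hypothetical):

```bash
# Generate a 32-byte key and store it under the data field "key"
openssl rand -out encryption.key 32
kubectl create secret generic my-encryption-key --from-file=key=encryption.key
```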

### `spec.networkAttachedVolumes: bool`

The default of this option is `false`. If set to `true`, a `ResignLeaderShip`
operation will be triggered when a DB-Server pod is evicted (rather than a
`CleanOutServer` operation). Furthermore, the pod will simply be
redeployed on a different node, rather than cleaned and retired and
replaced by a new member. You must only set this option to `true` if
your persistent volumes are "movable" in the sense that they can be
mounted from a different k8s node, like in the case of network attached
volumes. If your persistent volumes are tied to a specific pod, you
must leave this option on `false`.

### `spec.externalAccess.type: string`

This setting specifies the type of `Service` that will be created to provide
access to the ArangoDB deployment from outside the Kubernetes cluster.
Possible values are:

- `None` To limit access to applications running inside the Kubernetes cluster.
- `LoadBalancer` To create a `Service` of type `LoadBalancer` for the ArangoDB deployment.
- `NodePort` To create a `Service` of type `NodePort` for the ArangoDB deployment.
- `Auto` (default) To create a `Service` of type `LoadBalancer` and fall back to a `Service` of type `NodePort` when the
  `LoadBalancer` is not assigned an IP address.

### `spec.externalAccess.loadBalancerIP: string`

This setting specifies the IP used for the LoadBalancer to expose the ArangoDB deployment on.
This setting is used when `spec.externalAccess.type` is set to `LoadBalancer` or `Auto`.

If you do not specify this setting, an IP will be chosen automatically by the load-balancer provisioner.
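
For illustration, a sketch of exposing a deployment through a load balancer
with a fixed address (the IP shown is hypothetical):

```yaml
spec:
  externalAccess:
    type: LoadBalancer
    loadBalancerIP: 192.0.2.10
```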

### `spec.externalAccess.loadBalancerSourceRanges: []string`

If specified and supported by the platform (cloud provider), traffic through the cloud-provider
load-balancer will be restricted to the specified client IPs. This field will be ignored if the
cloud-provider does not support the feature.

More info: https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/

### `spec.externalAccess.nodePort: int`

This setting specifies the port used to expose the ArangoDB deployment on.
This setting is used when `spec.externalAccess.type` is set to `NodePort` or `Auto`.

If you do not specify this setting, a random port will be chosen automatically.

### `spec.externalAccess.advertisedEndpoint: string`

This setting specifies the advertised endpoint for all Coordinators.

### `spec.auth.jwtSecretName: string`

This setting specifies the name of a kubernetes `Secret` that contains
the JWT token used for accessing all ArangoDB servers.
When no name is specified, it defaults to `<deployment-name>-jwt`.
To disable authentication, set this value to `None`.

If you specify a name of a `Secret`, that secret must have the token
in a data field named `token`.

If you specify a name of a `Secret` that does not exist, a random token is created
and stored in a `Secret` with the given name.

Changing a JWT token results in stopping the entire cluster
and restarting it.
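
A sketch of creating such a secret up front (the secret name and the token
generation command are illustrative):

```bash
kubectl create secret generic my-deployment-jwt \
  --from-literal=token="$(openssl rand -hex 32)"
```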

### `spec.tls.caSecretName: string`

This setting specifies the name of a kubernetes `Secret` that contains
a standard CA certificate + private key used to sign certificates for individual
ArangoDB servers.
When no name is specified, it defaults to `<deployment-name>-ca`.
To disable TLS, set this value to `None`.

If you specify a name of a `Secret` that does not exist, a self-signed CA certificate + key is created
and stored in a `Secret` with the given name.

The specified `Secret` must contain the following data fields:

- `ca.crt` PEM encoded public key of the CA certificate
- `ca.key` PEM encoded private key of the CA certificate
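
A sketch of providing your own CA in the expected data fields (the file and
secret names are hypothetical):

```bash
openssl req -x509 -newkey rsa:4096 -nodes -days 3650 \
  -keyout ca.key -out ca.crt -subj "/CN=example-ca"
kubectl create secret generic my-deployment-ca \
  --from-file=ca.crt=ca.crt --from-file=ca.key=ca.key
```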
|
||||
|
||||
### `spec.tls.altNames: []string`
|
||||
|
||||
This setting specifies a list of alternate names that will be added to all generated
|
||||
certificates. These names can be DNS names or email addresses.
|
||||
The default value is empty.
|
||||
|
||||
### `spec.tls.ttl: duration`
|
||||
|
||||
This setting specifies the time to live of all generated
|
||||
server certificates.
|
||||
The default value is `2160h` (about 3 months).
|
||||
|
||||
When the server certificate is about to expire, it will be automatically replaced
|
||||
by a new one and the affected server will be restarted.
|
||||
|
||||
Note: The time to live of the CA certificate (when created automatically)
|
||||
will be set to 10 years.
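
Put together, a sketch of the TLS settings described above (the secret name and alternate name are placeholders):

```yaml
spec:
  tls:
    caSecretName: example-cluster-ca
    altNames:
      - "arangodb.example.com"
    ttl: "2160h"
```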
|
||||
|
||||
### `spec.sync.enabled: bool`
|
||||
|
||||
This setting enables/disables support for data center 2 data center
|
||||
replication in the cluster. When enabled, the cluster will contain
|
||||
a number of `syncmaster` & `syncworker` servers.
|
||||
The default value is `false`.
|
||||
|
||||
### `spec.sync.externalAccess.type: string`
|
||||
|
||||
This setting specifies the type of `Service` that will be created to provide
|
||||
access to the ArangoSync syncMasters from outside the Kubernetes cluster.
|
||||
Possible values are:
|
||||
|
||||
- `None` To limit access to applications running inside the Kubernetes cluster.
|
||||
- `LoadBalancer` To create a `Service` of type `LoadBalancer` for the ArangoSync SyncMasters.
|
||||
- `NodePort` To create a `Service` of type `NodePort` for the ArangoSync SyncMasters.
|
||||
- `Auto` (default) To create a `Service` of type `LoadBalancer` and fall back to a `Service` of type `NodePort` when the
|
||||
`LoadBalancer` is not assigned an IP address.
|
||||
|
||||
Note that when you specify a value of `None`, a `Service` will still be created, but of type `ClusterIP`.
|
||||
|
||||
### `spec.sync.externalAccess.loadBalancerIP: string`
|
||||
|
||||
This setting specifies the IP used for the LoadBalancer to expose the ArangoSync SyncMasters on.
|
||||
This setting is used when `spec.sync.externalAccess.type` is set to `LoadBalancer` or `Auto`.
|
||||
|
||||
If you do not specify this setting, an IP will be chosen automatically by the load-balancer provisioner.
|
||||
|
||||
### `spec.sync.externalAccess.nodePort: int`
|
||||
|
||||
This setting specifies the port used to expose the ArangoSync SyncMasters on.
|
||||
This setting is used when `spec.sync.externalAccess.type` is set to `NodePort` or `Auto`.
|
||||
|
||||
If you do not specify this setting, a random port will be chosen automatically.
|
||||
|
||||
### `spec.sync.externalAccess.loadBalancerSourceRanges: []string`
|
||||
|
||||
If specified and supported by the platform (cloud provider), traffic through the cloud-provider
|
||||
load-balancer is restricted to the specified client IPs. This field is ignored if the
|
||||
cloud-provider does not support the feature.
|
||||
|
||||
More info: https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/
|
||||
|
||||
### `spec.sync.externalAccess.masterEndpoint: []string`
|
||||
|
||||
This setting specifies the master endpoint(s) advertised by the ArangoSync SyncMasters.
|
||||
If not set, this setting defaults to:
|
||||
|
||||
- If `spec.sync.externalAccess.loadBalancerIP` is set, it defaults to `https://<load-balancer-ip>:<8629>`.
|
||||
- Otherwise it defaults to `https://<sync-service-dns-name>:<8629>`.
|
||||
|
||||
### `spec.sync.externalAccess.accessPackageSecretNames: []string`
|
||||
|
||||
This setting specifies the names of zero or more `Secrets` that will be created by the deployment
|
||||
operator containing "access packages". An access package contains those `Secrets` that are needed
|
||||
to access the SyncMasters of this `ArangoDeployment`.
|
||||
|
||||
By removing a name from this setting, the corresponding `Secret` is also deleted.
|
||||
Note that to remove all access packages, leave an empty array in place (`[]`).
|
||||
Completely removing the setting results in not modifying the list.
|
||||
|
||||
See [the `ArangoDeploymentReplication` specification](deployment-replication-resource-reference.md) for more information
|
||||
on access packages.
|
||||
|
||||
### `spec.sync.auth.jwtSecretName: string`
|
||||
|
||||
This setting specifies the name of a kubernetes `Secret` that contains
|
||||
the JWT token used for accessing all ArangoSync master servers.
|
||||
When not specified, the `spec.auth.jwtSecretName` value is used.
|
||||
|
||||
If you specify a name of a `Secret` that does not exist, a random token is created
|
||||
and stored in a `Secret` with given name.
|
||||
|
||||
### `spec.sync.auth.clientCASecretName: string`
|
||||
|
||||
This setting specifies the name of a kubernetes `Secret` that contains
|
||||
a PEM encoded CA certificate used for client certificate verification
|
||||
in all ArangoSync master servers.
|
||||
This is a required setting when `spec.sync.enabled` is `true`.
|
||||
The default value is empty.
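
As an illustration, a sketch enabling DC2DC support with the sync settings described so far (the endpoint and secret name are placeholders; provide them yourself or let the operator generate them where applicable):

```yaml
spec:
  sync:
    enabled: true
    externalAccess:
      type: LoadBalancer
      masterEndpoint:
        - "https://dc-a-sync.example.com:8629"
    auth:
      clientCASecretName: example-sync-client-ca
```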
|
||||
|
||||
### `spec.sync.mq.type: string`
|
||||
|
||||
This setting sets the type of message queue used by ArangoSync.
|
||||
Possible values are:
|
||||
|
||||
- `Direct` (default) for direct HTTP connections between the 2 data centers.
|
||||
|
||||
### `spec.sync.tls.caSecretName: string`
|
||||
|
||||
This setting specifies the name of a kubernetes `Secret` that contains
|
||||
a standard CA certificate + private key used to sign certificates for individual
|
||||
ArangoSync master servers.
|
||||
|
||||
When no name is specified, it defaults to `<deployment-name>-sync-ca`.
|
||||
|
||||
If you specify a name of a `Secret` that does not exist, a self-signed CA certificate + key is created
|
||||
and stored in a `Secret` with given name.
|
||||
|
||||
The specified `Secret` must contain the following data fields:
|
||||
|
||||
- `ca.crt` PEM encoded public key of the CA certificate
|
||||
- `ca.key` PEM encoded private key of the CA certificate
|
||||
|
||||
### `spec.sync.tls.altNames: []string`
|
||||
|
||||
This setting specifies a list of alternate names that will be added to all generated
|
||||
certificates. These names can be DNS names or email addresses.
|
||||
The default value is empty.
|
||||
|
||||
### `spec.sync.monitoring.tokenSecretName: string`
|
||||
|
||||
This setting specifies the name of a kubernetes `Secret` that contains
|
||||
the bearer token used for accessing all monitoring endpoints of all ArangoSync
|
||||
servers.
|
||||
When not specified, no monitoring token is used.
|
||||
The default value is empty.
|
||||
|
||||
### `spec.disableIPv6: bool`
|
||||
|
||||
This setting prevents the use of IPv6 addresses by ArangoDB servers.
|
||||
The default is `false`.
|
||||
|
||||
This setting cannot be changed after the deployment has been created.
|
||||
|
||||
### `spec.restoreFrom: string`
|
||||
|
||||
This setting specifies a `ArangoBackup` resource name the cluster should be restored from.
|
||||
|
||||
After a restore or failure to do so, the status of the deployment contains information about the
|
||||
restore operation in the `restore` key.
|
||||
|
||||
It will contain some of the following fields:
|
||||
- _requestedFrom_: name of the `ArangoBackup` used to restore from.
|
||||
- _message_: optional message explaining why the restore failed.
|
||||
- _state_: state indicating if the restore was successful or not. Possible values: `Restoring`, `Restored`, `RestoreFailed`
|
||||
|
||||
If the `restoreFrom` key is removed from the spec, the `restore` key is deleted as well.
|
||||
|
||||
A new restore attempt is made if and only if the `restore` key is not yet set in the status, or if `spec.restoreFrom` differs from the `requestedFrom` value recorded there.
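
For example, restoring a deployment from an existing `ArangoBackup` resource is a one-line change in the spec (the backup name below is a placeholder):

```yaml
spec:
  restoreFrom: "example-backup-2023-10-16"
```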
|
||||
|
||||
### `spec.license.secretName: string`
|
||||
|
||||
This setting specifies the name of a kubernetes `Secret` that contains
|
||||
the license key token used for enterprise images. This value is not used for
|
||||
the Community Edition.
|
||||
|
||||
### `spec.bootstrap.passwordSecretNames.root: string`
|
||||
|
||||
This setting specifies a secret name for the credentials of the root user.
|
||||
|
||||
When a deployment is created, the operator will set up the root user account
|
||||
according to the credentials given by the secret. If the secret doesn't exist
|
||||
the operator creates a secret with a random password.
|
||||
|
||||
There are two magic values for the secret name:
|
||||
- `None` specifies no action. This disables root password randomization. This is the default value. (Thus the root password is empty - not recommended)
|
||||
- `Auto` specifies automatic name generation, which is `<deploymentname>-root-password`.
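
For example, to point the deployment at a secret you manage yourself (the secret name is a placeholder; if it does not exist, the operator creates it with a random password as described above):

```yaml
spec:
  bootstrap:
    passwordSecretNames:
      root: example-cluster-root-password
```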
|
||||
|
||||
### `spec.metrics.enabled: bool`
|
||||
|
||||
If this is set to `true`, the operator runs a sidecar container for
|
||||
every Agent, DB-Server, Coordinator and Single server.
|
||||
|
||||
In addition to the sidecar containers the operator will deploy a service
|
||||
to access the exporter ports (from within the k8s cluster), and a
|
||||
resource of type `ServiceMonitor`, provided the corresponding custom
|
||||
resource definition is deployed in the k8s cluster. If you are running
|
||||
Prometheus in the same k8s cluster with the Prometheus operator, this
|
||||
will be the case. The `ServiceMonitor` will have the following labels
|
||||
set:
|
||||
|
||||
- `app: arangodb`
|
||||
- `arango_deployment: YOUR_DEPLOYMENT_NAME`
|
||||
- `context: metrics`
|
||||
- `metrics: prometheus`
|
||||
|
||||
This makes it possible to configure your Prometheus deployment to
|
||||
automatically start monitoring the available Prometheus feeds. To
|
||||
this end, you must configure the `serviceMonitorSelector` in the specs
|
||||
of your Prometheus deployment to match these labels. For example:
|
||||
|
||||
```yaml
|
||||
serviceMonitorSelector:
|
||||
matchLabels:
|
||||
metrics: prometheus
|
||||
```
|
||||
|
||||
would automatically select all pods of all ArangoDB cluster deployments
|
||||
which have metrics enabled.
|
||||
|
||||
### `spec.metrics.image: string`
|
||||
|
||||
<small>Deprecated in: v1.2.0 (kube-arangodb)</small>
|
||||
|
||||
See above, this is the name of the Docker image for the ArangoDB
|
||||
exporter to expose metrics. If empty, the same image as for the main
|
||||
deployment is used.
|
||||
|
||||
### `spec.metrics.resources: ResourceRequirements`
|
||||
|
||||
<small>Introduced in: v0.4.3 (kube-arangodb)</small>
|
||||
|
||||
This setting specifies the resources required by the metrics container.
|
||||
This includes requests and limits.
|
||||
See [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container).
|
||||
|
||||
### `spec.metrics.mode: string`
|
||||
|
||||
<small>Introduced in: v1.0.2 (kube-arangodb)</small>
|
||||
|
||||
Defines metrics exporter mode.
|
||||
|
||||
Possible values:
|
||||
- `exporter` (default): adds a sidecar to pods (except Agency pods) and exposes
|
||||
metrics collected by the exporter from the ArangoDB container. In this mode the
|
||||
exporter exposes metrics that are accessible without authentication.
|
||||
- `sidecar`: adds a sidecar to all pods and exposes metrics from the ArangoDB metrics
|
||||
endpoint. In this mode the exporter also exposes metrics that are accessible without
|
||||
authentication.
|
||||
- `internal`: configure ServiceMonitor to use internal ArangoDB metrics endpoint
|
||||
(proper JWT token is generated for this endpoint).
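
A minimal sketch combining the metrics settings described above (see `spec.metrics.tls` below for the TLS flag):

```yaml
spec:
  metrics:
    enabled: true
    mode: exporter
    tls: true
```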
|
||||
|
||||
### `spec.metrics.tls: bool`
|
||||
|
||||
<small>Introduced in: v1.1.0 (kube-arangodb)</small>
|
||||
|
||||
Defines if TLS should be enabled on Metrics exporter endpoint.
|
||||
The default is `true`.
|
||||
|
||||
This option enables TLS only if TLS is enabled on the ArangoDeployment;
|
||||
otherwise setting it to `true` has no effect.
|
||||
|
||||
### `spec.lifecycle.resources: ResourceRequirements`
|
||||
|
||||
<small>Introduced in: v0.4.3 (kube-arangodb)</small>
|
||||
|
||||
This setting specifies the resources required by the lifecycle init container.
|
||||
This includes requests and limits.
|
||||
See [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container).
|
||||
|
||||
### `spec.<group>.count: number`
|
||||
|
||||
This setting specifies the number of servers to start for the given group.
|
||||
For the Agent group, this value must be a positive, odd number.
|
||||
The default value is `3` for all groups except `single` (there the default is `1`
|
||||
for `spec.mode: Single` and `2` for `spec.mode: ActiveFailover`).
|
||||
|
||||
For the `syncworkers` group, it is highly recommended to use the same number
|
||||
as for the `dbservers` group.
|
||||
|
||||
### `spec.<group>.minCount: number`
|
||||
|
||||
Specifies a minimum for the count of servers. If set, a specification is invalid if `count < minCount`.
|
||||
|
||||
### `spec.<group>.maxCount: number`
|
||||
|
||||
Specifies a maximum for the count of servers. If set, a specification is invalid if `count > maxCount`.
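
For example, a sketch of the per-group counts for a cluster deployment (the values are chosen for illustration only):

```yaml
spec:
  mode: Cluster
  agents:
    count: 3
  dbservers:
    count: 3
    minCount: 3
    maxCount: 9
  coordinators:
    count: 3
```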
|
||||
|
||||
### `spec.<group>.args: []string`
|
||||
|
||||
This setting specifies additional command-line arguments passed to all servers of this group.
|
||||
The default value is an empty array.
|
||||
|
||||
### `spec.<group>.resources: ResourceRequirements`
|
||||
|
||||
This setting specifies the resources required by pods of this group. This includes requests and limits.
|
||||
|
||||
See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ for details.
|
||||
|
||||
### `spec.<group>.overrideDetectedTotalMemory: bool`
|
||||
|
||||
<small>Introduced in: v1.0.1 (kube-arangodb)</small>
|
||||
|
||||
Sets an additional flag in ArangoDeployment pods to propagate memory resource limits.
|
||||
|
||||
### `spec.<group>.volumeClaimTemplate.Spec: PersistentVolumeClaimSpec`
|
||||
|
||||
Specifies a volumeClaimTemplate used by the operator to create volume claims for pods of this group.
|
||||
This setting is not available for group `coordinators`, `syncmasters` & `syncworkers`.
|
||||
|
||||
The default value describes a volume with `8Gi` storage, `ReadWriteOnce` access mode and volume mode set to `PersistentVolumeFilesystem`.
|
||||
|
||||
If this field is not set and `spec.<group>.resources.requests.storage` is set, then a default volume claim
|
||||
with size as specified by `spec.<group>.resources.requests.storage` will be created. In that case `storage`
|
||||
and `iops` are not forwarded to the pod's resource requirements.
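
A hedged sketch of a volume claim template for the `dbservers` group (the storage class name is a placeholder; field casing follows the standard Kubernetes `PersistentVolumeClaimSpec`):

```yaml
spec:
  dbservers:
    volumeClaimTemplate:
      spec:
        accessModes:
          - ReadWriteOnce
        volumeMode: Filesystem
        storageClassName: my-storage-class
        resources:
          requests:
            storage: 8Gi
```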
|
||||
|
||||
### `spec.<group>.pvcResizeMode: string`
|
||||
|
||||
Specifies a resize mode used by the operator to resize PVCs and PVs.
|
||||
|
||||
Supported modes:
|
||||
- runtime (default) - PVC will be resized in Pod runtime (EKS, GKE)
|
||||
- rotate - Pod will be shut down and PVC will be resized (AKS)
|
||||
|
||||
### `spec.<group>.serviceAccountName: string`
|
||||
|
||||
This setting specifies the `serviceAccountName` for the `Pods` created
|
||||
for each server of this group. If empty, it defaults to using the
|
||||
`default` service account.
|
||||
|
||||
An alternative `ServiceAccount` is typically used to separate access rights.
|
||||
The ArangoDB deployments need some very minimal access rights. With the
|
||||
deployment of the operator, we grant the following rights for the `default`
|
||||
service account:
|
||||
|
||||
```yaml
|
||||
rules:
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- pods
|
||||
verbs:
|
||||
- get
|
||||
```
|
||||
|
||||
If you are using a different service account, please grant these rights
|
||||
to that service account.
|
||||
|
||||
### `spec.<group>.annotations: map[string]string`
|
||||
|
||||
This setting sets annotation overrides for pods in this group. Annotations are merged with `spec.annotations`.
|
||||
|
||||
### `spec.<group>.priorityClassName: string`
|
||||
|
||||
Priority class name for pods of this group. Will be forwarded to the pod spec. [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/)
|
||||
|
||||
### `spec.<group>.probes.livenessProbeDisabled: bool`
|
||||
|
||||
If set to true, the operator does not generate a liveness probe for new pods belonging to this group.
|
||||
|
||||
### `spec.<group>.probes.livenessProbeSpec.initialDelaySeconds: int`
|
||||
|
||||
Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 2 seconds. Minimum value is 0.
|
||||
|
||||
### `spec.<group>.probes.livenessProbeSpec.periodSeconds: int`
|
||||
|
||||
How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
|
||||
|
||||
### `spec.<group>.probes.livenessProbeSpec.timeoutSeconds: int`
|
||||
|
||||
Number of seconds after which the probe times out. Defaults to 2 seconds. Minimum value is 1.
|
||||
|
||||
### `spec.<group>.probes.livenessProbeSpec.failureThreshold: int`
|
||||
|
||||
When a Pod starts and the probe fails, Kubernetes will try failureThreshold times before giving up.
|
||||
Giving up means restarting the container. Defaults to 3. Minimum value is 1.
|
||||
|
||||
### `spec.<group>.probes.readinessProbeDisabled: bool`
|
||||
|
||||
If set to true, the operator does not generate a readiness probe for new pods belonging to this group.
|
||||
|
||||
### `spec.<group>.probes.readinessProbeSpec.initialDelaySeconds: int`
|
||||
|
||||
Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 2 seconds. Minimum value is 0.
|
||||
|
||||
### `spec.<group>.probes.readinessProbeSpec.periodSeconds: int`
|
||||
|
||||
How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
|
||||
|
||||
### `spec.<group>.probes.readinessProbeSpec.timeoutSeconds: int`
|
||||
|
||||
Number of seconds after which the probe times out. Defaults to 2 seconds. Minimum value is 1.
|
||||
|
||||
### `spec.<group>.probes.readinessProbeSpec.successThreshold: int`
|
||||
|
||||
Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Minimum value is 1.
|
||||
|
||||
### `spec.<group>.probes.readinessProbeSpec.failureThreshold: int`
|
||||
|
||||
When a Pod starts and the probe fails, Kubernetes will try failureThreshold times before giving up.
|
||||
Giving up means the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
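
For illustration, a sketch tuning the probe settings above for the `dbservers` group (the numbers are examples, not recommended values):

```yaml
spec:
  dbservers:
    probes:
      livenessProbeSpec:
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 2
        failureThreshold: 5
      readinessProbeSpec:
        initialDelaySeconds: 15
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
```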
|
||||
|
||||
### `spec.<group>.allowMemberRecreation: bool`
|
||||
|
||||
<small>Introduced in: v1.2.1 (kube-arangodb)</small>
|
||||
|
||||
This setting changes the member recreation logic based on group:
|
||||
- For Sync Masters, Sync Workers, Coordinator and DB-Servers it determines if a member can be recreated in case of failure (default `true`)
|
||||
- For Agents and Single this value is hardcoded to `false` and the value provided in spec is ignored.
|
||||
|
||||
### `spec.<group>.tolerations: []Toleration`
|
||||
|
||||
This setting specifies the `tolerations` for the `Pod`s created
|
||||
for each server of this group.
|
||||
|
||||
By default, suitable tolerations are set for the following keys with the `NoExecute` effect:
|
||||
|
||||
- `node.kubernetes.io/not-ready`
|
||||
- `node.kubernetes.io/unreachable`
|
||||
- `node.alpha.kubernetes.io/unreachable` (will be removed in future version)
|
||||
|
||||
For more information on tolerations, consult the
|
||||
[Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/).
|
||||
|
||||
### `spec.<group>.nodeSelector: map[string]string`
|
||||
|
||||
This setting specifies a set of labels to be used as `nodeSelector` for Pods of this group.
|
||||
|
||||
For more information on node selectors, consult the
|
||||
[Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/).
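
Combining the two scheduling settings above, a sketch for the `dbservers` group (the node label and taint key/value are placeholders specific to your cluster):

```yaml
spec:
  dbservers:
    nodeSelector:
      example.com/node-pool: "database"
    tolerations:
      - key: "example.com/dedicated"
        operator: "Equal"
        value: "arangodb"
        effect: "NoSchedule"
```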
|
||||
|
||||
### `spec.<group>.entrypoint: string`
|
||||
Entrypoint overrides container executable.
|
||||
|
||||
### `spec.<group>.antiAffinity: PodAntiAffinity`
|
||||
Specifies additional `antiAffinity` settings in ArangoDB Pod definitions.
|
||||
|
||||
For more information on `antiAffinity`, consult the
|
||||
[Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).
|
||||
|
||||
### `spec.<group>.affinity: PodAffinity`
|
||||
Specifies additional `affinity` settings in ArangoDB Pod definitions.
|
||||
|
||||
For more information on `affinity`, consult the
|
||||
[Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).
|
||||
|
||||
### `spec.<group>.nodeAffinity: NodeAffinity`
|
||||
Specifies additional `nodeAffinity` settings in ArangoDB Pod definitions.
|
||||
|
||||
For more information on `nodeAffinity`, consult the
|
||||
[Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/).
|
||||
|
||||
### `spec.<group>.securityContext: ServerGroupSpecSecurityContext`
|
||||
Specifies additional `securityContext` settings in ArangoDB Pod definitions.
|
||||
This is similar (but not fully compatible) to k8s SecurityContext definition.
|
||||
|
||||
For more information on `securityContext`, consult the
|
||||
[Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/).
|
||||
|
||||
### `spec.<group>.securityContext.addCapabilities: []Capability`
|
||||
Adds new capabilities to containers.
|
||||
|
||||
### `spec.<group>.securityContext.allowPrivilegeEscalation: bool`
|
||||
Controls whether a process can gain more privileges than its parent process.
|
||||
|
||||
### `spec.<group>.securityContext.privileged: bool`
|
||||
Runs container in privileged mode. Processes in privileged containers are
|
||||
essentially equivalent to root on the host.
|
||||
|
||||
### `spec.<group>.securityContext.readOnlyRootFilesystem: bool`
|
||||
Mounts the container's root filesystem as read-only.
|
||||
|
||||
### `spec.<group>.securityContext.runAsNonRoot: bool`
|
||||
Indicates that the container must run as a non-root user.
|
||||
|
||||
### `spec.<group>.securityContext.runAsUser: integer`
|
||||
The UID to run the entrypoint of the container process.
|
||||
|
||||
### `spec.<group>.securityContext.runAsGroup: integer`
|
||||
The GID to run the entrypoint of the container process.
|
||||
|
||||
### `spec.<group>.securityContext.supplementalGroups: []integer`
|
||||
A list of groups applied to the first process run in each container, in addition to the container's primary GID,
|
||||
the fsGroup (if specified), and group memberships defined in the container image for the uid of the container process.
|
||||
|
||||
### `spec.<group>.securityContext.fsGroup: integer`
|
||||
A special supplemental group that applies to all containers in a pod.
|
||||
|
||||
### `spec.<group>.securityContext.seccompProfile: SeccompProfile`
|
||||
The seccomp options to use by the containers in this pod.
|
||||
|
||||
### `spec.<group>.securityContext.seLinuxOptions: SELinuxOptions`
|
||||
The SELinux context to be applied to all containers.
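
As an illustration, a sketch of a hardened security context for a group (the numeric IDs are placeholders and must match what your image and storage permit):

```yaml
spec:
  dbservers:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
      fsGroup: 1000
      allowPrivilegeEscalation: false
```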
|
||||
|
||||
## Image discovery group `spec.id` fields
|
||||
|
||||
The image discovery (`id`) group supports only the following subset of fields.
|
||||
Refer to the corresponding field documentation in the `spec.<group>` descriptions above.
|
||||
|
||||
- `spec.id.entrypoint: string`
|
||||
- `spec.id.tolerations: []Toleration`
|
||||
- `spec.id.nodeSelector: map[string]string`
|
||||
- `spec.id.priorityClassName: string`
|
||||
- `spec.id.antiAffinity: PodAntiAffinity`
|
||||
- `spec.id.affinity: PodAffinity`
|
||||
- `spec.id.nodeAffinity: NodeAffinity`
|
||||
- `spec.id.serviceAccountName: string`
|
||||
- `spec.id.securityContext: ServerGroupSpecSecurityContext`
|
||||
- `spec.id.resources: ResourceRequirements`
|
||||
|
||||
## Deprecated Fields
|
||||
|
||||
### `spec.<group>.resources.requests.storage: storageUnit`
|
||||
|
||||
This setting specifies the amount of storage required for each server of this group.
|
||||
The default value is `8Gi`.
|
||||
|
||||
This setting is not available for group `coordinators`, `syncmasters` & `syncworkers`
|
||||
because servers in these groups do not need persistent storage.
|
||||
|
||||
Please use `volumeClaimTemplate` instead. This field is not considered if
|
||||
a `volumeClaimTemplate` is set. Note, however, that in that case the information
|
||||
in `requests` is handed over to the pod unchanged.
|
||||
|
||||
### `spec.<group>.storageClassName: string`
|
||||
|
||||
This setting specifies the `storageClass` for the `PersistentVolume`s created
|
||||
for each server of this group.
|
||||
|
||||
This setting is not available for group `coordinators`, `syncmasters` & `syncworkers`
|
||||
because servers in these groups do not need persistent storage.
|
||||
|
||||
Please use `volumeClaimTemplate` instead. This field is not considered if
|
||||
a `volumeClaimTemplate` is set. Note, however, that in that case the information
|
||||
in `requests` is handed over to the pod unchanged.
|
|
@ -1,4 +1,4 @@
|
|||
# ArangoDB operator architecture details
|
||||
# ArangoDB operator architecture overview
|
||||
|
||||
- [Operator API](./api.md)
|
||||
- [Backups](./backup.md)
|
||||
|
@ -9,5 +9,4 @@
|
|||
- [Pod eviction and replacement](./pod_eviction_and_replacement.md)
|
||||
- [Kubernetes Pod name versus cluster ID](./pod_name_versus_cluster_id.md)
|
||||
- [Resources & labels](./resources_and_labels.md)
|
||||
- [Scaling](./scaling.md)
|
||||
- [Topology awareness](./topology_awareness.md)
|
|
@ -1,21 +0,0 @@
|
|||
# Scaling
|
||||
|
||||
The number of running servers is controlled through the `spec.<server_group>.count` field.
|
||||
|
||||
### Scale-up
|
||||
When increasing the `count`, the operator will try to create the missing pods.
|
||||
When scaling up, make sure that you have enough computational resources / nodes, otherwise pods will get stuck in the `Pending` state.
|
||||
|
||||
|
||||
### Scale-down
|
||||
|
||||
Scaling down is always done 1 server at a time.
|
||||
|
||||
Scale down is possible only when all other actions on ArangoDeployment are finished.
|
||||
|
||||
The internal process followed by the ArangoDB operator when scaling down is as follows:
|
||||
- It chooses a member to be evicted. First it tries to remove unhealthy members, falling back to the member with the highest deletion_priority.
|
||||
- Using internal calls, it forces the server to resign leadership.
|
||||
In case of DB-Servers it means that all shard leaders will be switched to other servers.
|
||||
- It waits until the server is cleaned out of the cluster.
|
||||
- The Pod is finalized.
|
453
docs/draining-nodes.md
Normal file
|
@ -0,0 +1,453 @@
|
|||
# Draining Kubernetes nodes
|
||||
|
||||
**If Kubernetes nodes with ArangoDB pods on them are drained without care,
|
||||
data loss can occur!**
|
||||
The recommended procedure is described below.
|
||||
|
||||
For maintenance work in k8s it is sometimes necessary to drain a k8s node,
|
||||
which means removing all pods from it. Kubernetes offers a standard API
|
||||
for this and our operator supports this - to the best of its ability.
|
||||
|
||||
Draining nodes is easy enough for stateless services, which can simply be
|
||||
re-launched on any other node. However, for a stateful service this
|
||||
operation is more difficult, and as a consequence more costly and there
|
||||
are certain risks involved, if the operation is not done carefully
|
||||
enough. To put it simply, the operator must first move all the data
|
||||
stored on the node (which could be in a locally attached disk) to
|
||||
another machine, before it can shut down the pod gracefully. Moving data
|
||||
takes time, and even after the move, the distributed system ArangoDB has
|
||||
to recover from this change, for example by ensuring data synchronicity
|
||||
between the replicas in their new location.
|
||||
|
||||
Therefore, a systematic drain of all k8s nodes in sequence has to follow
|
||||
a careful procedure, in particular to ensure that ArangoDB is ready to
|
||||
move to the next step. This is necessary to avoid catastrophic data
|
||||
loss, and is simply the price one pays for running a stateful service.
|
||||
|
||||
## Anatomy of a drain procedure in k8s: the grace period
|
||||
|
||||
When a `kubectl drain` operation is triggered for a node, k8s first
|
||||
checks if there are any pods with local data on disk. Our ArangoDB pods have
|
||||
this property (the _Coordinators_ do use `EmptyDir` volumes, and _Agents_
|
||||
and _DB-Servers_ could have persistent volumes which are actually stored on
|
||||
a locally attached disk), so one has to override this with the
|
||||
`--delete-local-data=true` option.
|
||||
|
||||
Furthermore, quite often, the node will contain pods which are managed
|
||||
by a `DaemonSet` (which is not the case for ArangoDB), which makes it
|
||||
necessary to override this check with the `--ignore-daemonsets=true`
|
||||
option.
|
||||
|
||||
Finally, it is checked if the node has any pods which are not managed by
|
||||
anything, either by k8s itself (`ReplicationController`, `ReplicaSet`,
|
||||
`Job`, `DaemonSet` or `StatefulSet`) or by an operator. If this is the
|
||||
case, the drain operation will be refused, unless one uses the option
|
||||
`--force=true`. Since the ArangoDB operator manages our pods, we do not
|
||||
have to use this option for ArangoDB, but you might have to use it for
|
||||
other pods.
|
||||
|
||||
If all these checks have been overcome, k8s proceeds as follows: All
|
||||
pods are notified about this event and are put into a `Terminating`
|
||||
state. During this time, they have a chance to take action, or indeed
|
||||
the operator managing them has. In particular, although the pods get
|
||||
termination notices, they can keep running until the operator has
|
||||
removed all _finalizers_. This gives the operator a chance to sort out
|
||||
things, for example in our case to move data away from the pod.
|
||||
|
||||
However, there is a limit to this tolerance by k8s, and that is the
|
||||
grace period. If the grace period has passed but the pod has not
|
||||
actually terminated, then it is killed the hard way. If this happens,
|
||||
the operator has no choice but to remove the pod, drop its persistent
|
||||
volume claim and persistent volume. This will obviously lead to a
|
||||
failure incident in ArangoDB and must be handled by fail-over management.
|
||||
Therefore, **this event should be avoided**.
|
||||
|
||||
## Things to check in ArangoDB before a node drain
|
||||
|
||||
There are basically two things one should check in an ArangoDB cluster
|
||||
before a node drain operation can be started:
|
||||
|
||||
1. All cluster nodes are up and running and healthy.
|
||||
2. For all collections and shards all configured replicas are in sync.
|
||||
|
||||
#### Attention:
|
||||
1) If any cluster node is unhealthy, there is an increased risk that the
|
||||
system does not have enough resources to cope with a failure situation.
|
||||
2) If any shard replicas are not currently in sync, then there is a serious
|
||||
risk that the cluster is currently not as resilient as expected.
|
||||
|
||||
One possibility to verify these two things is via the ArangoDB web interface.
|
||||
Node health can be monitored in the _Overview_ tab under _NODES_:
|
||||
|
||||
![Cluster Health Screen](images/HealthyCluster.png)
|
||||
|
||||
**Check that all nodes are green** and that there is **no node error** in the
|
||||
top right corner.
|
||||
|
||||
As to the shards being in sync, see the _Shards_ tab under _NODES_:
|
||||
|
||||
![Shard Screen](images/ShardsInSync.png)
|
||||
|
||||
**Check that all collections have a green check mark** on the right side.
|
||||
If any collection does not have such a check mark, you can click on the
|
||||
collection and see the details about shards. Please keep in
|
||||
mind that this has to be done **for each database** separately!
|
||||
|
||||
Obviously, this might be tedious and calls for automation. Therefore, there
|
||||
are APIs for this. The first one is [Cluster Health](https://docs.arangodb.com/stable/develop/http/cluster/#get-the-cluster-health):
|
||||
|
||||
```
|
||||
GET /_admin/cluster/health
|
||||
```
|
||||
|
||||
… which returns a JSON document looking like this:
|
||||
|
||||
```json
|
||||
{
|
||||
"Health": {
|
||||
"CRDN-rxtu5pku": {
|
||||
"Endpoint": "ssl://my-arangodb-cluster-coordinator-rxtu5pku.my-arangodb-cluster-int.default.svc:8529",
|
||||
"LastAckedTime": "2019-02-20T08:09:22Z",
|
||||
"SyncTime": "2019-02-20T08:09:21Z",
|
||||
"Version": "3.4.2-1",
|
||||
"Engine": "rocksdb",
|
||||
"ShortName": "Coordinator0002",
|
||||
"Timestamp": "2019-02-20T08:09:22Z",
|
||||
"Status": "GOOD",
|
||||
"SyncStatus": "SERVING",
|
||||
"Host": "my-arangodb-cluster-coordinator-rxtu5pku.my-arangodb-cluster-int.default.svc",
|
||||
"Role": "Coordinator",
|
||||
"CanBeDeleted": false
|
||||
},
|
||||
"PRMR-wbsq47rz": {
|
||||
"LastAckedTime": "2019-02-21T09:14:24Z",
|
||||
"Endpoint": "ssl://my-arangodb-cluster-dbserver-wbsq47rz.my-arangodb-cluster-int.default.svc:8529",
|
||||
"SyncTime": "2019-02-21T09:14:24Z",
|
||||
"Version": "3.4.2-1",
|
||||
"Host": "my-arangodb-cluster-dbserver-wbsq47rz.my-arangodb-cluster-int.default.svc",
|
||||
"Timestamp": "2019-02-21T09:14:24Z",
|
||||
"Status": "GOOD",
|
||||
"SyncStatus": "SERVING",
|
||||
"Engine": "rocksdb",
|
||||
"ShortName": "DBServer0006",
|
||||
"Role": "DBServer",
|
||||
"CanBeDeleted": false
|
||||
},
|
||||
"AGNT-wrqmwpuw": {
|
||||
"Endpoint": "ssl://my-arangodb-cluster-agent-wrqmwpuw.my-arangodb-cluster-int.default.svc:8529",
|
||||
"Role": "Agent",
|
||||
"CanBeDeleted": false,
|
||||
"Version": "3.4.2-1",
|
||||
"Engine": "rocksdb",
|
||||
"Leader": "AGNT-oqohp3od",
|
||||
"Status": "GOOD",
|
||||
"LastAckedTime": 0.312
|
||||
},
|
||||
... [some more entries, one for each instance]
|
||||
},
|
||||
"ClusterId": "210a0536-fd28-46de-b77f-e8882d6d7078",
|
||||
"error": false,
|
||||
"code": 200
|
||||
}
|
||||
```
|
||||
|
||||
Check that each instance has a `Status` field with the value `"GOOD"`.
|
||||
Here is a shell command which makes this check easy, using the
|
||||
[`jq` JSON pretty printer](https://stedolan.github.io/jq/):
|
||||
|
||||
```bash
|
||||
curl -k https://arangodb.9hoeffer.de:8529/_admin/cluster/health --user root: | jq . | grep '"Status"' | grep -v '"GOOD"'
|
||||
```
|
||||
|
||||
For the shards being in sync there is the
|
||||
[Cluster Inventory](https://docs.arangodb.com/stable/develop/http/replication/replication-dump#get-the-cluster-collections-and-indexes)
|
||||
API call:
|
||||
|
||||
```
|
||||
GET /_db/_system/_api/replication/clusterInventory
|
||||
```
|
||||
|
||||
… which returns a JSON body like this:
|
||||
|
||||
```json
|
||||
{
|
||||
"collections": [
|
||||
{
|
||||
"parameters": {
|
||||
"cacheEnabled": false,
|
||||
"deleted": false,
|
||||
"globallyUniqueId": "c2010061/",
|
||||
"id": "2010061",
|
||||
"isSmart": false,
|
||||
"isSystem": false,
|
||||
"keyOptions": {
|
||||
"allowUserKeys": true,
|
||||
"type": "traditional"
|
||||
},
|
||||
"name": "c",
|
||||
"numberOfShards": 6,
|
||||
"planId": "2010061",
|
||||
"replicationFactor": 2,
|
||||
"shardKeys": [
|
||||
"_key"
|
||||
],
|
||||
"shardingStrategy": "hash",
|
||||
"shards": {
|
||||
"s2010066": [
|
||||
"PRMR-vzeebvwf",
|
||||
"PRMR-e6hbjob1"
|
||||
],
|
||||
"s2010062": [
|
||||
"PRMR-e6hbjob1",
|
||||
"PRMR-vzeebvwf"
|
||||
],
|
||||
"s2010065": [
|
||||
"PRMR-e6hbjob1",
|
||||
"PRMR-vzeebvwf"
|
||||
],
|
||||
"s2010067": [
|
||||
"PRMR-vzeebvwf",
|
||||
"PRMR-e6hbjob1"
|
||||
],
|
||||
"s2010064": [
|
||||
"PRMR-vzeebvwf",
|
||||
"PRMR-e6hbjob1"
|
||||
],
|
||||
"s2010063": [
|
||||
"PRMR-e6hbjob1",
|
||||
"PRMR-vzeebvwf"
|
||||
]
|
||||
},
|
||||
"status": 3,
|
||||
"type": 2,
|
||||
"waitForSync": false
|
||||
},
|
||||
"indexes": [],
|
||||
"planVersion": 132,
|
||||
"isReady": true,
|
||||
"allInSync": true
|
||||
},
|
||||
... [more collections following]
|
||||
],
|
||||
"views": [],
|
||||
"tick": "38139421",
|
||||
"state": "unused"
|
||||
}
|
||||
```
|
||||
|
||||
Check that for all collections the attributes `"isReady"` and `"allInSync"`
|
||||
both have the value `true`. Note that it is necessary to do this for all
|
||||
databases!
|
||||
|
||||
Here is a shell command which makes this check easy:
|
||||
|
||||
```bash
|
||||
curl -k https://arangodb.9hoeffer.de:8529/_db/_system/_api/replication/clusterInventory --user root: | jq . | grep '"isReady"\|"allInSync"' | sort | uniq -c
|
||||
```
|
||||
|
||||
If all these checks are performed and are okay, then it is safe to
|
||||
continue with the clean out and drain procedure as described below.
|
||||
|
||||
|
||||
#### Attention:
|
||||
If there are some collections with `replicationFactor` set to
|
||||
1, the system is not resilient and cannot tolerate the failure of even a
|
||||
single server! One can still perform a drain operation in this case, but
|
||||
if anything goes wrong, in particular if the grace period is chosen too
|
||||
short and a pod is killed the hard way, data loss can happen.
|
||||
|
||||
If all `replicationFactor`s of all collections are at least 2, then the
|
||||
system can tolerate the failure of a single _DB-Server_. If you have set
|
||||
the `Environment` to `Production` in the specs of the ArangoDB
|
||||
deployment, you will only ever have one _DB-Server_ on each k8s node and
|
||||
therefore the drain operation is relatively safe, even if the grace
|
||||
period is chosen too small.
|
||||
|
||||
Furthermore, we recommend having one more k8s node than _DB-Servers_ in
|
||||
your cluster, so that the deployment of a replacement _DB-Server_ can
|
||||
happen quickly and not only after the maintenance work on the drained
|
||||
node has been completed. However, with the necessary care described
|
||||
below, the procedure should also work without this.
|
||||
|
||||
Finally, one should **not run a rolling upgrade or restart operation**
|
||||
at the time of a node drain.
|
||||
|
||||
## Clean out a DB-Server manually
|
||||
|
||||
In this step we clean out a _DB-Server_ manually, **before issuing the
|
||||
`kubectl drain` command**. Previously, we denoted this step as optional,
|
||||
but for safety reasons we now consider it mandatory, since it is nearly
|
||||
impossible to reliably choose a long enough grace period.
|
||||
|
||||
Furthermore, if this step is not performed, we must choose
|
||||
the grace period long enough to avoid any risk, as explained in the
|
||||
previous section. However, this has a disadvantage which has nothing to
|
||||
do with ArangoDB: we have observed that some k8s internal services like
|
||||
`fluentd` and some DNS services will always wait for the full grace
|
||||
period to finish a node drain. Therefore, the node drain operation will
|
||||
always take as long as the grace period. Since we have to choose this
|
||||
grace period long enough for ArangoDB to move all data on the _DB-Server_
|
||||
pod away to some other node, this can take a considerable amount of
|
||||
time, depending on the size of the data you keep in ArangoDB.
|
||||
|
||||
Therefore it is more time-efficient to perform the clean-out operation
|
||||
beforehand. One can observe its completion, and as soon as it has completed
|
||||
successfully, issue the drain command with a relatively
|
||||
small grace period and still have a nearly risk-free procedure.
|
||||
|
||||
To clean out a _DB-Server_ manually, we have to use this API:
|
||||
|
||||
```
|
||||
POST /_admin/cluster/cleanOutServer
|
||||
```
|
||||
|
||||
… and send as body a JSON document like this:
|
||||
|
||||
```json
|
||||
{"server":"DBServer0006"}
|
||||
```
|
||||
|
||||
The value of the `"server"` attribute should be the name of the DB-Server
|
||||
running in the pod that resides on the node that shall be
|
||||
drained next. This uses the UI short name (`ShortName` in the
|
||||
`/_admin/cluster/health` API), alternatively one can use the
|
||||
internal name, which corresponds to the pod name. In our example, the
|
||||
pod name is:
|
||||
|
||||
```
|
||||
my-arangodb-cluster-prmr-wbsq47rz-5676ed
|
||||
```
|
||||
|
||||
… where `my-arangodb-cluster` is the ArangoDB deployment name, therefore
|
||||
the internal name of the _DB-Server_ is `PRMR-wbsq47rz`. Note that `PRMR`
|
||||
must be all capitals since pod names are always all lower case. So, we
|
||||
could use the body:
|
||||
|
||||
```json
|
||||
{"server":"PRMR-wbsq47rz"}
|
||||
```
|
||||
|
||||
You can use this command line to achieve this:
|
||||
|
||||
```bash
|
||||
curl -k https://arangodb.9hoeffer.de:8529/_admin/cluster/cleanOutServer --user root: -d '{"server":"PRMR-wbsq47rz"}'
|
||||
```
|
||||
|
||||
The API call will return immediately with a body like this:
|
||||
|
||||
```json
|
||||
{"error":false,"id":"38029195","code":202}
|
||||
```
|
||||
|
||||
The given `id` in this response can be used to query the outcome or
|
||||
completion status of the clean out server job with this API:
|
||||
|
||||
```
|
||||
GET /_admin/cluster/queryAgencyJob?id=38029195
|
||||
```
|
||||
|
||||
… which will return a body like this:
|
||||
|
||||
```json
|
||||
{
|
||||
"error": false,
|
||||
"id": "38029195",
|
||||
"status": "Pending",
|
||||
"job": {
|
||||
"timeCreated": "2019-02-21T10:42:14.727Z",
|
||||
"server": "PRMR-wbsq47rz",
|
||||
"timeStarted": "2019-02-21T10:42:15Z",
|
||||
"type": "cleanOutServer",
|
||||
"creator": "CRDN-rxtu5pku",
|
||||
"jobId": "38029195"
|
||||
},
|
||||
"code": 200
|
||||
}
|
||||
```
|
||||
|
||||
Use this command line to check progress:
|
||||
|
||||
```bash
|
||||
curl -k https://arangodb.9hoeffer.de:8529/_admin/cluster/queryAgencyJob?id=38029195 --user root:
|
||||
```
|
||||
|
||||
It indicates that the job is still ongoing (`"Pending"`). As soon as
|
||||
the job has completed, the answer will be:
|
||||
|
||||
```json
|
||||
{
|
||||
"error": false,
|
||||
"id": "38029195",
|
||||
"status": "Finished",
|
||||
"job": {
|
||||
"timeCreated": "2019-02-21T10:42:14.727Z",
|
||||
"server": "PRMR-e6hbjob1",
|
||||
"jobId": "38029195",
|
||||
"timeStarted": "2019-02-21T10:42:15Z",
|
||||
"timeFinished": "2019-02-21T10:45:39Z",
|
||||
"type": "cleanOutServer",
|
||||
"creator": "CRDN-rxtu5pku"
|
||||
},
|
||||
"code": 200
|
||||
}
|
||||
```
|
||||
|
||||
From this moment on the _DB-Server_ can no longer be used to move
|
||||
shards to. At the same time, it will no longer hold any data of the
|
||||
cluster.
|
||||
|
||||
Now the drain operation involving a node with this pod on it is
|
||||
completely risk-free, even with a small grace period.
|
||||
|
||||
## Performing the drain
|
||||
|
||||
After all above [checks before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
|
||||
and the [manual clean out of the DB-Server](#clean-out-a-db-server-manually)
|
||||
have been done successfully, it is safe to perform the drain operation, similar to this command:
|
||||
|
||||
```bash
|
||||
kubectl drain gke-draintest-default-pool-394fe601-glts --delete-local-data --ignore-daemonsets --grace-period=300
|
||||
```
|
||||
|
||||
As described above, the options `--delete-local-data` for ArangoDB and
|
||||
`--ignore-daemonsets` for other services have been added. A `--grace-period` of
|
||||
300 seconds has been chosen because for this example we are confident that all the data on our _DB-Server_ pod
|
||||
can be moved to a different server within 5 minutes. Note that this is
|
||||
**not saying** that 300 seconds will always be enough. Regardless of how
|
||||
much data is stored in the pod, your mileage may vary, moving a terabyte
|
||||
of data can take considerably longer!
|
||||
|
||||
If the highly recommended step of
|
||||
[cleaning out a DB-Server manually](#clean-out-a-db-server-manually)
|
||||
has been performed beforehand, the grace period can easily be reduced to 60
|
||||
seconds - at least from the perspective of ArangoDB, since the server is already
|
||||
cleaned out, so it can be dropped readily and there is still no risk.
|
||||
|
||||
At the same time, this guarantees now that the drain is completed
|
||||
approximately within a minute.
|
||||
|
||||
## Things to check after a node drain
|
||||
|
||||
After a node has been drained, there will usually be one of the
|
||||
_DB-Servers_ gone from the cluster. As a replacement, another _DB-Server_ has
|
||||
been deployed on a different node, if there is a different node
|
||||
available. If not, the replacement can only be deployed when the
|
||||
maintenance work on the drained node has been completed and it is
|
||||
uncordoned again. In this latter case, one should wait until the node is
|
||||
back up and the replacement pod has been deployed there.
|
||||
|
||||
After that, one should perform the same checks as described in
|
||||
[things to check before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
|
||||
above.
|
||||
|
||||
Finally, it is likely that the shard distribution in the "new" cluster
|
||||
is not balanced out. In particular, the new _DB-Server_ is not automatically
|
||||
used to store shards. We recommend that you
|
||||
[re-balance](https://docs.arangodb.com/stable/deploy/deployment/cluster/administration/#movingrebalancing-_shards_) the shard distribution,
|
||||
either manually by moving shards or by using the _Rebalance Shards_
|
||||
button in the _Shards_ tab under _NODES_ in the web interface. This redistribution can take
|
||||
some time again and progress can be monitored in the UI.
|
||||
|
||||
After all this has been done, **another round of checks should be done**
|
||||
before proceeding to drain the next node.
|
132
docs/driver-configuration.md
Normal file
|
@ -0,0 +1,132 @@
|
|||
# Configuring your driver for ArangoDB access
|
||||
|
||||
In this chapter you'll learn how to configure a driver for accessing
|
||||
an ArangoDB deployment in Kubernetes.
|
||||
|
||||
The exact methods to configure a driver are specific to that driver.
|
||||
|
||||
## Database endpoint(s)
|
||||
|
||||
The endpoint(s) (or URLs) to communicate with are the most important
|
||||
parameter you need to configure in your driver.
|
||||
|
||||
Finding the right endpoints depends on whether your client application is running in
|
||||
the same Kubernetes cluster as the ArangoDB deployment or not.
|
||||
|
||||
### Client application in same Kubernetes cluster
|
||||
|
||||
If your client application is running in the same Kubernetes cluster as
|
||||
the ArangoDB deployment, you should configure your driver to use the
|
||||
following endpoint:
|
||||
|
||||
```
|
||||
https://<deployment-name>.<namespace>.svc:8529
|
||||
```
|
||||
|
||||
Only if your deployment has set `spec.tls.caSecretName` to `None`, should
|
||||
you use `http` instead of `https`.
|
||||
|
||||
### Client application outside Kubernetes cluster
|
||||
|
||||
If your client application is running outside the Kubernetes cluster in which
|
||||
the ArangoDB deployment is running, your driver endpoint depends on the
|
||||
external-access configuration of your ArangoDB deployment.
|
||||
|
||||
If the external-access of the ArangoDB deployment is of type `LoadBalancer`,
|
||||
then use the IP address of that `LoadBalancer` like this:
|
||||
|
||||
```
|
||||
https://<load-balancer-ip>:8529
|
||||
```
|
||||
|
||||
If the external-access of the ArangoDB deployment is of type `NodePort`,
|
||||
then use the IP address(es) of the `Nodes` of the Kubernetes cluster,
|
||||
combined with the `NodePort` that is used by the external-access service.
|
||||
|
||||
For example:
|
||||
|
||||
```
|
||||
https://<kubernetes-node-1-ip>:30123
|
||||
```
|
||||
|
||||
You can find the type of external-access by inspecting the external-access `Service`.
|
||||
To do so, run the following command:
|
||||
|
||||
```bash
|
||||
kubectl get service -n <namespace-of-deployment> <deployment-name>-ea
|
||||
```
|
||||
|
||||
The output looks like this:
|
||||
|
||||
```bash
|
||||
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
|
||||
example-simple-cluster-ea LoadBalancer 10.106.175.38 192.168.10.208 8529:31890/TCP 1s app=arangodb,arango_deployment=example-simple-cluster,role=coordinator
|
||||
```
|
||||
|
||||
In this case the external-access is of type `LoadBalancer` with a load-balancer IP address
|
||||
of `192.168.10.208`.
|
||||
This results in an endpoint of `https://192.168.10.208:8529`.
|
||||
|
||||
## TLS settings
|
||||
|
||||
As mentioned before, the ArangoDB deployment managed by the ArangoDB operator
|
||||
will use a secure (TLS) connection unless you set `spec.tls.caSecretName` to `None`
|
||||
in your `ArangoDeployment`.
|
||||
|
||||
When using a secure connection, you can choose to verify the server certificates
|
||||
provided by the ArangoDB servers or not.
|
||||
|
||||
If you want to verify these certificates, configure your driver with the CA certificate
|
||||
found in a Kubernetes `Secret` found in the same namespace as the `ArangoDeployment`.
|
||||
|
||||
The name of this `Secret` is stored in the `spec.tls.caSecretName` setting of
|
||||
the `ArangoDeployment`. If you don't set this setting explicitly, it will be
|
||||
set automatically.
|
||||
|
||||
Then fetch the CA secret using the following command (or use a Kubernetes client library to fetch it):
|
||||
|
||||
```bash
|
||||
kubectl get secret -n <namespace> <secret-name> --template='{{index .data "ca.crt"}}' | base64 -D > ca.crt
|
||||
```
|
||||
|
||||
This results in a file called `ca.crt` containing a PEM encoded, x509 CA certificate.
|
||||
|
||||
## Query requests
|
||||
|
||||
For most client requests made by a driver, it does not matter if there is any
|
||||
kind of load-balancer between your client application and the ArangoDB
|
||||
deployment.
|
||||
|
||||
#### Note:
|
||||
Even a simple `Service` of type `ClusterIP` already behaves as a load-balancer.
|
||||
|
||||
The exception to this is cursor-related requests made to an ArangoDB `Cluster`
|
||||
deployment. The Coordinator that handles an initial query request (that results
|
||||
in a `Cursor`) will save some in-memory state in that Coordinator, if the result
|
||||
of the query is too big to be transferred back in the response of the initial
|
||||
request.
|
||||
|
||||
Follow-up requests have to be made to fetch the remaining data. These follow-up
|
||||
requests must be handled by the same Coordinator to which the initial request
|
||||
was made. As soon as there is a load-balancer between your client application
|
||||
and the ArangoDB cluster, it is uncertain which Coordinator will receive the
|
||||
follow-up request.
|
||||
|
||||
ArangoDB will transparently forward any mismatched requests to the correct
|
||||
Coordinator, so the requests can be answered correctly without any additional
|
||||
configuration. However, this incurs a small latency penalty due to the extra
|
||||
request across the internal network.
|
||||
|
||||
To prevent this uncertainty client-side, make sure to run your client
|
||||
application in the same Kubernetes cluster and synchronize your endpoints before
|
||||
making the initial query request. This will result in the use (by the driver) of
|
||||
internal DNS names of all Coordinators. A follow-up request can then be sent to
|
||||
exactly the same Coordinator.
|
||||
|
||||
If your client application is running outside the Kubernetes cluster the easiest
|
||||
way to work around it is by making sure that the query results are small enough
|
||||
to be returned by a single request. When that is not feasible, it is also
|
||||
possible to resolve this when the internal DNS names of your Kubernetes cluster
|
||||
are exposed to your client application and the resulting IP addresses are
|
||||
routable from your client application. To expose internal DNS names of your
|
||||
Kubernetes cluster, you can use [CoreDNS](https://coredns.io).
|
156
docs/helm.md
Normal file
|
@ -0,0 +1,156 @@
|
|||
# Using the ArangoDB Kubernetes Operator with Helm
|
||||
|
||||
[`Helm`](https://www.helm.sh/) is a package manager for Kubernetes, which enables
|
||||
you to install various packages (including the ArangoDB Kubernetes Operator)
|
||||
into your Kubernetes cluster.
|
||||
|
||||
The benefit of `helm` (in the context of the ArangoDB Kubernetes Operator)
|
||||
is that it allows for a lot of flexibility in how you install the operator.
|
||||
For example you can install the operator in a namespace other than
|
||||
`default`.
|
||||
|
||||
## Charts
|
||||
|
||||
The ArangoDB Kubernetes Operator is distributed in the `helm` chart `kube-arangodb`, which contains the operator for the
|
||||
`ArangoDeployment`, `ArangoLocalStorage` and `ArangoDeploymentReplication` resource types.
|
||||
|
||||
## Configurable values for ArangoDB Kubernetes Operator
|
||||
|
||||
The following values can be configured when installing the
|
||||
ArangoDB Kubernetes Operator with `helm`.
|
||||
|
||||
Values are passed to `helm` using a `--set=<key>=<value>` argument passed
|
||||
to the `helm install` or `helm upgrade` command.
|
||||
|
||||
### `operator.image`
|
||||
|
||||
Image used for the ArangoDB Operator.
|
||||
|
||||
Default: `arangodb/kube-arangodb:latest`
|
||||
|
||||
### `operator.imagePullPolicy`
|
||||
|
||||
Image pull policy for Operator images.
|
||||
|
||||
Default: `IfNotPresent`
|
||||
|
||||
### `operator.imagePullSecrets`
|
||||
|
||||
List of the Image Pull Secrets for Operator images.
|
||||
|
||||
Default: `[]string`
|
||||
|
||||
### `operator.service.type`
|
||||
|
||||
Type of the Operator service.
|
||||
|
||||
Default: `ClusterIP`
|
||||
|
||||
### `operator.annotations`
|
||||
|
||||
Annotations passed to the Operator Deployment definition.
|
||||
|
||||
Default: `[]string`
|
||||
|
||||
### `operator.resources.limits.cpu`
|
||||
|
||||
CPU limits for operator pods.
|
||||
|
||||
Default: `1`
|
||||
|
||||
### `operator.resources.limits.memory`
|
||||
|
||||
Memory limits for operator pods.
|
||||
|
||||
Default: `256Mi`
|
||||
|
||||
### `operator.resources.requested.cpu`
|
||||
|
||||
Requested CPU for operator pods.
|
||||
|
||||
Default: `250m`
|
||||
|
||||
### `operator.resources.requested.memory`
|
||||
|
||||
Requested memory for operator pods.
|
||||
|
||||
Default: `256Mi`
|
||||
|
||||
### `operator.replicaCount`
|
||||
|
||||
Replication count for Operator deployment.
|
||||
|
||||
Default: `2`
|
||||
|
||||
### `operator.updateStrategy`
|
||||
|
||||
Update strategy for operator pod.
|
||||
|
||||
Default: `Recreate`
|
||||
|
||||
### `operator.features.deployment`
|
||||
|
||||
Define if ArangoDeployment Operator should be enabled.
|
||||
|
||||
Default: `true`
|
||||
|
||||
### `operator.features.deploymentReplications`
|
||||
|
||||
Define if ArangoDeploymentReplications Operator should be enabled.
|
||||
|
||||
Default: `true`
|
||||
|
||||
### `operator.features.storage`
|
||||
|
||||
Define if ArangoLocalStorage Operator should be enabled.
|
||||
|
||||
Default: `false`
|
||||
|
||||
### `operator.features.backup`
|
||||
|
||||
Define if ArangoBackup Operator should be enabled.
|
||||
|
||||
Default: `false`
|
||||
|
||||
### `operator.enableCRDManagement`
|
||||
|
||||
If true and the operator has enough access permissions, it will try to install missing CRDs.
|
||||
|
||||
Default: `true`
|
||||
|
||||
### `rbac.enabled`
|
||||
|
||||
Define if RBAC should be enabled.
|
||||
|
||||
Default: `true`
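
Instead of many `--set` arguments, the same values can be collected in a values file and passed to `helm` with `-f`; a sketch, assuming the nested structure implied by the dotted keys above (the values shown are examples, not new defaults):

```yaml
# values.yaml (example only)
operator:
  image: arangodb/kube-arangodb:latest
  replicaCount: 2
  features:
    deployment: true
    deploymentReplications: true
    storage: false
    backup: true
rbac:
  enabled: true
```

Then install with `helm install -f values.yaml kube-arangodb.tgz`.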
|
||||
|
||||
## Alternate namespaces
|
||||
|
||||
The `kube-arangodb` chart supports deployment into a non-default namespace.
|
||||
|
||||
To install the `kube-arangodb` chart in a non-default namespace, use the `--namespace`
|
||||
argument like this.
|
||||
|
||||
```bash
|
||||
helm install --namespace=mynamespace kube-arangodb.tgz
|
||||
```
|
||||
|
||||
Note that since the operators claim exclusive access to a namespace, you can
|
||||
install the `kube-arangodb` chart only once per namespace.
|
||||
You can, however, install the `kube-arangodb` chart in multiple different namespaces. To do so, run:
|
||||
|
||||
```bash
|
||||
helm install --namespace=namespace1 kube-arangodb.tgz
|
||||
helm install --namespace=namespace2 kube-arangodb.tgz
|
||||
```
|
||||
|
||||
The `kube-arangodb-storage` chart is always installed in the `kube-system` namespace.
|
||||
|
||||
## Common problems
|
||||
|
||||
### Error: no available release name found
|
||||
|
||||
This error is given by `helm install ...` in some cases where it has
|
||||
insufficient permissions to install charts.
|
||||
|
||||
For various ways to work around this problem, see [this Stack Overflow article](https://stackoverflow.com/questions/43499971/helm-error-no-available-release-name-found).
|
|
@ -1,12 +1,12 @@
|
|||
## How-to...
|
||||
|
||||
- [How to set a license key](./set_license.md)
|
||||
- [Pass additional params to operator](additional_configuration.md)
|
||||
- [Change architecture / enable ARM support](arch_change.md)
|
||||
- [Configure timezone for cluster](configuring_tz.md)
|
||||
- [Collect debug data for support case](debugging.md)
|
||||
- [Configure logging](logging.md)
|
||||
- [Enable maintenance mode](maintenance.md)
|
||||
- [Start metrics collection and monitoring](metrics.md)
|
||||
- [Override detected total memory](override_detected_memory.md)
|
||||
- [Manually recover cluster if you still have volumes with data](recovery.md)
|
||||
- [How to rotate Pod](rotate-pod.md)
|
||||
|
|
docs/how-to/set_license.md (new file, 17 lines)
# How to set a license key
|
||||
|
||||
After deploying the ArangoDB Kubernetes operator, use the command below to deploy your [license key](https://docs.arangodb.com/stable/operations/administration/license-management/)
|
||||
as a secret which is required for the Enterprise Edition starting with version 3.9:
|
||||
|
||||
```bash
|
||||
kubectl create secret generic arango-license-key --from-literal=token-v2="<license-string>"
|
||||
```
|
||||
|
||||
|
||||
Then specify the newly created secret in the ArangoDeploymentSpec:
|
||||
```yaml
|
||||
spec:
|
||||
# [...]
|
||||
license:
|
||||
secretName: arango-license-key
|
||||
```
|
docs/images/HealthyCluster.png (new binary file, 114 KiB)
docs/images/ShardsInSync.png (new binary file, 84 KiB)
docs/metrics.md (new file, 152 lines)
# Metrics collection
|
||||
|
||||
The operator provides metrics about its operations in a format supported by [Prometheus](https://prometheus.io/).
|
||||
|
||||
The metrics are exposed through HTTPS on port `8528` under path `/metrics`.
|
||||
|
||||
For a full list of available metrics, see [here](generated/metrics/README.md).
|
||||
|
||||
Check out examples directory [examples/metrics](https://github.com/arangodb/kube-arangodb/tree/master/examples/metrics)
|
||||
for `Services` and `ServiceMonitors` definitions you can use to integrate
|
||||
with Prometheus through the [Prometheus-Operator by CoreOS](https://github.com/coreos/prometheus-operator).
|
||||
|
||||
|
||||
#### Contents
|
||||
- [Integration with standard Prometheus installation (no TLS)](#integration-with-standard-prometheus-installation-no-tls)
- [Integration with standard Prometheus installation (TLS)](#integration-with-standard-prometheus-installation-tls)
- [Integration with Prometheus Operator](#integration-with-prometheus-operator)
- [Exposing ArangoDB metrics](#arangodb-metrics)
|
||||
|
||||
|
||||
## Integration with standard Prometheus installation (no TLS)
|
||||
|
||||
After creating the operator deployment, you must configure Prometheus using a configuration file that instructs it
|
||||
about which targets to scrape.
|
||||
To do so, add a new scrape job to your prometheus.yaml config:
|
||||
```yaml
|
||||
scrape_configs:
|
||||
- job_name: 'arangodb-operator'
|
||||
|
||||
scrape_interval: 10s # scrape every 10 seconds.
|
||||
|
||||
scheme: 'https'
|
||||
tls_config:
|
||||
insecure_skip_verify: true
|
||||
|
||||
static_configs:
|
||||
- targets:
|
||||
- "<operator-endpoint-ip>:8528"
|
||||
```
|
||||
|
||||
## Integration with standard Prometheus installation (TLS)
|
||||
|
||||
By default, the operator uses a self-signed certificate for its server API.
|
||||
To use your own certificate, create a Kubernetes secret containing the certificate and provide the secret name to the operator.
|
||||
|
||||
Create the secret (in the same namespace where the operator is running):
|
||||
```shell
|
||||
kubectl create secret tls my-own-certificate --cert ./cert.crt --key ./cert.key
|
||||
```
|
||||
Then edit the operator deployment definition (`kubectl edit deployments.apps`) to use your secret for its server API:
|
||||
```yaml
|
||||
spec:
|
||||
# ...
|
||||
containers:
|
||||
# ...
|
||||
args:
|
||||
- --server.tls-secret-name=my-own-certificate
|
||||
# ...
|
||||
```
|
||||
Wait for operator pods to restart.
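You can watch the restart progress, for example (assuming the default deployment name used by the manifests):
```bash
kubectl rollout status deployment/arango-deployment-operator
```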
|
||||
|
||||
Now update Prometheus config to use your certificate for operator scrape job:
|
||||
```yaml
|
||||
tls_config:
|
||||
# if you are using self-signed certificate, just specify CA certificate:
|
||||
ca_file: /etc/prometheus/rootCA.crt
|
||||
|
||||
# otherwise, specify the generated client certificate and key:
|
||||
cert_file: /etc/prometheus/cert.crt
|
||||
key_file: /etc/prometheus/cert.key
|
||||
```
|
||||
|
||||
## Integration with Prometheus Operator
|
||||
|
||||
Assuming that you have [Prometheus Operator](https://prometheus-operator.dev/) installed in your cluster (`monitoring` namespace),
|
||||
and kube-arangodb installed in `default` namespace, you can easily configure the integration with ArangoDB operator.
|
||||
|
||||
The easiest way to do that is to create a new `ServiceMonitor`:
|
||||
```yaml
|
||||
apiVersion: monitoring.coreos.com/v1
|
||||
kind: ServiceMonitor
|
||||
metadata:
|
||||
name: arango-deployment-operator
|
||||
namespace: monitoring
|
||||
labels:
|
||||
prometheus: kube-prometheus
|
||||
spec:
|
||||
selector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: kube-arangodb
|
||||
namespaceSelector:
|
||||
matchNames:
|
||||
- default
|
||||
endpoints:
|
||||
- port: server
|
||||
scheme: https
|
||||
tlsConfig:
|
||||
insecureSkipVerify: true
|
||||
```
|
||||
|
||||
You can also find an example Grafana dashboard in the `examples/metrics` folder of this repo.
|
||||
|
||||
|
||||
## ArangoDB metrics
|
||||
|
||||
The operator can run [sidecar containers](./design/exporter.md) for ArangoDB deployments of type `Cluster` which expose metrics in Prometheus format.
|
||||
Edit your `ArangoDeployment` resource, setting `spec.metrics.enabled` to true to enable ArangoDB metrics:
|
||||
```yaml
|
||||
spec:
|
||||
metrics:
|
||||
enabled: true
|
||||
```
|
||||
|
||||
The operator will run a sidecar container for every cluster component.
|
||||
In addition to the sidecar containers the operator will deploy a `Service` to access the exporter ports (from within the k8s cluster),
|
||||
and a resource of type `ServiceMonitor`, provided the corresponding custom resource definition is deployed in the k8s cluster.
|
||||
If you are running Prometheus in the same k8s cluster with the Prometheus operator, this will be the case.
|
||||
The ServiceMonitor will have the following labels set:
|
||||
```yaml
|
||||
app: arangodb
|
||||
arango_deployment: YOUR_DEPLOYMENT_NAME
|
||||
context: metrics
|
||||
metrics: prometheus
|
||||
```
|
||||
This makes it possible to configure your Prometheus deployment to automatically start monitoring on the available Prometheus feeds.
|
||||
To this end, you must configure the `serviceMonitorSelector` in the specs of your Prometheus deployment to match these labels. For example:
|
||||
```yaml
|
||||
serviceMonitorSelector:
|
||||
matchLabels:
|
||||
metrics: prometheus
|
||||
```
|
||||
would automatically select all pods of all ArangoDB cluster deployments which have metrics enabled.
|
||||
|
||||
By default, the sidecar metrics exporters use TLS for all connections. You can disable TLS by specifying
|
||||
```yaml
|
||||
spec:
|
||||
metrics:
|
||||
enabled: true
|
||||
tls: false
|
||||
```
|
||||
|
||||
You can fine-tune the monitored metrics by specifying `ArangoDeployment` annotations. Example:
|
||||
```yaml
|
||||
spec:
|
||||
annotations:
|
||||
prometheus.io/scrape: 'true'
|
||||
prometheus.io/port: '9101'
|
||||
prometheus.io/scrape_interval: '5s'
|
||||
```
|
||||
|
||||
See the [Metrics HTTP API documentation](https://docs.arangodb.com/stable/develop/http/monitoring/#metrics)
|
||||
for the metrics exposed by ArangoDB deployments.
|
docs/scaling.md (new file, 42 lines)
# Scaling your ArangoDB deployment
|
||||
|
||||
The ArangoDB Kubernetes Operator supports up and down scaling of
|
||||
the number of DB-Servers & Coordinators.
|
||||
|
||||
To scale up or down, change the number of servers in the custom
|
||||
resource.
|
||||
|
||||
E.g. change `spec.dbservers.count` from `3` to `4`.
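For example, a minimal sketch of the relevant part of the custom resource after such a change (other fields omitted):

```yaml
spec:
  mode: Cluster
  dbservers:
    count: 4   # was 3
```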
|
||||
|
||||
Then apply the updated resource using:
|
||||
|
||||
```bash
|
||||
kubectl apply -f yourCustomResourceFile.yaml
|
||||
```
|
||||
|
||||
Inspect the status of the custom resource to monitor the progress of the scaling operation.
|
||||
|
||||
**Note: It is not possible to change the number of Agency servers after creating a cluster**.
|
||||
Make sure to specify the desired number when creating the custom resource for the first time.
|
||||
|
||||
|
||||
## Overview
|
||||
|
||||
### Scale-up
|
||||
|
||||
When increasing the `count`, the operator will try to create the missing pods.
|
||||
When scaling up, make sure that you have enough computational resources / nodes, otherwise the pod will get stuck in the `Pending` state.
|
||||
|
||||
|
||||
### Scale-down
|
||||
|
||||
Scaling down is always done 1 server at a time.
|
||||
|
||||
Scale down is possible only when all other actions on ArangoDeployment are finished.
|
||||
|
||||
The internal process followed by the ArangoDB operator when scaling down is as follows:
|
||||
- It chooses a member to be evicted. First it tries to remove unhealthy members, falling back to the member with the highest `deletion_priority`.
|
||||
- Using internal calls, it forces the server to resign leadership.
|
||||
In the case of DB-Servers this means that all shard leaders will be switched to other servers.
|
||||
- It waits until the server is cleaned out of the cluster.
|
||||
- The Pod is finalized.
|
docs/services-and-load-balancer.md (new file, 125 lines)
# Services & Load balancer
|
||||
|
||||
The ArangoDB Kubernetes Operator will create services that can be used to
|
||||
reach the ArangoDB servers from inside the Kubernetes cluster.
|
||||
|
||||
By default, the ArangoDB Kubernetes Operator will also create an additional
|
||||
service to reach the ArangoDB deployment from outside the Kubernetes cluster.
|
||||
|
||||
For exposing the ArangoDB deployment to the outside, there are 2 options:
|
||||
|
||||
- Using a `NodePort` service. This will expose the deployment on a specific port (above 30000)
|
||||
on all nodes of the Kubernetes cluster.
|
||||
- Using a `LoadBalancer` service. This will expose the deployment on a load-balancer
|
||||
that is provisioned by the Kubernetes cluster.
|
||||
|
||||
The `LoadBalancer` option is the most convenient, but not all Kubernetes clusters
|
||||
are able to provision a load-balancer. Therefore we offer a third (and default) option: `Auto`.
|
||||
In this option, the ArangoDB Kubernetes Operator tries to create a `LoadBalancer`
|
||||
service. It then waits for up to a minute for the Kubernetes cluster to provision
|
||||
a load-balancer for it. If that has not happened after a minute, the service
|
||||
is replaced by a service of type `NodePort`.
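If you prefer to select the type explicitly instead of relying on `Auto`, you can set it in the `ArangoDeployment` resource, for example:

```yaml
spec:
  externalAccess:
    type: LoadBalancer   # other accepted values: NodePort, Auto (default), None
```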
|
||||
|
||||
To inspect the created service, run:
|
||||
|
||||
```bash
|
||||
kubectl get services <deployment-name>-ea
|
||||
```
|
||||
|
||||
To use the ArangoDB servers from outside the Kubernetes cluster
|
||||
you have to add another service as explained below.
|
||||
|
||||
## Services
|
||||
|
||||
If you do not want the ArangoDB Kubernetes Operator to create an external-access
|
||||
service for you, set `spec.externalAccess.type` to `None`.
|
||||
|
||||
If you want to create external access services manually, follow the instructions below.
|
||||
|
||||
### Single server
|
||||
|
||||
For a single server deployment, the operator creates a single
|
||||
`Service` named `<deployment-name>`. This service has a normal cluster IP
|
||||
address.
|
||||
|
||||
### Full cluster
|
||||
|
||||
For a full cluster deployment, the operator creates two `Services`.
|
||||
|
||||
- `<deployment-name>-int` a headless `Service` intended to provide
|
||||
DNS names for all pods created by the operator.
|
||||
It selects all ArangoDB & ArangoSync servers in the cluster.
|
||||
|
||||
- `<deployment-name>` a normal `Service` that selects only the Coordinators
|
||||
of the cluster. This `Service` is configured with `ClientIP` session
|
||||
affinity. This is needed for cursor requests, since they are bound to
|
||||
a specific Coordinator.
|
||||
|
||||
When the Coordinators are asked to provide endpoints of the cluster
|
||||
(e.g. when calling `client.SynchronizeEndpoints()` in the go driver)
|
||||
the DNS names of the individual `Pods` will be returned
|
||||
(`<pod>.<deployment-name>-int.<namespace>.svc`)
|
||||
|
||||
### Full cluster with DC2DC
|
||||
|
||||
For a full cluster with datacenter replication deployment,
|
||||
the same `Services` are created as for a Full cluster, with the following
|
||||
additions:
|
||||
|
||||
- `<deployment-name>-sync` a normal `Service` that selects only the syncmasters
|
||||
of the cluster.
|
||||
|
||||
## Load balancer
|
||||
|
||||
If you want full control of the `Services` needed to access the ArangoDB deployment
|
||||
from outside your Kubernetes cluster, set `spec.externalAccess.type` of the `ArangoDeployment` to `None`
|
||||
and create a `Service` as specified below.
|
||||
|
||||
Create a `Service` of type `LoadBalancer` or `NodePort`, depending on your
|
||||
Kubernetes deployment.
|
||||
|
||||
This service should select:
|
||||
|
||||
- `arango_deployment: <deployment-name>`
|
||||
- `role: coordinator`
|
||||
|
||||
The following example yields a service of type `LoadBalancer` with a specific
|
||||
load balancer IP address.
|
||||
With this service, the ArangoDB cluster can now be reached on `https://1.2.3.4:8529`.
|
||||
|
||||
```yaml
|
||||
kind: Service
|
||||
apiVersion: v1
|
||||
metadata:
|
||||
name: arangodb-cluster-exposed
|
||||
spec:
|
||||
selector:
|
||||
arango_deployment: arangodb-cluster
|
||||
role: coordinator
|
||||
type: LoadBalancer
|
||||
loadBalancerIP: 1.2.3.4
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 8529
|
||||
targetPort: 8529
|
||||
```
|
||||
|
||||
The following example yields a service of type `NodePort` with the ArangoDB
|
||||
cluster exposed on port 30529 of all nodes of the Kubernetes cluster.
|
||||
|
||||
```yaml
|
||||
kind: Service
|
||||
apiVersion: v1
|
||||
metadata:
|
||||
name: arangodb-cluster-exposed
|
||||
spec:
|
||||
selector:
|
||||
arango_deployment: arangodb-cluster
|
||||
role: coordinator
|
||||
type: NodePort
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 8529
|
||||
targetPort: 8529
|
||||
nodePort: 30529
|
||||
```
|
docs/storage-resource.md (new file, 63 lines)
# ArangoLocalStorage Custom Resource
|
||||
|
||||
The ArangoDB Storage Operator creates and maintains ArangoDB
|
||||
storage resources in a Kubernetes cluster, given a storage specification.
|
||||
This storage specification is a `CustomResource` following
|
||||
a `CustomResourceDefinition` created by the operator. It is not enabled by
|
||||
default in the operator.
|
||||
|
||||
Example minimal storage definition:
|
||||
|
||||
```yaml
|
||||
apiVersion: "storage.arangodb.com/v1alpha"
|
||||
kind: "ArangoLocalStorage"
|
||||
metadata:
|
||||
name: "example-arangodb-storage"
|
||||
spec:
|
||||
storageClass:
|
||||
name: my-local-ssd
|
||||
localPath:
|
||||
- /mnt/big-ssd-disk
|
||||
```
|
||||
|
||||
This definition results in:
|
||||
|
||||
- a `StorageClass` called `my-local-ssd`
|
||||
- the dynamic provisioning of `PersistentVolumes` with
|
||||
a local volume on a node where the local volume starts
|
||||
in a sub-directory of `/mnt/big-ssd-disk`.
|
||||
- the dynamic cleanup of `PersistentVolumes` (created by
|
||||
the operator) after one is released.
|
||||
|
||||
The provisioned volumes will have a capacity that matches
|
||||
the requested capacity of volume claims.
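After applying the example above, you can verify the result, for instance:

```bash
kubectl get storageclass my-local-ssd
kubectl get persistentvolumes
```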
|
||||
|
||||
## Specification reference
|
||||
|
||||
Below you'll find all settings of the `ArangoLocalStorage` custom resource.
|
||||
|
||||
### `spec.storageClass.name: string`
|
||||
|
||||
This setting specifies the name of the storage class that
|
||||
created `PersistentVolumes` will use.
|
||||
|
||||
If empty, this field defaults to the name of the `ArangoLocalStorage`
|
||||
object.
|
||||
|
||||
If a `StorageClass` with the given name does not yet exist, it
|
||||
will be created.
|
||||
|
||||
### `spec.storageClass.isDefault: bool`
|
||||
|
||||
This setting specifies if the created `StorageClass` will
|
||||
be marked as the default storage class (default is `false`).
|
||||
|
||||
### `spec.localPath: stringList`
|
||||
|
||||
This setting specifies one or more local directories
|
||||
(on the nodes) used to create persistent volumes in.
|
||||
|
||||
### `spec.nodeSelector: nodeSelector`
|
||||
|
||||
This setting specifies which nodes the operator will
|
||||
provision persistent volumes on.
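A minimal sketch of restricting provisioning to labeled nodes (the `storage: ssd` label is illustrative):

```yaml
spec:
  storageClass:
    name: my-local-ssd
  localPath:
    - /mnt/big-ssd-disk
  nodeSelector:
    storage: ssd   # only nodes carrying this label get persistent volumes
```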
|
docs/storage.md (new file, 144 lines)
# Storage configuration
|
||||
|
||||
An ArangoDB cluster relies heavily on fast persistent storage.
|
||||
The ArangoDB Kubernetes Operator uses `PersistentVolumeClaims` to deliver
|
||||
the storage to Pods that need them.
|
||||
|
||||
## Requirements
|
||||
|
||||
To use `ArangoLocalStorage` resources, the storage feature has to be enabled in the operator
|
||||
(replace `<version>` with the
|
||||
[version of the operator](https://github.com/arangodb/kube-arangodb/releases)):
|
||||
|
||||
```bash
|
||||
helm upgrade --install kube-arangodb \
|
||||
https://github.com/arangodb/kube-arangodb/releases/download/<version>/kube-arangodb-<version>.tgz \
|
||||
--set operator.features.storage=true
|
||||
```
|
||||
|
||||
## Storage configuration
|
||||
|
||||
In the `ArangoDeployment` resource, one can specify the type of storage
|
||||
used by groups of servers using the `spec.<group>.volumeClaimTemplate`
|
||||
setting.
|
||||
|
||||
This is an example of a `Cluster` deployment that stores its Agent & DB-Server
|
||||
data on `PersistentVolumes` that use the `my-local-ssd` `StorageClass`.
|
||||
|
||||
The amount of storage needed is configured using the
|
||||
`spec.<group>.resources.requests.storage` setting.
|
||||
|
||||
```yaml
|
||||
apiVersion: "database.arangodb.com/v1"
|
||||
kind: "ArangoDeployment"
|
||||
metadata:
|
||||
name: "cluster-using-local-ssh"
|
||||
spec:
|
||||
mode: Cluster
|
||||
agents:
|
||||
volumeClaimTemplate:
|
||||
spec:
|
||||
storageClassName: my-local-ssd
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 1Gi
|
||||
volumeMode: Filesystem
|
||||
dbservers:
|
||||
volumeClaimTemplate:
|
||||
spec:
|
||||
storageClassName: my-local-ssd
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 80Gi
|
||||
volumeMode: Filesystem
|
||||
```
|
||||
|
||||
Note that configuring storage is done per group of servers.
|
||||
It is not possible to configure storage per individual
|
||||
server.
|
||||
|
||||
The example above is a `Cluster` deployment that requests volumes of 80Gi
|
||||
for every DB-Server, resulting in a total storage capacity of 240Gi (with 3 DB-Servers).
|
||||
|
||||
## Local storage
|
||||
|
||||
For optimal performance, ArangoDB should be configured with locally attached
|
||||
SSD storage.
|
||||
|
||||
The easiest way to accomplish this is to deploy an
|
||||
[`ArangoLocalStorage` resource](storage-resource.md).
|
||||
The ArangoDB Storage Operator will use it to provide `PersistentVolumes` for you.
|
||||
|
||||
This is an example of an `ArangoLocalStorage` resource that will result in
|
||||
`PersistentVolumes` created on any node of the Kubernetes cluster
|
||||
under the directory `/mnt/big-ssd-disk`.
|
||||
|
||||
```yaml
|
||||
apiVersion: "storage.arangodb.com/v1alpha"
|
||||
kind: "ArangoLocalStorage"
|
||||
metadata:
|
||||
name: "example-arangodb-storage"
|
||||
spec:
|
||||
storageClass:
|
||||
name: my-local-ssd
|
||||
localPath:
|
||||
- /mnt/big-ssd-disk
|
||||
```
|
||||
|
||||
Note that using local storage requires `VolumeScheduling` to be enabled in your
|
||||
Kubernetes cluster. On Kubernetes 1.10 this is enabled by default, on version
|
||||
1.9 you have to enable it with a `--feature-gates` setting.
|
||||
|
||||
### Manually creating `PersistentVolumes`
|
||||
|
||||
The alternative is to create `PersistentVolumes` manually, for all servers that
|
||||
need persistent storage (single, Agents & DB-Servers).
|
||||
E.g. for a `Cluster` with 3 Agents and 5 DB-Servers, you must create 8 volumes.
|
||||
|
||||
Note that each volume must have a capacity that is equal to or higher than the
|
||||
capacity needed for each server.
|
||||
|
||||
To select the correct node, add a required node-affinity annotation as shown
|
||||
in the example below.
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolume
|
||||
metadata:
|
||||
name: volume-agent-1
|
||||
spec:
|
||||
capacity:
|
||||
storage: 100Gi
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
persistentVolumeReclaimPolicy: Delete
|
||||
storageClassName: local-ssd
|
||||
local:
|
||||
path: /mnt/disks/ssd1
|
||||
nodeAffinity:
|
||||
required:
|
||||
nodeSelectorTerms:
|
||||
- matchExpressions:
|
||||
- key: kubernetes.io/hostname
|
||||
operator: In
|
||||
values:
|
||||
- "node-1
|
||||
```
|
||||
|
||||
For Kubernetes 1.9 and up, you should create a `StorageClass` which is configured
|
||||
to bind volumes on their first use as shown in the example below.
|
||||
This ensures that the Kubernetes scheduler takes all constraints on a `Pod`
|
||||
into consideration before binding the volume to a claim.
|
||||
|
||||
```yaml
|
||||
kind: StorageClass
|
||||
apiVersion: storage.k8s.io/v1
|
||||
metadata:
|
||||
name: local-ssd
|
||||
provisioner: kubernetes.io/no-provisioner
|
||||
volumeBindingMode: WaitForFirstConsumer
|
||||
```
|
docs/tls.md (new file, 54 lines)
# Secure connections (TLS)
|
||||
|
||||
The ArangoDB Kubernetes Operator will by default create ArangoDB deployments
|
||||
that use secure TLS connections.
|
||||
|
||||
It uses a single CA certificate (stored in a Kubernetes secret) and
|
||||
one certificate per ArangoDB server (stored in a Kubernetes secret per server).
|
||||
|
||||
To disable TLS, set `spec.tls.caSecretName` to `None`.
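For example:

```yaml
spec:
  tls:
    caSecretName: None
```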
|
||||
|
||||
## Install CA certificate
|
||||
|
||||
If the CA certificate is self-signed, it will not be trusted by browsers
|
||||
until you install it in the local operating system or browser.
|
||||
This process differs per operating system.
|
||||
|
||||
To do so, you first have to fetch the CA certificate from its Kubernetes
|
||||
secret.
|
||||
|
||||
```bash
|
||||
kubectl get secret <deploy-name>-ca --template='{{index .data "ca.crt"}}' | base64 -D > ca.crt
|
||||
```
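(On Linux, use `base64 -d` instead of `-D` in the command above.) You can then use the fetched CA certificate to verify a connection, for example (the endpoint is illustrative and the default `root` user with an empty password is assumed):

```bash
curl -u root: --cacert ca.crt https://<ip>:8529/_api/version
```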
|
||||
|
||||
### Windows
|
||||
|
||||
To install a CA certificate in Windows, follow the
|
||||
[procedure described here](http://wiki.cacert.org/HowTo/InstallCAcertRoots).
|
||||
|
||||
### macOS
|
||||
|
||||
To install a CA certificate in macOS, run:
|
||||
|
||||
```bash
|
||||
sudo /usr/bin/security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ca.crt
|
||||
```
|
||||
|
||||
To uninstall a CA certificate in macOS, run:
|
||||
|
||||
```bash
|
||||
sudo /usr/bin/security remove-trusted-cert -d ca.crt
|
||||
```
|
||||
|
||||
### Linux
|
||||
|
||||
To install a CA certificate on Linux (Ubuntu), run:
|
||||
|
||||
```bash
|
||||
sudo cp ca.crt /usr/local/share/ca-certificates/<some-name>.crt
|
||||
sudo update-ca-certificates
|
||||
```
|
||||
|
||||
## See also
|
||||
|
||||
- [Authentication](authentication.md)
|
docs/troubleshooting.md (new file, 115 lines)
# Troubleshooting
|
||||
|
||||
While Kubernetes and the ArangoDB Kubernetes operator automatically
|
||||
resolve a lot of issues, there are always cases where human attention
|
||||
is needed.
|
||||
|
||||
This chapter gives you tips & tricks to help you troubleshoot deployments.
|
||||
|
||||
## Where to look
|
||||
|
||||
In Kubernetes all resources can be inspected using `kubectl` using either
|
||||
the `get` or `describe` command.
|
||||
|
||||
To get all details of the resource (both specification & status),
|
||||
run the following command:
|
||||
|
||||
```bash
|
||||
kubectl get <resource-type> <resource-name> -n <namespace> -o yaml
|
||||
```
|
||||
|
||||
For example, to get the entire specification and status
|
||||
of an `ArangoDeployment` resource named `my-arango` in the `default` namespace,
|
||||
run:
|
||||
|
||||
```bash
|
||||
kubectl get ArangoDeployment my-arango -n default -o yaml
|
||||
# or shorter
|
||||
kubectl get arango my-arango -o yaml
|
||||
```
|
||||
|
||||
Several types of resources (including all ArangoDB custom resources) support
|
||||
events. These events show what happened to the resource over time.
|
||||
|
||||
To show the events (and most important resource data) of a resource,
|
||||
run the following command:
|
||||
|
||||
```bash
|
||||
kubectl describe <resource-type> <resource-name> -n <namespace>
|
||||
```
|
||||
|
||||
## Getting logs
|
||||
|
||||
Another invaluable source of information is the log of containers being run
|
||||
in Kubernetes.
|
||||
These logs are accessible through the `Pods` that group these containers.
|
||||
|
||||
To fetch the logs of the default container running in a `Pod`, run:
|
||||
|
||||
```bash
|
||||
kubectl logs <pod-name> -n <namespace>
|
||||
# or with follow option to keep inspecting logs while they are written
|
||||
kubectl logs <pod-name> -n <namespace> -f
|
||||
```
|
||||
|
||||
To inspect the logs of a specific container in a `Pod`, add `-c <container-name>`.
|
||||
You can find the names of the containers in the `Pod`, using `kubectl describe pod ...`.
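For example (pod and container names are illustrative):

```bash
kubectl logs my-arango-prmr-abc123 -n default -c server
```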
|
||||
|
||||
|
||||
## What if
|
||||
|
||||
### The `Pods` of a deployment stay in `Pending` state
|
||||
|
||||
There are two common causes for this.
|
||||
|
||||
- The `Pods` cannot be scheduled because there are not enough nodes available.
|
||||
This is usually only the case with a `spec.environment` setting that has a value of `Production`.
|
||||
|
||||
Solution: Add more nodes.
|
||||
|
||||
- There are no `PersistentVolumes` available to be bound to the `PersistentVolumeClaims`
|
||||
created by the operator.
|
||||
|
||||
Solution:
|
||||
Use `kubectl get persistentvolumes` to inspect the available `PersistentVolumes`
|
||||
and if needed, use the [`ArangoLocalStorage` operator](storage-resource.md)
|
||||
to provision `PersistentVolumes`.
|
||||
|
||||
### When restarting a `Node`, the `Pods` scheduled on that node remain in `Terminating` state
|
||||
|
||||
When a `Node` no longer makes regular calls to the Kubernetes API server, it is
|
||||
marked as not available. Depending on specific settings in your `Pods`, Kubernetes
|
||||
will at some point decide to terminate the `Pod`. As long as the `Node` is not
|
||||
completely removed from the Kubernetes API server, Kubernetes tries to use
|
||||
the `Node` itself to terminate the `Pod`.
|
||||
|
||||
The `ArangoDeployment` operator recognizes this condition and tries to replace those
|
||||
`Pods` with `Pods` on different nodes. The exact behavior differs per type of server.
|
||||
|
||||
### What happens when a `Node` with local data is broken
|
||||
|
||||
When a `Node` with `PersistentVolumes` hosted on that `Node` is broken and
|
||||
cannot be repaired, the data in those `PersistentVolumes` is lost.
|
||||
|
||||
If an `ArangoDeployment` of type `Single` was using one of those `PersistentVolumes`
|
||||
the database is lost and must be restored from a backup.
|
||||
|
||||
If an `ArangoDeployment` of type `ActiveFailover` or `Cluster` was using one of
|
||||
those `PersistentVolumes`, it depends on the type of server that was using the volume.
|
||||
|
||||
- If an `Agent` was using the volume, it can be repaired as long as 2 other
|
||||
Agents are still healthy.
|
||||
- If a `DBServer` was using the volume, and the replication factor of all database
|
||||
collections is 2 or higher, and the remaining DB-Servers are still healthy,
|
||||
the cluster duplicates the remaining replicas to
|
||||
bring the number of replicas back to the original number.
|
||||
- If a `DBServer` was using the volume, and the replication factor of a database
|
||||
collection is 1 and happens to be stored on that DB-Server, the data is lost.
|
||||
- If a single server of an `ActiveFailover` deployment was using the volume, and the
|
||||
other single server is still healthy, the other single server becomes leader.
|
||||
After replacing the failed single server, the new follower synchronizes with
|
||||
the leader.
|
||||
|
||||
|
||||
### See also
|
||||
- [Collecting debug data](./how-to/debugging.md)
|
docs/upgrading.md (new file, 31 lines)
# Upgrading ArangoDB version
|
||||
|
||||
The ArangoDB Kubernetes Operator supports upgrading an ArangoDB deployment from
|
||||
one version to the next.
|
||||
|
||||
**Warning!**
|
||||
It is highly recommended to take a backup of your data before upgrading ArangoDB
|
||||
using [arangodump](https://docs.arangodb.com/stable/components/tools/arangodump/) or [ArangoBackup CR](backup-resource.md).
|
||||
|
||||
## Upgrade an ArangoDB deployment
|
||||
|
||||
To upgrade a cluster, change the version by changing
|
||||
the `spec.image` setting and then apply the updated
|
||||
custom resource using:
|
||||
|
||||
```bash
|
||||
kubectl apply -f yourCustomResourceFile.yaml
|
||||
```
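For example, the relevant part of the resource could look like this (the image names and versions are illustrative):

```yaml
spec:
  # [...]
  image: arangodb/arangodb:3.10.0   # previously e.g. arangodb/arangodb:3.9.2
```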
|
||||
|
||||
The ArangoDB operator will perform a sequential upgrade
|
||||
of all servers in your deployment. Only one server is upgraded
|
||||
at a time.
|
||||
|
||||
For patch level upgrades (e.g. 3.9.2 to 3.9.3) each server
|
||||
is stopped and restarted with the new version.
|
||||
|
||||
For minor level upgrades (e.g. 3.9.2 to 3.10.0) each server
|
||||
is stopped, then the new version is started with `--database.auto-upgrade`
|
||||
and once that has finished, the new version is started with the normal arguments.
|
||||
|
||||
The process for major level upgrades depends on the specific version.
|
docs/using-the-operator.md (new file, 298 lines)
# Using the ArangoDB Kubernetes Operator
|
||||
|
||||
## Installation
|
||||
|
||||
The ArangoDB Kubernetes Operator needs to be installed in your Kubernetes
|
||||
cluster first. Make sure you have access to this cluster and the rights to
|
||||
deploy resources at cluster level.
|
||||
|
||||
The following cloud provider Kubernetes offerings are officially supported:
|
||||
|
||||
- Amazon Elastic Kubernetes Service (EKS)
|
||||
- Google Kubernetes Engine (GKE)
|
||||
- Microsoft Azure Kubernetes Service (AKS)
|
||||
|
||||
If you have `Helm` available, use it for the installation as it is the
|
||||
recommended installation method.
|
||||
|
||||
### Installation with Helm
|
||||
|
||||
To install the ArangoDB Kubernetes Operator with [`helm`](https://www.helm.sh/),
|
||||
run the following commands (replace `<version>` with the
|
||||
[version of the operator](https://github.com/arangodb/kube-arangodb/releases)
|
||||
that you want to install):
|
||||
|
||||
```bash
|
||||
export URLPREFIX=https://github.com/arangodb/kube-arangodb/releases/download/<version>
|
||||
helm install --generate-name $URLPREFIX/kube-arangodb-<version>.tgz
|
||||
```
|
||||
|
||||
This installs operators for the `ArangoDeployment` and `ArangoDeploymentReplication`
|
||||
resource types, which are used to deploy ArangoDB and ArangoDB Datacenter-to-Datacenter Replication respectively.
|
||||
|
||||
If you want to avoid the installation of the operator for the `ArangoDeploymentReplication`
|
||||
resource type, add `--set=DeploymentReplication.Create=false` to the `helm install`
|
||||
command.
|
||||
|
||||
To use `ArangoLocalStorage` resources, also run:
|
||||
|
||||
```bash
|
||||
helm install --generate-name $URLPREFIX/kube-arangodb-<version>.tgz --set "operator.features.storage=true"
|
||||
```
|
||||
|
||||
The default CPU architecture of the operator is `amd64` (x86-64). To enable ARM
|
||||
support (`arm64`) in the operator, overwrite the following setting:
|
||||
|
||||
```bash
|
||||
helm install --generate-name $URLPREFIX/kube-arangodb-<version>.tgz --set "operator.architectures={amd64,arm64}"
|
||||
```
|
||||
|
||||
Note that you need to set [`spec.architecture`](deployment-resource-reference.md#specarchitecture-string)
|
||||
in the deployment specification, too, in order to create a deployment that runs
|
||||
on ARM chips.
|
||||
|
||||
For more information on installing with `Helm` and how to customize an installation,
|
||||
see [Using the ArangoDB Kubernetes Operator with Helm](helm.md).
|
||||
|
||||
### Installation with Kubectl
|
||||
|
||||
To install the ArangoDB Kubernetes Operator without `Helm`,
|
||||
run (replace `<version>` with the version of the operator that you want to install):
|
||||
|
||||
```bash
|
||||
export URLPREFIX=https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests
|
||||
kubectl apply -f $URLPREFIX/arango-crd.yaml
|
||||
kubectl apply -f $URLPREFIX/arango-deployment.yaml
|
||||
```
|
||||
|
||||
To use `ArangoLocalStorage` resources to provision `PersistentVolumes` on local
|
||||
storage, also run:
|
||||
|
||||
```bash
|
||||
kubectl apply -f $URLPREFIX/arango-storage.yaml
|
||||
```
|
||||
|
||||
Use this when running on bare-metal or if there is no provisioner for fast
|
||||
storage in your Kubernetes cluster.
|
||||
|
||||
To use `ArangoDeploymentReplication` resources for ArangoDB
|
||||
Datacenter-to-Datacenter Replication, also run:
|
||||
|
||||
```bash
|
||||
kubectl apply -f $URLPREFIX/arango-deployment-replication.yaml
|
||||
```
|
||||
|
||||
See [ArangoDeploymentReplication Custom Resource](deployment-replication-resource-reference.md)
|
||||
for details and an example.
|
||||
|
||||
You can find the latest release of the ArangoDB Kubernetes Operator
|
||||
in the [kube-arangodb repository](https://github.com/arangodb/kube-arangodb/releases/latest).
|
||||
|
||||
## ArangoDB deployment creation
|
||||
|
||||
After deploying the latest ArangoDB Kubernetes operator, use the command below to deploy your [license key](https://docs.arangodb.com/stable/operations/administration/license-management/) as a secret which is required for the Enterprise Edition starting with version 3.9:
|
||||
|
||||
```bash
|
||||
kubectl create secret generic arango-license-key --from-literal=token-v2="<license-string>"
|
||||
```
|
||||
|
||||
Once the operator is running, you can create your ArangoDB database deployment
|
||||
by creating a `ArangoDeployment` custom resource and deploying it into your
|
||||
Kubernetes cluster.
|
||||
|
||||
For example (all examples can be found in the [kube-arangodb repository](https://github.com/arangodb/kube-arangodb/tree/master/examples)):
|
||||
|
||||
```bash
|
||||
kubectl apply -f examples/simple-cluster.yaml
|
||||
```
|
||||
Additionally, you can specify the license key required for the Enterprise Edition starting with version 3.9 as seen below:
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
# [...]
|
||||
image: arangodb/enterprise:3.9.1
|
||||
license:
|
||||
secretName: arango-license-key
|
||||
```
|
||||
|
||||
## Connecting to your database
|
||||
|
||||
Access to ArangoDB deployments from outside the Kubernetes cluster is provided
|
||||
using an external-access service. By default, this service is of type
|
||||
`LoadBalancer`. If this type of service is not supported by your Kubernetes
|
||||
cluster, it is replaced by a service of type `NodePort` after a minute.
|
||||
|
||||
To see the type of service that has been created, run (replace `<service-name>`
|
||||
with the `metadata.name` you set in the deployment configuration, e.g.
|
||||
`example-simple-cluster`):
|
||||
|
||||
```bash
|
||||
kubectl get service <service-name>-ea
|
||||
```
|
||||
|
||||
When the service is of the `LoadBalancer` type, use the IP address
|
||||
listed in the `EXTERNAL-IP` column with port 8529.
|
||||
When the service is of the `NodePort` type, use the IP address
|
||||
of any of the nodes of the cluster, combined with the high (>30000) port listed
|
||||
in the `PORT(S)` column.
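Once a `LoadBalancer` has been provisioned, you can also read the external IP directly, for example (the service name assumes the `example-simple-cluster` deployment):

```bash
kubectl get service example-simple-cluster-ea \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```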
|
||||
|
||||
Point your browser to `https://<ip>:<port>/` (note the `https` protocol).
|
||||
Your browser shows a warning about an unknown certificate. Accept the
|
||||
certificate for now. Then log in using the username `root` and an empty password.
|
||||
|
||||
## Deployment removal
|
||||
|
||||
To remove an existing ArangoDB deployment, delete the custom resource.
|
||||
The operator deletes all created resources.
|
||||
|
||||
For example:
|
||||
|
||||
```bash
|
||||
kubectl delete -f examples/simple-cluster.yaml
|
||||
```
|
||||
|
||||
**Note that this will also delete all data in your ArangoDB deployment!**
|
||||
|
||||
If you want to keep your data, make sure to create a backup before removing the deployment.
|
||||
|
||||
## Operator removal
|
||||
|
||||
To remove the entire ArangoDB Kubernetes Operator, remove all
|
||||
clusters first and then remove the operator by running:
|
||||
|
||||
```bash
|
||||
helm delete <release-name-of-kube-arangodb-chart>
|
||||
# If `ArangoLocalStorage` operator is installed
|
||||
helm delete <release-name-of-kube-arangodb-storage-chart>
|
||||
```
|
||||
|
||||
or when you used `kubectl` to install the operator, run:
|
||||
|
||||
```bash
|
||||
kubectl delete deployment arango-deployment-operator
|
||||
# If `ArangoLocalStorage` operator is installed
|
||||
kubectl delete deployment -n kube-system arango-storage-operator
|
||||
# If `ArangoDeploymentReplication` operator is installed
|
||||
kubectl delete deployment arango-deployment-replication-operator
|
||||
```
|
||||
|
||||
## Example deployment using `minikube`
|
||||
|
||||
If you want to get your feet wet with ArangoDB and Kubernetes, you can deploy
|
||||
your first ArangoDB instance with `minikube`, which lets you easily set up a
|
||||
local Kubernetes cluster.
|
||||
|
||||
Visit the [`minikube` website](https://minikube.sigs.k8s.io/)
|
||||
and follow the installation instructions and start the cluster with
|
||||
`minikube start`.
|
||||
|
||||
Next, go to <https://github.com/arangodb/kube-arangodb/releases>
|
||||
to find out the latest version of the ArangoDB Kubernetes Operator. Then run the
|
||||
following commands, with `<version>` replaced by the version you looked up:
|
||||
|
||||
```bash
|
||||
minikube kubectl -- apply -f https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests/arango-crd.yaml
|
||||
minikube kubectl -- apply -f https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests/arango-deployment.yaml
|
||||
minikube kubectl -- apply -f https://raw.githubusercontent.com/arangodb/kube-arangodb/<version>/manifests/arango-storage.yaml
|
||||
```
|
||||
|
||||
To deploy a single server, create a file called `single-server.yaml` with the
|
||||
following content:
|
||||
|
||||
```yaml
|
||||
apiVersion: "database.arangodb.com/v1"
|
||||
kind: "ArangoDeployment"
|
||||
metadata:
|
||||
name: "single-server"
|
||||
spec:
|
||||
mode: Single
|
||||
```
|
||||
|
||||
Insert this resource in your Kubernetes cluster using:
|
||||
|
||||
```bash
|
||||
minikube kubectl -- apply -f single-server.yaml
|
||||
```
|
||||
|
||||
To deploy an ArangoDB cluster instead, create a file called `cluster.yaml` with
|
||||
the following content:
|
||||
|
||||
```yaml
|
||||
apiVersion: "database.arangodb.com/v1"
|
||||
kind: "ArangoDeployment"
|
||||
metadata:
|
||||
name: "cluster"
|
||||
spec:
|
||||
mode: Cluster
|
||||
```
|
||||
|
||||
The same commands used in the single server deployment can be used to inspect
|
||||
your cluster. Just use the correct deployment name (`cluster` instead of
|
||||
`single-server`).
|
||||
|
||||
The `ArangoDeployment` operator in `kube-arangodb` inspects the resource you
|
||||
just deployed and starts the process to run ArangoDB.
|
||||
|
||||
To inspect the current status of your deployment, run:
|
||||
|
||||
```bash
|
||||
minikube kubectl -- describe ArangoDeployment single-server
|
||||
# or shorter
|
||||
minikube kubectl -- describe arango single-server
|
||||
```
|
||||
|
||||
To inspect the pods created for this deployment, run:
|
||||
|
||||
```bash
|
||||
minikube kubectl -- get pods --selector=arango_deployment=single-server
|
||||
```
|
||||
|
||||
The result looks similar to this:
|
||||
|
||||
```
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
single-server-sngl-cjtdxrgl-fe06f0 1/1 Running 0 1m
|
||||
```
|
||||
|
||||
Once the pod reports that it has a `Running` status and is ready,
|
||||
your ArangoDB instance is available.
|
||||
|
||||
To access ArangoDB, run:
|
||||
|
||||
```bash
|
||||
minikube service single-server-ea
|
||||
```
|
||||
|
||||
This creates a temporary tunnel for the `single-server-ea` service and opens
|
||||
your browser. You need to change the URL to start with `https://`. By default,
|
||||
it is `http://`, but the deployment uses TLS encryption for the connection.
|
||||
For example, if the address is `http://127.0.0.1:59050`, you need to change it
|
||||
to `https://127.0.0.1:59050`.
|
||||
|
||||
Your browser warns about an unknown certificate. This is because a self-signed
|
||||
certificate is used. Continue anyway. The exact steps for this depend on your
|
||||
browser.
|
||||
|
||||
You should see the login screen of ArangoDB's web interface. Enter `root` as the
|
||||
username, leave the password field empty, and log in. Select the default
|
||||
`_system` database. You should see the dashboard and be able to interact with
|
||||
ArangoDB.
|
||||
|
||||
If you want to delete your single server ArangoDB database, just run:
|
||||
|
||||
```bash
|
||||
minikube kubectl -- delete ArangoDeployment single-server
|
||||
```
|
||||
|
||||
To shut down `minikube`, run:
|
||||
|
||||
```bash
|
||||
minikube stop
|
||||
```
|
||||
|
||||
## See also
|
||||
|
||||
- [Driver configuration](driver-configuration.md)
|
||||
- [Scaling](scaling.md)
|
||||
- [Upgrading](upgrading.md)
|
||||
- [Using the ArangoDB Kubernetes Operator with Helm](helm.md)
|