You can spin up the [ArangoML](https://github.com/arangoml) engine on an existing ArangoDeployment.
This lets you train ML models and use them for predictions based on the data in your database.
These instructions cover only the steps to run ArangoML in a Kubernetes cluster with an already running ArangoDeployment.
If you don't have one yet, see the [kube-arangodb installation guide](./using-the-operator.md) and the [ArangoDeployment CR description](./deployment-resource-reference.md).
### To start ArangoML in your cluster, follow these steps:
1) Enable the ML operator. For example, if you install the operator with the Helm package, add the `--set "operator.features.ml=true"` option to the Helm command.
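If you manage the operator's Helm settings in a values file rather than with `--set`, the same option is expressed as a nested key, because Helm maps the dotted path `operator.features.ml` onto nested maps. This is a minimal sketch; the file name `values.yaml` is only an example.

```yaml
# values.yaml (example file name): equivalent to --set "operator.features.ml=true"
operator:
  features:
    ml: true
```

Pass the file with `-f values.yaml` to your usual `helm install` or `helm upgrade` command, substituting your own release name and chart reference.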
2) Create an `ArangoMLStorage` CR. This resource gives ArangoML access to object storage. Currently, only S3 API-compatible storage is supported.
In this example we use [Minio](https://min.io/) object storage. Install Minio and make sure its endpoint is reachable from inside the cluster running ArangoML.
- Create a Kubernetes Secret containing the Minio credentials for the S3 API. The secret data should contain two fields: `accessKey` and `secretKey`.
- If your Minio installation uses an encrypted (TLS) connection, create a Kubernetes Secret containing the CA certificate used to validate the connection to the endpoint. The secret data should contain two fields: `ca.crt` and `ca.key` (both PEM-encoded).
```yaml
# ...
bucketName: my-arangoml-bucket # the bucket is created if it does not exist
mode: # defines how the storage proxy is deployed to the cluster. Currently, only 'sidecar' mode is supported.
  sidecar: {} # you can configure various parameters for the sidecar container here. See the full CRD reference for details.
```
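The two Secrets described in the bullets of step 2 could be declared as below. This is a minimal sketch: the names `minio-credentials` and `minio-ca` are hypothetical, all values are placeholders, and how `ArangoMLStorage` refers to these Secrets is defined in the CRD reference rather than shown here. `stringData` accepts plain text, which Kubernetes stores base64-encoded under `data`.

```yaml
# Credentials for the Minio S3 API (hypothetical Secret name)
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
type: Opaque
stringData:
  accessKey: <minio-access-key>
  secretKey: <minio-secret-key>
---
# CA material, only needed if Minio uses an encrypted connection (hypothetical Secret name)
apiVersion: v1
kind: Secret
metadata:
  name: minio-ca
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM-encoded CA certificate>
    -----END CERTIFICATE-----
  ca.key: |
    -----BEGIN PRIVATE KEY-----
    <PEM-encoded CA private key>
    -----END PRIVATE KEY-----
```

Apply it with `kubectl apply -n <namespace> -f <file>`. For the credentials alone, `kubectl create secret generic minio-credentials --from-literal=accessKey=<minio-access-key> --from-literal=secretKey=<minio-secret-key>` produces an equivalent Secret.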
3) Create an `ArangoMLExtension` CR. The name of the extension **must** be the same as the name of the `ArangoDeployment`, and it should be created in the same namespace.
Assuming you have an ArangoDeployment named `myarangodb`, create the CR:
```yaml
# ...
init: # configuration for the Kubernetes Job that runs the initial bootstrap of ArangoML for your cluster
  image: <init-image>
  # you can also set tolerations, nodeSelector, nodeAffinity, scheduler and many other parameters here. See the full CRD reference for details.
jobsTemplates:
  prediction:
    cpu:
      image: <prediction-job-cpu-image>
      # you can configure various parameters for the pod and container running this component here. See the full CRD reference for details.
    gpu:
      image: <prediction-job-gpu-image>
      # you can configure various parameters for the pod and container running this component here. See the full CRD reference for details.
      resources: # ensures the pod is scheduled on a GPU-enabled node. Adjust for your environment if necessary.
        limits:
          nvidia.com/gpu: "1"
        requests:
          nvidia.com/gpu: "1"
  training:
    cpu:
      image: <training-cpu-image>
      # you can configure various parameters for the pod and container running this component here. See the full CRD reference for details.
    gpu:
      image: <training-gpu-image>
      # you can configure various parameters for the pod and container running this component here. See the full CRD reference for details.
      resources: # ensures the pod is scheduled on a GPU-enabled node. Adjust for your environment if necessary.
        limits:
          nvidia.com/gpu: "1"
        requests:
          nvidia.com/gpu: "1"
```
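The naming rule from step 3 is easiest to see side by side. The sketch below shows only the `kind` and `metadata` of both resources; `apiVersion` and `spec` are left out, and the namespace `arangodb` is a hypothetical example.

```yaml
# Existing deployment (excerpt)
kind: ArangoDeployment
metadata:
  name: myarangodb
  namespace: arangodb   # hypothetical namespace
---
# New extension (excerpt): same name and same namespace as the deployment
kind: ArangoMLExtension
metadata:
  name: myarangodb      # must equal the ArangoDeployment name
  namespace: arangodb   # same namespace as the ArangoDeployment
```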
4) After creating the CR, wait a few minutes for ArangoML initialization to complete. You can check the status of the `ArangoMLExtension` to see its current state. Wait for the `Ready` condition to be `True`:
```shell
kubectl describe arangomlextension myarangodb
```
```
# ...
status:
  conditions:
    name: Ready
    value: True
```
5) ArangoML is now ready to use! Head to the [ArangoML documentation](https://github.com/arangoml) for more details on usage.
**Please note** that ArangoML creates a new database in your ArangoDB cluster to store metadata about model training and predictions. Editing or removing this database can cause ArangoML to fail or behave unpredictably.