---
layout: page
parent: Custom resources overview
title: ArangoMLExtension
---

# ArangoMLExtension Custom Resource


#### Enterprise Edition only

[Full CustomResourceDefinition reference ->](./api/ArangoMLExtension.v1beta1.md)


You can spin up the [ArangoML](https://github.com/arangoml) engine on an existing ArangoDeployment.
This allows you to train ML models and use them for predictions based on the data in your database.

This guide covers only the steps to run ArangoML in a Kubernetes cluster with an ArangoDeployment that is already running.
If you don't have one yet, see the [kube-arangodb installation guide](./using-the-operator.md) and the [ArangoDeployment CR description](./deployment-resource-reference.md).

### To start ArangoML in your cluster, follow these steps:

1) Enable the ML operator. For example, if you are using the Helm package, add the `--set "operator.features.ml=true"` option to the Helm command.
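  If you keep Helm settings in a values file instead of passing `--set` flags, the same option can be written as nested keys. This is a minimal sketch: the file name `values.yaml` is only an example, and it assumes the dotted option path maps to nested keys in the usual Helm way.
  ```yaml
  # values.yaml (example file name): equivalent of --set "operator.features.ml=true"
  operator:
    features:
      ml: true
  ```
  Pass the file to Helm with `-f values.yaml` (or `--values values.yaml`) together with your other options.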

2) Create an `ArangoMLStorage` CR. This resource gives ArangoML access to object storage. Currently, only S3 API-compatible storage is supported.
  This example uses [Minio](https://min.io/) object storage. Install Minio and make sure its endpoint is reachable from inside the cluster running ArangoML.

  - Create a Kubernetes Secret containing the Minio credentials for the S3 API. The Secret data should contain two fields: `accessKey` and `secretKey`.
  - If your Minio installation uses an encrypted (HTTPS) connection, create a Kubernetes Secret containing the CA certificates used to validate the connection to the endpoint. The Secret data should contain two fields: `ca.crt` and `ca.key` (both PEM-encoded).
  - Create the ArangoMLStorage resource. Example:
  ```yaml
    apiVersion: ml.arangodb.com/v1beta1
    kind: ArangoMLStorage
    metadata:
      name: myarangoml-storage
    spec:
      backend:
        s3: # defines access to S3 API
          caSecret: # omit this field if you are not using an HTTPS connection to Minio
            name: ml-storage-s3-ca
          credentialsSecret:
            name: ml-storage-s3-creds
          allowInsecure: false # set to true to skip certificate verification
          endpoint: https://minio.my-minio-tenant.svc.cluster.local
      bucketName: my-arangoml-bucket # the bucket is created if it does not exist
      mode: # defines how the storage proxy is deployed to the cluster. Currently, only 'sidecar' mode is supported.
        sidecar: {} # you can configure various parameters for sidecar container here. See full CRD reference for details.
  ```
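  The two Secrets referenced in the ArangoMLStorage example above (`ml-storage-s3-creds` and `ml-storage-s3-ca`) could be defined as follows. This is a sketch: the placeholder values are hypothetical, and it assumes the Secrets are created in the same namespace as the ArangoMLStorage resource. Skip the second Secret if Minio is not served over HTTPS.
  ```yaml
  # Credentials for the Minio S3 API (placeholder values)
  apiVersion: v1
  kind: Secret
  metadata:
    name: ml-storage-s3-creds
  type: Opaque
  stringData:
    accessKey: <minio-access-key>
    secretKey: <minio-secret-key>
  ---
  # CA material used to validate the HTTPS connection to Minio (PEM-encoded placeholders)
  apiVersion: v1
  kind: Secret
  metadata:
    name: ml-storage-s3-ca
  type: Opaque
  stringData:
    ca.crt: |
      <PEM-encoded CA certificate>
    ca.key: |
      <PEM-encoded CA private key>
  ```
  `stringData` lets you write the values as plain text; Kubernetes stores them base64-encoded under `data`.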

3) Create an `ArangoMLExtension` CR. The name of the extension **must** match the name of the `ArangoDeployment`, and the extension should be created in the same namespace.
  Assuming you have an ArangoDeployment named `myarangodb`, create the following CR:
  ```yaml
    apiVersion: ml.arangodb.com/v1beta1
    kind: ArangoMLExtension
    metadata:
      name: myarangodb
    spec:
      storage:
        name: myarangoml-storage # name of the ArangoMLStorage created in the previous step
      deployment:
        # you can add tolerations, nodeSelector, nodeAffinity, scheduler and many other parameters here. See the full CRD reference for details.
        replicas: 1 # by default, a single pod runs the API containers. You can scale it up or down.
        image: <api-image>
        # you can configure various parameters for the container running this component here. See the full CRD reference for details.
      init: # configuration for the Kubernetes Job that runs the initial bootstrap of ArangoML for your cluster.
        image: <init-image>
        # you can add tolerations, nodeSelector, nodeAffinity, scheduler and many other parameters here. See the full CRD reference for details.
      jobsTemplates:
        prediction:
          cpu:
            image: <prediction-cpu-image>
            # you can configure various parameters for the pod and container running this component here. See the full CRD reference for details.
          gpu:
            image: <prediction-gpu-image>
            # you can configure various parameters for the pod and container running this component here. See the full CRD reference for details.
            resources: # this ensures that the pod is scheduled on a GPU-enabled node. Adjust for your environment if necessary.
              limits:
                nvidia.com/gpu: "1"
              requests:
                nvidia.com/gpu: "1"
        training:
          cpu:
            image: <training-cpu-image>
            # you can configure various parameters for the pod and container running this component here. See the full CRD reference for details.
          gpu:
            image: <training-gpu-image>
            # you can configure various parameters for the pod and container running this component here. See the full CRD reference for details.
            resources: # this ensures that the pod is scheduled on a GPU-enabled node. Adjust for your environment if necessary.
              limits:
                nvidia.com/gpu: "1"
              requests:
                nvidia.com/gpu: "1"
  ```
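  As the comments in the example above mention, scheduling parameters such as `nodeSelector` and `tolerations` can be added to the `deployment` section. The fragment below is a sketch that assumes these fields use the standard Kubernetes Pod syntax directly under `spec.deployment`; the node label and taint are hypothetical, so check the exact field placement against the full CRD reference before using it.
  ```yaml
  # Fragment of an ArangoMLExtension spec (merge into the CR above)
  spec:
    deployment:
      replicas: 1
      # Hypothetical node label: run the API pod only on nodes labeled for ArangoML
      nodeSelector:
        example.com/node-pool: arangoml
      # Hypothetical taint: allow scheduling onto nodes tainted with dedicated=arangoml:NoSchedule
      tolerations:
        - key: dedicated
          operator: Equal
          value: arangoml
          effect: NoSchedule
  ```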

4) After creating the CR, wait a few minutes for ArangoML initialization to complete. You can check the status of the ArangoMLExtension to see its current state. Wait until the `Ready` condition is `True`:
```shell
kubectl describe arangomlextension myarangodb
```
```
# ...
status:
  conditions:
    name: Ready
    value: True
```

5) ArangoML is now ready to use! See the [ArangoML documentation](https://github.com/arangoml) for more details on usage.

**Please note** that ArangoML creates a new database in your ArangoDB cluster to store metadata about model training and predictions. Editing or removing this database can cause ArangoML to fail or behave unpredictably.