
Kubeflow Spark Operator

Chart name: kubeflow-spark-operator
Chart version: 2.4.0
App version: 2.4.0

What is Spark Operator?

The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing the status of Spark applications.
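As a rough illustration of such a custom resource, here is a minimal SparkApplication sketch modeled on the upstream spark-pi example and assuming the v1beta2 API. The image tag, jar path, namespace, and service account name are assumptions. In particular, the service account the chart creates for Spark pods depends on the Helm release name, so check what exists in your job namespace before applying.

```yaml
# Minimal SparkApplication sketch. Assumed values: image tag, jar path,
# namespace, and service account; adjust them to your environment.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default            # assumed: a namespace the operator watches
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.3            # assumed Spark image tag
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  # Assumed location of the examples jar inside the image
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.3.jar
  arguments:
  - "1000"                      # number of partitions SparkPi uses
  sparkVersion: 3.5.3
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark   # hypothetical; depends on the release name
  executor:
    instances: 2
    cores: 1
    memory: 512m
```

After `kubectl apply -f`, the operator runs spark-submit on your behalf and records progress in the object's status, which `kubectl get sparkapplication spark-pi` displays.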

Overview

For a complete reference of the custom resource definitions, refer to the API Definition. For details on its design, refer to the Architecture documentation. The operator requires Spark 2.3 or later, the first Spark release to support Kubernetes as a native scheduler backend.

The Kubernetes Operator for Apache Spark currently supports the following features:

  • Supports Spark 2.3 and up.
  • Enables declarative application specification and management of applications through custom resources.
  • Automatically runs spark-submit on behalf of users for each SparkApplication eligible for submission.
  • Provides native cron support for running scheduled applications.
  • Supports customization of Spark pods beyond what Spark natively supports, through a mutating admission webhook, e.g., mounting ConfigMaps and volumes and setting pod affinity/anti-affinity.
  • Supports automatic re-submission of an application when its SparkApplication object is updated with a new specification.
  • Supports automatic application restart with a configurable restart policy.
  • Supports automatic retries of failed submissions with optional linear back-off.
  • Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.
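Several of these features are driven by fields on the custom resources. The sketch below, assuming the v1beta2 API, combines a cron schedule with an OnFailure restart policy and submission retries. The name, schedule, image, service account, and retry counts are illustrative assumptions, not recommendations.

```yaml
# Illustrative ScheduledSparkApplication that runs SparkPi every 30 minutes.
# All concrete values (name, schedule, image, retry counts) are assumptions.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled
  namespace: default
spec:
  schedule: "@every 30m"         # standard cron syntax such as "*/30 * * * *" also works
  concurrencyPolicy: Forbid      # skip a run while the previous one is still active
  template:
    type: Scala
    mode: cluster
    image: spark:3.5.3
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.3.jar
    sparkVersion: 3.5.3
    restartPolicy:
      type: OnFailure
      onFailureRetries: 3                 # restart a failed run up to 3 times
      onFailureRetryInterval: 10          # base interval in seconds
      onSubmissionFailureRetries: 5       # retry failed spark-submit attempts
      onSubmissionFailureRetryInterval: 20
    driver:
      cores: 1
      memory: 512m
      serviceAccount: spark-operator-spark   # hypothetical name
    executor:
      instances: 1
      cores: 1
      memory: 512m
```

Forbid is used here so that a slow run does not overlap with the next scheduled one; Allow and Replace are the other accepted concurrency policies.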

Prerequisites

Deploy k0rdent v1.4.0 by following the QuickStart guide.

Install the template into k0rdent

helm upgrade --install kubeflow-spark-operator oci://ghcr.io/k0rdent/catalog/charts/kgst --set "chart=kubeflow-spark-operator:2.4.0" -n kcm-system

Verify the service template

kubectl get servicetemplates -A
# NAMESPACE    NAME                            VALID
# kcm-system   kubeflow-spark-operator-2-4-0   true

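Deploy the service template

The following MultiClusterService deploys the operator into the kubeflow-spark-operator namespace of every managed cluster labeled group=demo, with the Spark web UI ingress enabled.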

apiVersion: k0rdent.mirantis.com/v1beta1
kind: MultiClusterService
metadata:
  name: kubeflow-spark-operator
spec:
  clusterSelector:
    matchLabels:
      group: demo
  serviceSpec:
    services:
    - template: kubeflow-spark-operator-2-4-0
      name: kubeflow-spark-operator
      namespace: kubeflow-spark-operator
      values: |
        spark-operator:
          controller:
            uiIngress:
              enable: true
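The upstream chart only watches for Spark applications in a configured set of job namespaces. In the 2.x charts this is assumed to be the `spark.jobNamespaces` value, defaulting to `default`. To submit jobs elsewhere, a `values` field along these lines could replace the one above. The `spark-jobs` namespace is hypothetical and should already exist on the target clusters, and the key names should be checked against the chart's values.yaml for 2.4.0.

```yaml
# Sketch of an alternative `values` field for the service entry above.
values: |
  spark-operator:
    spark:
      # Namespaces in which the operator watches SparkApplications (assumed key)
      jobNamespaces:
      - default
      - spark-jobs     # hypothetical extra namespace
    controller:
      uiIngress:
        enable: true
```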