Spark on K8s

Google has adopted Kubernetes as its cluster manager, so our client also wanted to try Kubernetes for running Spark applications.


Set up a local Kubernetes cluster to run Spark examples


Read the Spark Kubernetes docs.

Make sure your Spark version is later than 2.4.5 for this to work seamlessly.
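The version requirement can be checked with a small helper. This is a minimal sketch, not part of the repository: `version_gt` is a hypothetical function name, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: succeeds (exit 0) when version $1 is strictly
# newer than version $2. Assumes `sort -V` is available.
version_gt() {
  # Equal versions are not "greater".
  [ "$1" != "$2" ] &&
    # After a version sort, the newest version is the last line.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example usage (the version string here is typed in by hand):
#   version_gt "3.0.1" "2.4.5" && echo "Spark version is new enough"
```

The installed version can be read from the banner printed by `$SPARK_HOME/bin/spark-submit --version` and passed as the first argument.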

Handy Links

A Makefile is provided to download all required binaries, build the Spark Docker image, and use that image to run the example locally.

cd /path/to/spark-streaming-playground/kubernetes/spark/

Install k8s tooling locally, start minikube, initialize helm and deploy a docker registry chart to your minikube:


If everything goes well, you should see a message like this: Registry successfully deployed in minikube. Make sure you add to your insecure registries before continuing. Check for more information on how to do it in your platform.

Put simply, you need to add an entry as follows:

sudo vim /etc/docker/daemon.json # and add the following line
    "insecure-registries" : [""]

Restart the Docker daemon:

sudo systemctl daemon-reload
sudo systemctl restart docker
docker info

You should see the following section in the output:

    Insecure Registries:
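Rather than scanning the `docker info` output by eye, the check can be scripted. The sketch below is an assumption-laden helper, not something shipped with the Makefile: `registry_is_insecure` is a hypothetical name, and it assumes the plain-text layout in which each registry appears as a single token on its own line under `Insecure Registries:`.

```shell
# Hypothetical helper: reads `docker info` text on stdin and succeeds
# when the given registry is listed under "Insecure Registries:".
registry_is_insecure() {
  local registry="$1"
  awk -v reg="$registry" '
    # Start of the section we care about.
    /Insecure Registries:/ { in_section = 1; next }
    # Inside the section, each entry is one token on its own line.
    in_section && NF == 1  { if ($1 == reg) found = 1; next }
    # Any other line (for example "Live Restore Enabled: false") ends it.
    in_section             { in_section = 0 }
    END { exit(found ? 0 : 1) }
  '
}

# Example usage; the address is a placeholder, substitute your own registry:
#   docker info 2>/dev/null | registry_is_insecure "192.168.99.100:5000" \
#     && echo "registry is trusted as insecure"
```

If the function fails, recheck the entry in /etc/docker/daemon.json and restart Docker again.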

Push the Spark images to our private Docker registry:

make docker-push

HINT: if you see “Get http: server gave HTTP response to HTTPS client” go back and check whether you have it listed in your insecure registries

Once your images are pushed, let’s run a sample Spark job (first in client mode):

$SPARK_HOME/bin/spark-submit \
    --master k8s://https://$(minikube ip):8443 \
    --deploy-mode client \
    --conf spark.kubernetes.container.image=$(./ spark) \
    --class org.apache.spark.examples.SparkPi \
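
The `--master` value above embeds `$(minikube ip)` directly, which yields a malformed URL if minikube is not running. As a hedged sketch, the URL can be built and validated first; `k8s_master_url` is a hypothetical helper, and the default port 8443 is simply taken from the command above.

```shell
# Hypothetical helper: builds the Spark master URL for a Kubernetes API
# server. $1 is the cluster IP, $2 an optional port (default 8443).
k8s_master_url() {
  local ip="$1" port="${2:-8443}"
  if [ -z "$ip" ]; then
    # An empty IP usually means `minikube ip` produced no output.
    echo "error: empty cluster IP (is minikube running?)" >&2
    return 1
  fi
  printf 'k8s://https://%s:%s\n' "$ip" "$port"
}

# Example usage:
#   master="$(k8s_master_url "$(minikube ip)")" || exit 1
#   then pass: --master "$master"
```

Failing early this way gives a clearer error than the one spark-submit reports for an invalid master URL.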

Limitations / TODOs

  • Explore more on the Kubernetes driver options

  • Explore how to run the example on AWS


