Spark on K8s

Google has adopted Kubernetes as cluster manager, so even our client wanted to test out Kubernetes for Spark Applications


Requirements

Setup local Kubernetes cluster to run Spark examples


Implementation

Read the Spark Kubernetes docs.

Make sure the Spark version > 2.4.5 for this to work seamlessly.

Handy Links

A MakeFile was put in place to download all needed binaries, prepare the docker image with respect to Spark and use the Spark image to run the example locally

cd /path/to/spark-streaming-playground/kubernetes/spark/

Install k8s tooling locally, start minikube, initialize helm and deploy a docker registry chart to your minikube:

make

If everything goes well, you should see a message like this: Registry successfully deployed in minikube. Make sure you add 192.168.99.100:30000 to your insecure registries before continuing. Check https://docs.docker.com/registry/insecure/ for more information on how to do it in your platform.

In simple words, you needs to add an entry as follows:

sudo vim /etc/docker/daemon.json # and add followning line
  {
    "insecure-registries" : ["192.168.99.100:30000"]
  } 

Restart the docker…

sudo systemctl daemon-reload
sudo systemctl restart docker
docker info

You should see following log:

    Insecure Registries:
     192.168.99.100:30000
     127.0.0.0/8

Push the spark images to our private docker registry

make docker-push

HINT: if you see “Get https://192.168.99.100:30000/v2/: http: server gave HTTP response to HTTPS client” go back and check whether you have it listed in your insecure registries

Once your images are pushed, let’s run a sample spark job (first on client mode):

$SPARK_HOME/bin/spark-submit \
    --master k8s://https://$(minikube ip):8443 \
    --deploy-mode client \
    --conf spark.kubernetes.container.image=$(./get_image_name.sh spark) \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar

Limitations / TODOs

  • Explore more on the Kubernetes driver options

  • Explore how to run the example on AWS

References

  • https://tech.olx.com/running-spark-on-kubernetes-a-fully-functional-example-and-why-it-makes-sense-for-olx-d56b6a61fcbe

  • https://github.com/olx-global/spark-on-k8s