Running in Production
Number of Replicas
Always run at least two replicas of your application (three or more are recommended) so that it survives cluster updates and autoscaling without downtime.
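As a minimal sketch of this advice, a Deployment can request three replicas through `spec.replicas`. The manifest assumes the `apps/v1` Deployment API; the name `myapp` and the `application` label are hypothetical placeholders, and `mycontainer` and `myimage` follow the examples on this page.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                 # hypothetical name
spec:
  replicas: 3                 # at least two; three or more recommended
  selector:
    matchLabels:
      application: myapp      # hypothetical label, must match the template labels
  template:
    metadata:
      labels:
        application: myapp
    spec:
      containers:
        - name: mycontainer
          image: myimage
```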
Web applications should always configure a readinessProbe so that the container only receives traffic after it has started successfully:
    containers:
      - name: mycontainer
        image: myimage
        readinessProbe:
          httpGet:
            # Path to probe; should be cheap, but representative of typical behavior
            path: /.well-known/health
            port: 8080
          timeoutSeconds: 1
See Configuring Liveness and Readiness Probes for details.
Always configure resource requests for both CPU and memory. The Kubernetes scheduler and the cluster autoscaler rely on these values to place pods and to decide when to add or remove nodes. Example:
    containers:
      - name: mycontainer
        image: myimage
        resources:
          requests:
            cpu: 100m      # 100 millicores
            memory: 200Mi  # 200 MiB
Configure a resource limit for memory if possible. A container that tries to use more memory than its limit is terminated (OOMKilled).
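A memory limit could be declared as in the sketch below. Setting the limit equal to the request is an assumption made here for illustration, not a requirement stated on this page, and the values are placeholders.

```yaml
containers:
  - name: mycontainer
    image: myimage
    resources:
      requests:
        cpu: 100m
        memory: 200Mi
      limits:
        # The container is OOMKilled if it exceeds this amount.
        memory: 200Mi
```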
For JVM applications, set the heap size dynamically by using the java-dynamic-memory-opts script from Zalando’s OpenJDK base image and setting the MEM_TOTAL_KB environment variable from limits.memory:
    containers:
      - name: mycontainer
        image: myjvmdockerimage
        env:
          # set the maximum available memory as JVM would assume host/node capacity otherwise
          # this is evaluated by java-dynamic-memory-opts in the Zalando OpenJDK base image
          # see https://github.com/zalando/docker-openjdk
          - name: MEM_TOTAL_KB
            valueFrom:
              resourceFieldRef:
                resource: limits.memory
                divisor: 1Ki
        resources:
          requests:
            cpu: 100m
            memory: 2Gi
          limits:
            memory: 2Gi