Kubernetes FAQ

Recently I’ve been working a lot with Kubernetes at my job. I’ve written up a bunch of internal documentation based on that work, and because it’s not company-specific, I decided to share it here for anyone trying to find answers on the internet.

Table of Contents:

  1. what is k8s?
  2. view nodes in a nodegroup
  3. view pods on a node
  4. restart deployment
  5. insufficient cpu, insufficient memory errors
  6. deployment is not ready
  7. CrashLoopBackOff
  8. pod didn’t trigger scale-up
  9. how do I avoid accidentally running a kubectl command on the wrong cluster?

What is Kubernetes? How do I use it?

Check out kubernetes docs here: https://kubernetes.io/docs/home
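
One note before the commands below: k is just a shell alias for kubectl, which saves a lot of typing:

alias k=kubectl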

View Nodes in a Nodegroup

Confirm the nodegroup you’re checking actually exists (e.g. you’re looking for pods running under engtools-infra/my-app) by listing all the nodegroups in a certain namespace:

k get nodegroup -n engtools-infra

Get the nodes under that nodegroup:

k get nodes -o wide | grep my-app
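
If your nodes carry a nodegroup label, a label selector is more precise than grep. The exact label key depends on your setup; on EKS managed nodegroups, for example, it’s eks.amazonaws.com/nodegroup:

k get nodes -l eks.amazonaws.com/nodegroup=my-app -o wide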

View Pods on a Node

View pods under a node with the following command (replace NODE_NAME_GOES_HERE with the node’s name, e.g. ip-10-10-12-123.ec2.internal):

k get pod --field-selector=spec.nodeName=NODE_NAME_GOES_HERE -o wide --all-namespaces | grep -v -E "( kube-proxy-| kube2iam-| local-volume-provisioner-| node-monitoring-| localusers-)"
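
Alternatively, describing the node lists its non-terminated pods along with their cpu/memory requests, which is handy when you’re chasing resource issues:

k describe node NODE_NAME_GOES_HERE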

Restart Deployment

Make sure you’re in the right cluster, get the deployment name, then use the rollout command.

k config current-context
k get deployment -n [NAMESPACE]
k rollout restart deployment [DEPLOYMENT_NAME] -n [NAMESPACE]
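
To watch the restart progress and confirm the new pods come up healthy, rollout status blocks until the rollout finishes (or fails):

k rollout status deployment [DEPLOYMENT_NAME] -n [NAMESPACE]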

insufficient cpu, insufficient memory errors

If you’re seeing errors about insufficient resources, or “node(s) had taint … that the pod didn’t tolerate” warnings, you probably need to define the instanceType and resources attributes in your service’s k8s objects / charts. For example, something like the following:

  instanceType: c5.2xlarge
  resources:
    requests:
      cpu: "3"
      memory: "4G"
    limits:
      cpu: "4"
      memory: "6G"

Deployment is not ready

A log you might see when deploying with Bazel is Deployment is not ready: <service>. 0 out of 2 expected pods are ready. This usually means the k8s pods are in CrashLoopBackOff (CLB) or some other non-ready status.

Use kubectl commands to check on the pods’ status and logs. Usual status check command:

kubectl -n NAMESPACE get pods
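
Once you’ve spotted the pod that isn’t ready, describe it to see its events and pull its logs (POD_NAME is a placeholder):

kubectl -n NAMESPACE describe pod POD_NAME
kubectl -n NAMESPACE logs POD_NAME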

CrashLoopBackOff

If you see a CrashLoopBackOff status on one of your pods, that means it’s crashing and k8s keeps restarting it (I know, thanks captain obvious).

Check the pod logs: k logs CRASHING_POD -n NAMESPACE.
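
If the container restarts too quickly to catch anything useful, the --previous flag pulls the logs from the last terminated instance of the container:

k logs CRASHING_POD -n NAMESPACE --previous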

Find the deployment.yaml file describing the launch instructions for this pod. Find the command attribute in that file, comment out the value underneath it, and add the following:

          - tail
          - "-f"
          - /dev/null

Then re-deploy the pod. Because the container now just tails /dev/null instead of running the real command, it will stay up, and you can exec into it, run the failing command manually, and generally get more info about the problem.
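
To get a shell inside the now-idle pod (use sh if the image doesn’t have bash):

k exec -it POD_NAME -n NAMESPACE -- /bin/bash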

You might also need to comment out the liveness/readiness container specs in deployment.yaml, as those will often cause the pod to crash as well if there’s an issue with the primary container.
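
For reference, these are the kinds of blocks to look for in the container spec; the probe types, paths, and ports here are purely illustrative and will differ in your deployment.yaml:

          # livenessProbe:
          #   httpGet:
          #     path: /healthz
          #     port: 8080
          # readinessProbe:
          #   httpGet:
          #     path: /healthz
          #     port: 8080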

More info about troubleshooting CrashLoopBackOff pods can be found in this releasehub.com article.

pod didn’t trigger scale-up

You might see an event like this on a Pending pod:

Normal   NotTriggerScaleUp    19m                  cluster-autoscaler   pod didn’t trigger scale-up: 3 node(s) had taint {node: database-rev-eng-8c38-pool1}, that the pod didn’t tolerate…

This is potentially a nodegroup or scaling group issue. For example, you might have a single AutoScalingGroup (ASG) under your nodegroup-controller which has reached its max size. In this case, adding the autoscaling: true option to your charts may be required in order to have more ASGs created.

If you’re adding nodegroup options, you may need to delete the existing nodegroups (krm ng NODEGROUP_NAME -n NAMESPACE) for the option to take effect. After the new nodegroup is created, the cluster autoscaler may take a few minutes to pick it up. So if you’re still seeing the NotTriggerScaleUp messages and your pod is still in a Pending status after such a change, give it a little time.
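
To confirm the new nodegroup exists and that nodes are actually coming up, the same commands from the nodegroup section above apply:

k get nodegroup -n NAMESPACE
k get nodes -o wide | grep NODEGROUP_NAME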

How do I avoid accidentally running a kubectl command on the wrong cluster?

Use Explicit Context in Commands

kubectl --context cluster3.us1.test.com get pods -n mynamespace
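
If you don’t remember the exact context name, list the contexts in your kubeconfig first:

kubectl config get-contexts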

Kube-ps1

kube-ps1 shows the cluster and namespace you’ve currently set right in your prompt, which helps avoid this issue.

After installing it with Homebrew, add this to your shell rc file and restart your terminal to enable it (on Apple Silicon Macs the Homebrew prefix is /opt/homebrew rather than /usr/local):

source /usr/local/opt/kube-ps1/share/kube-ps1.sh
PROMPT='$(kube_ps1)'$PROMPT

It works great in conjunction with kubectx and kubens for changing your context/cluster/namespace.
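
For example, reusing the cluster and namespace placeholders from above:

kubectx cluster3.us1.test.com
kubens mynamespace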

It’s very easy to change contexts in a different terminal and have a stale prompt showing the old context. Press enter to refresh the prompt.

Powerlevel10k

If you use powerlevel10k, it can dynamically update your prompt with the current context while you’re typing a kubectl command: https://github.com/romkatv/powerlevel10k#show-on-command.
