Recently I’ve been working a lot with Kubernetes at my job. I’ve written up a bunch of internal documentation based on that work, and since none of it is company-specific, I decided to share it here for anyone trying to find answers on the internet.
Table of Contents:
- what is k8s?
- view nodes in a nodegroup
- view pods on a node
- restart deployment
- insufficient cpu, insufficient memory errors
- deployment is not ready
- CrashLoopBackOff
- pod didn’t trigger scale-up
- how do I avoid accidentally running a kubectl command on the wrong cluster?
What is Kubernetes? How do I use it?
Kubernetes (k8s) is an open-source system for deploying, scaling, and managing containerized applications. Check out the kubernetes docs here: https://kubernetes.io/docs/home
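A quick note on notation: throughout this doc, k is shorthand for kubectl. If you want the same shortcut, add something like this to your shell rc file (a minimal sketch; the krm alias is my guess at the delete shorthand used later in this doc):
alias k=kubectl
alias krm='kubectl delete'  # assumed: "krm" later in this doc is an alias for deleting resources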
View Nodes in a Nodegroup
Confirm the nodegroup you’re checking actually exists (e.g. you’re looking for pods running under engtools-infra/my-app) by listing all the nodegroups in the relevant namespace:
k get nodegroup -n engtools-infra
Get the nodes under that nodegroup:
k get nodes -o wide | grep my-app
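If your setup labels nodes with their nodegroup (this varies by environment; the label key here is hypothetical), a label selector is more precise than grep:
k get nodes -l nodegroup=my-app -o wide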
View Pods on a Node
View pods under a node with the following command (replace NODE_NAME_GOES_HERE with the node’s name, e.g. ip-10-10-12-123.ec2.internal):
k get pod --field-selector=spec.nodeName=NODE_NAME_GOES_HERE -owide --all-namespaces | grep -v -E "( kube-proxy-| kube2iam-| local-volume-provisioner-| node-monitoring-| localusers-)"
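Alternatively, describing the node lists its non-terminated pods along with their CPU/memory requests:
k describe node NODE_NAME_GOES_HERE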
Restart Deployment
Make sure you’re in the right cluster, get the deployment name, then use the rollout command:
k config current-context
k get deployment -n [NAMESPACE]
k rollout restart deployment [DEPLOYMENT_NAME] -n [NAMESPACE]
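To watch the restart until the new pods are ready:
k rollout status deployment [DEPLOYMENT_NAME] -n [NAMESPACE]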
insufficient cpu, insufficient memory errors
If you’re seeing errors about insufficient resources, or node(s) had taint … that the pod didn’t tolerate warnings, you probably need to define instanceType and resources attributes in your service’s k8s objects / charts (instanceType is a chart value here, not a core k8s field; resources maps to the standard container requests/limits). For example, something like the following:
instanceType: c5.2xlarge
resources:
  requests:
    cpu: "3"
    memory: "4G"
  limits:
    cpu: "4"
    memory: "6G"
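The insufficient cpu / insufficient memory messages show up as FailedScheduling events on the pending pod, so describing it will show you exactly what the scheduler is complaining about:
k describe pod PENDING_POD_NAME -n NAMESPACE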
Deployment is not ready
A log you might see when deploying with Bazel is “Deployment is not ready: <service>. 0 out of 2 expected pods are ready.”
This usually means the k8s pods are in CrashLoopBackOff (CLB) or some other non-ready status.
Use kubectl commands to check on the pods’ status and logs. Usual status check command:
kubectl -n NAMESPACE get pods
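From there, drill into any pod that isn’t ready. The --previous flag shows logs from the last crashed container, which is usually where the real error is:
kubectl -n NAMESPACE describe pod POD_NAME
kubectl -n NAMESPACE logs POD_NAME --previous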
CrashLoopBackOff
If you see CrashLoopBackOff status on one of your pods, that means it’s crashing (I know, thanks captain obvious).
Check the pod logs: k logs CRASHING_POD -n NAMESPACE.
Find the deployment.yaml file describing the launch instructions for this pod. Find the command attribute in that file, comment out the value underneath it, and add the following:
- tail
- "-f"
- /dev/null
Then re-deploy the pod. Since tail -f /dev/null just idles forever, the pod will stay up instead of crashing, which lets you shell into it, run the failing command manually, and generally get more info about the problem.
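Once the pod is up and idling, get a shell in it (assuming the image ships with a shell):
k exec -it POD_NAME -n NAMESPACE -- /bin/sh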
You might also need to comment out the liveness/readiness probe specs in deployment.yaml, since failing probes will often restart the pod as well if there’s an issue with the primary container.
More info about troubleshooting CrashLoopBackOff pods can be found in this releasehub.com article.
pod didn’t trigger scale-up
You might see an event like this in k describe pod output:
Normal NotTriggerScaleUp 19m cluster-autoscaler pod didn’t trigger scale-up: 3 node(s) had taint {node: database-rev-eng-8c38-pool1}, that the pod didn’t tolerate…
This is potentially a nodegroup or scaling-group issue. For example, you might have a single AutoScalingGroup (ASG) under your nodegroup-controller which has reached its max size. In this case, you may need to add the following option to your charts in order to have more ASGs created:
autoscaling: true
If adding nodegroup options, you may need to delete the existing nodegroups (krm ng NODEGROUP_NAME -n NAMESPACE) for the option to take effect. After creating the new nodegroup, it can take the cluster autoscaler a few minutes to pick it up. So if you’re still seeing the scale-up messages and your pod stuck in Pending status after such a change, give it a little time.
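While you wait, you can watch the pending pod and the new nodes come up (my-app as in the earlier example):
k get pod PENDING_POD_NAME -n NAMESPACE -w
k get nodes -w | grep my-app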
How do I avoid accidentally running a kubectl command on the wrong cluster?
Use Explicit Context in Commands
kubectl --context cluster3.us1.test.com get pods -n mynamespace
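You can list the contexts available on your machine (and see which one is current) with:
kubectl config get-contexts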
Kube-ps1
kube-ps1 shows the cluster and namespace you’ve currently set right in your prompt, which helps avoid this issue.
After installing it with Homebrew, add this to your .rc file and restart your terminal to enable it:
source /usr/local/opt/kube-ps1/share/kube-ps1.sh
PROMPT='$(kube_ps1)'$PROMPT
It works great in conjunction with kubectx and kubens for changing your context/cluster/namespace.
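For example, using the cluster name from the earlier example:
kubectx cluster3.us1.test.com
kubens mynamespace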
It’s very easy to change contexts in a different terminal and have a stale prompt showing the old context. Press enter to refresh the prompt.
Powerlevel10k
If you use powerlevel10k, it can dynamically update your prompt with the current context while you’re typing a kubectl command: https://github.com/romkatv/powerlevel10k#show-on-command
Troubleshooting
- prompt_context:2: command not found: prompt_segment
- For existing setups, this error may pop up. Follow this comment to fix it: https://github.com/Powerlevel9k/powerlevel9k/issues/423#issuecomment-284427938