AWS has a per-node pod IP limit, and pods are stuck in ContainerCreating

As is well known, AWS imposes a limit on the number of pod IPs each node can hold. The Kubernetes scheduler does not take this into account, so it schedules pods onto nodes that can no longer allocate a pod IP, and those pods get stuck in the ContainerCreating state, like this:
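With the AWS VPC CNI, that limit comes from how many ENIs the instance type supports and how many IPv4 addresses each ENI can hold. As a rough sketch of the documented formula (the ENI and IP counts below are assumptions for a t3.medium; check your own instance type in AWS's published eni-max-pods list):

```shell
# Sketch: per-node pod cap on the AWS VPC CNI, per the documented formula
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# Counts below are assumed values for a t3.medium (3 ENIs, 6 IPv4 addresses each).
enis=3
ips_per_eni=6
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "$max_pods"   # 17
```

Once a node has that many pods with VPC-assigned IPs, any further pod placed there fails exactly as in the events above.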

Normal   Scheduled               114s                 default-scheduler                        Successfully assigned default/whoami-deployment-9f9c86c4f-r4flx to ip-192-168-15-248.ec2.internal
Warning  FailedCreatePodSandBox  111s                 kubelet,ip-192-168-15-248.ec2.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8d4b5f98f9b600ad9ec486f994fa2f9223d5224842df7f78802616f014b52970" network for pod "whoami-deployment-9f9c86c4f-r4flx": NetworkPlugin cni failed to set up pod "whoami-deployment-9f9c86c4f-r4flx_default" network: add cmd: failed to assign an IP address to container
Normal   SandboxChanged          86s (x12 over 109s)  kubelet,ip-192-168-15-248.ec2.internal  Pod sandbox changed, it will be killed and re-created.
Warning  FailedCreatePodSandBox  61s (x4 over 76s)    kubelet,ip-192-168-15-248.ec2.internal  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e2a3c54ba7d9a33a45248f7c276f4a2d5b0c8ba6c3deb5184392156b35638553" network for pod "whoami-deployment-9f9c86c4f-r4flx": NetworkPlugin cni failed to set up pod "whoami-deployment-9f9c86c4f-r4flx_default" network: add cmd: failed to assign an IP address to container

So I tried to work around this by tainting the node with key=value:NoSchedule, so that the default scheduler would not place pods on nodes that had reached their pod IP limit, and I deleted all the pods stuck in ContainerCreating. I expected the taint to stop the scheduler from putting more pods on that node, and that is indeed what happened. However, since the pods were now unschedulable, I also expected the cluster-autoscaler to scale up the ASG so my pods could run on a new node, and that is what did not happen.
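For reference, the taint I applied looked like this (using the node name from the events above):

```shell
# Taint the node that has exhausted its pod IPs so the scheduler avoids it
kubectl taint nodes ip-192-168-15-248.ec2.internal key=value:NoSchedule

# Later, the trailing "-" removes the same taint again
kubectl taint nodes ip-192-168-15-248.ec2.internal key=value:NoSchedule-
```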

When I describe the pod, I see:

Warning FailedScheduling 40s (x5 over 58s) default-scheduler 0/5 nodes are available: 5 node(s) had taints that the pod didn't tolerate.

Normal NotTriggerScaleUp 5s (x6 over 56s) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 node(s) had taints that the pod didn't tolerate

When I look at the cluster-autoscaler logs, I see:

I1108 16:30:47.521026 1 event.go:209] Event(v1.ObjectReference{Kind:"Pod",Namespace:"default",Name:"whoami-deployment-9f9c86c4f-x5h4d",UID:"158cc806-0245-11ea-a67a-0efb4254edc4",APIVersion:"v1",ResourceVersion:"2483839",FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 node(s) had taints that the pod didn't tolerate

Next, I tried an alternative approach: I removed the NoSchedule taint described above and instead marked the node unschedulable by patching it:

kubectl patch nodes node1.internal -p '{"spec": {"unschedulable": true}}'
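For what it's worth, patching spec.unschedulable is what kubectl cordon does under the hood, so the same state can be set and undone with:

```shell
kubectl cordon node1.internal     # sets spec.unschedulable: true
kubectl uncordon node1.internal   # clears it again
```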

These are the logs I see from cluster-autoscaler:

I1109 10:47:50.894680       1 static_autoscaler.go:138] Starting main loop
W1109 10:47:50.894719       1 static_autoscaler.go:562] Cluster has no ready nodes.
I1109 10:47:50.901157       1 event.go:209] Event(v1.ObjectReference{Kind:"ConfigMap",Namespace:"kube-system",Name:"cluster-autoscaler-status",UID:"7c949105-0153-11ea-9a39-12e5fc698b6e",ResourceVersion:"2629645",FieldPath:""}): type: 'Warning' reason: 'ClusterUnhealthy' Cluster has no ready nodes.

So neither of my ideas for working around this problem makes sense. How can I overcome it?

Kubernetes version: 1.14. Cluster Autoscaler: 1.14.6.

Let me know if you need any more details.
